From akarger at CGR.Harvard.edu  Fri Jul  1 11:14:40 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri Jul  1 11:04:01 2005
Subject: [Bioperl-l] why string overload is bad
Message-ID: <339D68B133EAD311971E009027DC47970321A7FC@montecarlo.cgr.harvard.edu>

> -----Original Message-----
> From: Ewan Birney [mailto:birney@ebi.ac.uk] 
> Sent: Tuesday, June 28, 2005 9:57 AM
> To: Stefan Kirov
> Cc: Hilmar Lapp; Bioperl
> Subject: Re: [Bioperl-l] why string overload is bad
> 
> 
> >> Do people really want to go the route of string-overloading the 
> >> annotation classes? To me it's really over the top and is a step 
> >> backwards for ease of using the toolkit.
> > 
> > Hilmar definitely has a point here. 
> 
> I have always been against string overloading. The subtlety of the bugs
> generated and the non-obvious code paths (when Perl wants a number, does
> it go via the string-overloaded case...)
> 
> I also (personally) think overloading in C++ is bad. I just think
> overloading is bad wherever.

I wouldn't say "wherever". For example, it's probably worth it for complex
number libraries, so that you don't have to use the "plus" function every
time you want to add variables. (Especially because you need to overload
+-*/% etc., so code will be MUCH more readable with the overloaded values.)

That said, IMO it should be used only in cases that are clear wins, not for
minor convenience, or even slightly increased elegance. And it's better if
the overloading is clearly defined & scoped without side effects. It sounds
like there are side effects here. String overloading is probably more side
effect-prone than number, because you do fewer complicated things with
numbers (math, change to boolean) than strings (tons of perl functions, not
to mention m// and s///).
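
A small illustration of both points, sketched in Python rather than Perl so
that it does not lean on any BioPerl API. The Complex and Annotation classes
are invented for the example and are not taken from any toolkit:

```python
class Complex:
    """Tiny complex-number type, used only to illustrate operator overloading."""

    def __init__(self, re, im):
        self.re = re
        self.im = im

    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)

    def __mul__(self, other):
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)

    def __eq__(self, other):
        return (self.re, self.im) == (other.re, other.im)


class Annotation:
    """Hypothetical annotation object whose string form is overloaded."""

    def __init__(self, tagname, value):
        self.tagname = tagname
        self.value = value

    def __str__(self):
        return self.value


a, b, c = Complex(1, 2), Complex(3, -1), Complex(0, 1)
# Clear win: this reads like the maths, unlike plus(times(a, b), c).
result = a * b + c

note = Annotation("comment", "kinase")
# Side effect: once interpolated into a string, the object (and its
# tagname) is silently gone and only plain text is left.
label = "%s" % note
```

The arithmetic case stays inside one well-defined type, while the string case
quietly turns an object into plain text wherever a string happens to be
expected.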

-Amir Karger
From jason.stajich at duke.edu  Fri Jul  1 14:26:15 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jul  1 14:17:31 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <42BAF1B7.10109@york.ac.uk>
References: <42BAF1B7.10109@york.ac.uk>
Message-ID: <6A814EB4-D5A4-4510-A8B7-4D67197A29DA@duke.edu>

Did you try the FAQ?

http://www.bioperl.org/Core/Latest/faq.html#Q5.4


On Jun 23, 2005, at 1:30 PM, Kat Hull wrote:

> Hi there,
> I was wondering whether anyone has a solution to my problem. I have  
> a list of protein accession numbers and want to retrieve the
> corresponding nucleotide sequences automatically.  I thought it  
> would be possible to do this by changing the NCBI url, but this  
> doesn't seem to be the case.
> Is there a bio-perl module that can do this?
>
> Kind regards,
> Kat
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From dsam at ucsd.edu  Fri Jul  1 06:51:50 2005
From: dsam at ucsd.edu (dsam@ucsd.edu)
Date: Fri Jul  1 14:27:50 2005
Subject: [Bioperl-l] Re: go-perl
Message-ID: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>

Hello,

I tried to install BioPerl using CPAN, but I get quite a few failed tests.

I need BioPerl in order to install the GO Perl API.
These are needed to calculate semantic similarity based on Gene
Ontology (http://www.cs.man.ac.uk/~phillord/semantic_sim.html).

I was wondering what does each failed test mean (e.g. simpleGOparser)
and if the failed tests can be ignored.
Any insight would be greatly appreciated.

Below are the failed tests:

Thanks,
Daniel

/**************
CPAN
**************/

cpan> d /bioperl/
Distribution    A/AL/ALLENDAY/bioperl-microarray-0.1.tar.gz
Distribution    B/BI/BIRNEY/bioperl-0.05.1.tar.gz
Distribution    B/BI/BIRNEY/bioperl-0.6.2.tar.gz
Distribution    B/BI/BIRNEY/bioperl-0.7.0.tar.gz
Distribution    B/BI/BIRNEY/bioperl-1.0.2.tar.gz
Distribution    B/BI/BIRNEY/bioperl-1.0.tar.gz
Distribution    B/BI/BIRNEY/bioperl-1.2.1.tar.gz
Distribution    B/BI/BIRNEY/bioperl-1.2.2.tar.gz
Distribution    B/BI/BIRNEY/bioperl-1.2.3.tar.gz
Distribution    B/BI/BIRNEY/bioperl-1.2.tar.gz
Distribution    B/BI/BIRNEY/bioperl-1.4.tar.gz
Distribution    B/BI/BIRNEY/bioperl-db-0.1.tar.gz
Distribution    B/BI/BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    B/BI/BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    B/BI/BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    B/BI/BIRNEY/bioperl-run-1.4.tar.gz
Distribution    B/BO/BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar.gz
18 items found

cpan>install B/BI/BIRNEY/bioperl-1.4.tar.gz
...
...
> </seqDiff>
t/Variation_IO...............FAILED tests 15, 20, 25
        Failed 3/25 tests, 88.00% okay
t/WABA.......................ok
t/XEMBL_DB...................ok
Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/BioFetch_DB.t                  27    2   7.41%  20-21
t/DB.t                           78    2   2.56%  30-31
t/EMBL_DB.t                      15    2  13.33%  13-14
t/Ontology.t        255 65280    50  100 200.00%  1-50
t/TreeIO.t                       41    1   2.44%  42
t/Variation_IO.t                 25    3  12.00%  15 20 25
t/simpleGOparser.t  255 65280    98  196 200.00%  1-98
121 subtests skipped.
Failed 7/179 test scripts, 96.09% okay. 156/8268 subtests failed, 98.11%
okay.
make: *** [test_dynamic] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force

cpan>
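
The figures in the harness summary above can be checked by hand. A minimal
sketch in Python, with the counts copied from the output; the helper name
percent_ok is made up for illustration:

```python
def percent_ok(failed, total):
    """Share of items that passed, as a percentage rounded to two decimals."""
    return round(100.0 * (total - failed) / total, 2)

scripts_ok = percent_ok(7, 179)        # test scripts
subtests_ok = percent_ok(156, 8268)    # individual subtests

# Wstat is the raw wait status; the exit code sits in the high byte.
exit_code = 65280 >> 8

# For t/Ontology.t the harness counts 100 failures against 50 planned tests,
# which is where the odd-looking 200.00% comes from.
ontology_failed_pct = 100.0 * 100 / 50

print(scripts_ok, subtests_ok, exit_code, ontology_failed_pct)
```

A Stat of 255 for every subtest of a script often means the script aborted
before running anything, rather than that each assertion failed on its own.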
From skirov at utk.edu  Fri Jul  1 14:46:54 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Jul  1 14:38:03 2005
Subject: [Bioperl-l] Getting nucleotide seq from protein accession
In-Reply-To: <6A814EB4-D5A4-4510-A8B7-4D67197A29DA@duke.edu>
References: <42BAF1B7.10109@york.ac.uk>
	<6A814EB4-D5A4-4510-A8B7-4D67197A29DA@duke.edu>
Message-ID: <42C58F9E.9060204@utk.edu>

Yup,
It's always useful to read the manual first. But it is not as much fun :-) .
Stefan

Jason Stajich wrote:

> Did you try the FAQ?
>
> http://www.bioperl.org/Core/Latest/faq.html#Q5.4
>
>
> On Jun 23, 2005, at 1:30 PM, Kat Hull wrote:
>
>> Hi there,
>> I was wondering whether anyone has a solution to my problem. I have  
>> a list of protein accession numbers and want to retrieve the
>> corresponding nucleotide sequences automatically.  I thought it  
>> would be possible to do this by changing the NCBI url, but this  
>> doesn't seem to be the case.
>> Is there a bio-perl module that can do this?
>>
>> Kind regards,
>> Kat
>>
>
> -- 
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12/
>
>

From jason.stajich at duke.edu  Fri Jul  1 14:48:26 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jul  1 14:39:34 2005
Subject: [Bioperl-l] TreeIO::nhx doesn't write internal node labels
In-Reply-To: <174361574726.20050630205556@princeton.edu>
References: <174361574726.20050630205556@princeton.edu>
Message-ID: <B52A5F66-B13F-4CEC-BF38-283D5879A38C@duke.edu>

Can you send your code and an example file as a bug in
http://bugzilla.open-bio.org?


On Jun 30, 2005, at 12:55 PM, Georgii Bazykin wrote:

> Hi,
>
> I am new to BioPerl, and I am having trouble trying to save a tree in
> NHX format. I load a nexus tree and parse a PAUP log file ("branch
> linkages") to get internal node ids (I will then need to process
> character changes between internal nodes, this is why I need internal
> node ids). I then write PAUP ids (which are numbers) as ids of
> internal nodes of the tree, and write the tree in nhx format, hoping
> that the internal node labels will be preserved. But the resulting nhx
> file has only empty [&&NHX] labels and no internal node labels. Is
> this a feature, or am I doing something wrong?
>
> Please help!
>
> Yegor
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From cjm at fruitfly.org  Fri Jul  1 15:30:44 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Fri Jul  1 15:23:31 2005
Subject: [Bioperl-l] Re: go-perl
In-Reply-To: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>
References: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>
Message-ID: <Pine.OSX.4.58.0507011224440.21307@adsl-68-126-147-89.dsl.pltn13.pacbell.net>


Hi Daniel

In general you should be wary of forcing an install if the tests fail.

However, in this case I can tell you that none of the failed tests are of
any consequence for either go-db-perl (bioperl isn't required at all for
go-perl) or the semantic similarity tool.

Cheers
Chris

On Fri, 1 Jul 2005 dsam@ucsd.edu wrote:

> Hello,
>
> I tried to install BioPerl using CPAN, but I get quite a few failed tests.
>
> I need BioPerl in order to install the GO Perl API.
> These are needed to calculate semantic similarity based on Gene
> Ontology (http://www.cs.man.ac.uk/~phillord/semantic_sim.html).
>
> I was wondering what does each failed test mean (e.g. simpleGOparser)
> and if the failed tests can be ignored.
> Any insight would be greatly appreciated.
>
> Below are the failed tests:
>
> Thanks,
> Daniel
>
> /**************
> CPAN
> **************/
>
> cpan> d /bioperl/
> Distribution    A/AL/ALLENDAY/bioperl-microarray-0.1.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-0.05.1.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-0.6.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-0.7.0.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.0.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.0.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.4.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    B/BO/BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar.gz
> 18 items found
>
> cpan>install B/BI/BIRNEY/bioperl-1.4.tar.gz
> ...
> ...
> > </seqDiff>
> t/Variation_IO...............FAILED tests 15, 20, 25
>         Failed 3/25 tests, 88.00% okay
> t/WABA.......................ok
> t/XEMBL_DB...................ok
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/BioFetch_DB.t                  27    2   7.41%  20-21
> t/DB.t                           78    2   2.56%  30-31
> t/EMBL_DB.t                      15    2  13.33%  13-14
> t/Ontology.t        255 65280    50  100 200.00%  1-50
> t/TreeIO.t                       41    1   2.44%  42
> t/Variation_IO.t                 25    3  12.00%  15 20 25
> t/simpleGOparser.t  255 65280    98  196 200.00%  1-98
> 121 subtests skipped.
> Failed 7/179 test scripts, 96.09% okay. 156/8268 subtests failed, 98.11%
> okay.
> make: *** [test_dynamic] Error 2
>   /usr/bin/make test -- NOT OK
> Running make install
>   make test had returned bad status, won't install without force
>
> cpan>


From hlapp at gnf.org  Fri Jul  1 16:04:15 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jul  1 15:53:04 2005
Subject: [Bioperl-l] Re: go-perl
In-Reply-To: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>
References: <43846.66.91.255.236.1120215110.squirrel@acs-webmail.ucsd.edu>
Message-ID: <9a98b045f729e7056728526cc2ada0fe@gnf.org>

You should probably upgrade to a snapshot from the CVS main trunk (in  
essence equivalent to a 1.5.x version) if you want to use Bioperl.

As ChrisM said, for go-perl bioperl is not required. In fact, the next  
version of Bioperl will optionally depend on go-perl if you want .obo  
formats supported.

	-hilmar

On Jul 1, 2005, at 3:51 AM, dsam@ucsd.edu wrote:

> Hello,
>
> I tried to install BioPerl using CPAN, but I get quite a few failed  
> tests.
>
> I need BioPerl in order to install the GO Perl API.
> These are needed to calculate semantic similarity based on Gene
> Ontology (http://www.cs.man.ac.uk/~phillord/semantic_sim.html).
>
> I was wondering what does each failed test mean (e.g. simpleGOparser)
> and if the failed tests can be ignored.
> Any insight would be greatly appreciated.
>
> Below are the failed tests:
>
> Thanks,
> Daniel
>
> /**************
> CPAN
> **************/
>
> cpan> d /bioperl/
> Distribution    A/AL/ALLENDAY/bioperl-microarray-0.1.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-0.05.1.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-0.6.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-0.7.0.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.0.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.0.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-1.4.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    B/BI/BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    B/BO/BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    C/CR/CRAFFI/Bundle-BioPerl-2.1.5.tar.gz
> 18 items found
>
> cpan>install B/BI/BIRNEY/bioperl-1.4.tar.gz
> ...
> ...
>> </seqDiff>
> t/Variation_IO...............FAILED tests 15, 20, 25
>         Failed 3/25 tests, 88.00% okay
> t/WABA.......................ok
> t/XEMBL_DB...................ok
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/BioFetch_DB.t                  27    2   7.41%  20-21
> t/DB.t                           78    2   2.56%  30-31
> t/EMBL_DB.t                      15    2  13.33%  13-14
> t/Ontology.t        255 65280    50  100 200.00%  1-50
> t/TreeIO.t                       41    1   2.44%  42
> t/Variation_IO.t                 25    3  12.00%  15 20 25
> t/simpleGOparser.t  255 65280    98  196 200.00%  1-98
> 121 subtests skipped.
> Failed 7/179 test scripts, 96.09% okay. 156/8268 subtests failed, 98.11%
> okay.
> make: *** [test_dynamic] Error 2
>   /usr/bin/make test -- NOT OK
> Running make install
>   make test had returned bad status, won't install without force
>
> cpan>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From sgoegel at gmail.com  Fri Jul  1 18:27:44 2005
From: sgoegel at gmail.com (SG)
Date: Fri Jul  1 18:19:18 2005
Subject: [Bioperl-l] Download sequence annotations without sequence ??
In-Reply-To: <cc7ec2a05052315263c964c2e@mail.gmail.com>
References: <cc7ec2a05052315263c964c2e@mail.gmail.com>
Message-ID: <200507011727.45112.sgoegel@gmail.com>

I have scripts and modules set up to, for a given blast report, go through and 
download sequences (when not available locally) for certain subjects (hits) 
and extract information such as db_xref fields, geneontology annotations, 
taxon ID, and features.
The one thing I am not using is the actual DNA or amino acid sequence itself.
For large sequences such as genomic DNA, which can be several megabases in 
size or more, it is impractical to download the entire sequence, which I do 
not need.

My question is, does Bioperl currently have a way to download only the 
annotations/features associated with a sequence (in GenBank format, for 
example), but not the sequence itself? If NCBI does not currently offer a way 
to do that, all that would be necessary to do would be to terminate the 
connection with the server when the ORIGIN line is reached.
Of course, that would limit to only one sequence per query, which is perfectly 
fine under the circumstances.
For pipelined downloads (the default), the $/ input separator would have to be 
modified accordingly. I have done this but I want to make sure it's not 
already a standard function of any part of Bioperl. Also, if Bioperl does not 
currently do this, is there interest in a patch to add this functionality 
(assuming I get around to making one)?
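
The "stop at ORIGIN" idea can be sketched independently of Bioperl. This is
only an illustration in Python: the function name and the sample record are
made up, and whether the server tolerates a connection being dropped early is
not addressed:

```python
import io

def read_annotation_only(lines):
    """Collect lines of one GenBank flat-file record up to the ORIGIN line.

    `lines` is any iterable of text lines (an open file, a wrapped network
    stream, ...). Reading stops as soon as ORIGIN is seen, so the sequence
    data that follows is never consumed by this function.
    """
    kept = []
    for line in lines:
        if line.startswith("ORIGIN"):
            break
        kept.append(line)
    return "".join(kept)

# Made-up record, just enough to exercise the function.
record = io.StringIO(
    "LOCUS       TEST0001     12 bp    DNA\n"
    "FEATURES             Location/Qualifiers\n"
    "     source          1..12\n"
    '                     /db_xref="taxon:9606"\n'
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)
header = read_annotation_only(record)
```

Because the loop breaks on the first ORIGIN line, this handles one record per
stream, which matches the one-sequence-per-query limitation mentioned above.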

SG
From sac at portal.open-bio.org  Fri Jul  1 16:56:07 2005
From: sac at portal.open-bio.org (Steve Chervitz)
Date: Sat Jul  2 12:15:34 2005
Subject: [Bioperl-l] Re: A question about Bioperl module
In-Reply-To: <1120147420.42c417dc5b242@webmail.pobox.upenn.edu>
Message-ID: <BEEAFC6F.1016F%sac@bioperl.org>

Hi Gao Zhang,

No, SeqPattern cannot generate random motifs, and I'm not aware of any
modules in Bioperl that can do so (anyone else know?). The String::Random
module might be sufficient for your needs:

http://search.cpan.org/~steve/String-Random-0.20/Random.pm
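
For what it is worth, the kind of motif described in the question is simple to
build by hand. A rough sketch in Python (not a Bioperl or String::Random
feature); splitting the remaining probability evenly over the other three
letters is an assumption, and all names are made up for the example:

```python
import random

ALPHABET = "ACGT"

def random_motif(width, dominant_prob, rng):
    """Build a per-position probability table for a random motif.

    Each position picks one dominant letter at random, gives it
    `dominant_prob`, and splits the remainder evenly over the other three.
    """
    rest = (1.0 - dominant_prob) / (len(ALPHABET) - 1)
    motif = []
    for _ in range(width):
        dominant = rng.choice(ALPHABET)
        motif.append({b: (dominant_prob if b == dominant else rest)
                      for b in ALPHABET})
    return motif

def sample_site(motif, rng):
    """Draw one sequence instance from the motif, position by position."""
    return "".join(
        rng.choices(ALPHABET, weights=[col[b] for b in ALPHABET])[0]
        for col in motif
    )

rng = random.Random(42)            # fixed seed so runs are repeatable
motif = random_motif(8, 0.91, rng)
consensus = "".join(max(col, key=col.get) for col in motif)
sites = [sample_site(motif, rng) for _ in range(5)]
```

Here w = 8 and x = 0.91 are the values from the question; sampled sites will
mostly match the consensus but differ at the occasional position.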

Steve


> From: <gaozhang@mail.med.upenn.edu>
> Date: Thu, 30 Jun 2005 12:03:40 -0400
> To: <sac@bioperl.org>
> Subject: A question about Bioperl module
>
>
>
>
> Dear Steve Chervitz,
>
> Hi! This is Gao Zhang, a Ph.D student in Graduate Group
> in Genomics and Computational Biology at University of
> Pennsylvania. I am working on discovery of motifs using
> DNA sequence and find Bio::Tools::SeqPattern module might be
> helpful for me.
>
> My question is whether it has any module which is
> able to generate a random motif of width w like 8. In this motif,
> each position will have a dominant letter with probability around
> x like 0.91.
>
> Thank you very much and look forward to your reply!
>
> Best Regards,
>    Gao Zhang
>
>


From senger at ebi.ac.uk  Sun Jul  3 08:50:17 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Sun Jul  3 08:42:22 2005
Subject: [Bioperl-l] pubmed article download and storing in object
Message-ID: <Pine.OSF.4.21.0507031341320.662601-100000@ice.ebi.ac.uk>

Hi,
   My few cents regarding the Bio::Biblio module:

   1) You are right that the documentation of the created Perl objects is
poor. I will try to improve it.

   2) The modules in Bio::Biblio fall into two categories: the first ones get
you XML from MEDLINE/Pubmed (by default they get it from the EBI using
SOAP). The second ones convert it: either to nothing, so you still
have XML, or to Perl biblio objects (that are poorly documented, as
mentioned above; blame me), or to a simple hash (with key names similar
to those used in the Perl objects). I agree that it would be nice to have more
outputs (like printed versions at various levels of detail).

   3) The best way to see how the Bio::Biblio modules work is to check the
script bioperl-live/scripts/biblio/biblio.PLS (try with -h first). It uses
all the methods, so you can use it directly or copy and paste the code
into your own programs.

   With regards,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From ylin9 at gel.ym.edu.tw  Sun Jul  3 09:00:31 2005
From: ylin9 at gel.ym.edu.tw (Yu-Hsuan Lin(???))
Date: Sun Jul  3 11:32:06 2005
Subject: [Bioperl-l] Problem of installing bioperl-run-1.4 
References: <002501c57d7e$68c66170$7a4e818c@sandy>
	<1120146046.42c4127e11197@webmail.duke.edu>
Message-ID: <000501c57fcf$32025a60$7a4e818c@sandy>

Thank you for your reply.
I can run EMBOSS programs directly by typing the program name in my home
directory.
And this is what I got by typing echo $PATH

/usr/lib/j2re1.5-sun/bin:/usr/lib/j2re1.5-sun:/var/local/sbin:/var/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/home/tools/EMBOSS-2.10.0:/home/tools/EMBOSS-2.10.0/emboss

But I still get the same error message when I type "make test" in command 
line.
Please help me with this problem.
Thank you very much.
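
One way to double-check which directory actually provides a given EMBOSS
program is to search the PATH entries explicitly. A small Python sketch; the
program name "wossname" is only an example and is an assumption about what
the test looks for:

```python
import shutil

def find_on_path(program, path_string):
    """Return the first match for `program` in the given PATH string, or None."""
    return shutil.which(program, path=path_string)

# Only the two EMBOSS entries from the PATH shown above; "wossname" is an
# example program name, not something confirmed by the test script.
emboss_dirs = "/home/tools/EMBOSS-2.10.0:/home/tools/EMBOSS-2.10.0/emboss"
location = find_on_path("wossname", emboss_dirs)
print(location or "not found in the EMBOSS directories listed in PATH")
```

If nothing is found there, the executables may live in another directory than
the two listed, which would be worth checking before anything else.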

Vincent.

----- Original Message ----- 
From: "Jason Stajich" <jason.stajich@duke.edu>
To: "Yu-Hsuan Lin(?L?t?a)" <ylin9@gel.ym.edu.tw>
Cc: <bioperl-l@portal.open-bio.org>
Sent: Thursday, June 30, 2005 11:40 PM
Subject: Re: [Bioperl-l] Problem of installing bioperl-run-1.4


> did you make sure the EMBOSS bin directory is in your PATH?
>
> -jason
> -- 
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
> Quoting "Yu-Hsuan Lin(?L?t?a)" <ylin9@gel.ym.edu.tw>:
>
>> Hi, all,
>>
>> I have a problem to install bioperl-run-1.4. Because I want to use EMBOSS
>> program within
>>
>> my bioperl script, I installed bioperl 1.4 and EMBOSS (
>> /home/tools/EMBOSS-2.10.0 ) in
>>
>> my debian linux system. When I tried to type in command line "make test", 
>> it
>> said
>>
>> t/EMBOSS..................ok
>>
>>         28/30 skipped: EMBOSS not installed locally or XML::Twig not
>> installed
>>
>>
>> I also installed XML::Twig and tried symbolic link from /usr/local to
>>
>> /home/tools/EMBOSS-2.10.0 but still get the same message. Can anyone 
>> kindly
>> tell me how to
>>
>> solve this problem or where to find solution ?
>>
>> Thank you very much,
>>
>> Vincent,
>>
> 


From jason.stajich at duke.edu  Sun Jul  3 11:43:04 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jul  3 11:34:12 2005
Subject: [Bioperl-l] Problem of installing bioperl-run-1.4 
In-Reply-To: <000501c57fcf$32025a60$7a4e818c@sandy>
References: <002501c57d7e$68c66170$7a4e818c@sandy>
	<1120146046.42c4127e11197@webmail.duke.edu>
	<000501c57fcf$32025a60$7a4e818c@sandy>
Message-ID: <BA1AE152-6073-4563-BCB0-03E872227D2D@duke.edu>


What do you see when you run the test individually? You will get more
detailed error messages.

$ perl -I. -w t/EMBOSS.t


On Jul 3, 2005, at 9:00 AM, Yu-Hsuan Lin((???)) wrote:

>>
>> Quoting "Yu-Hsuan Lin(?L?t?a)" <ylin9@gel.ym.edu.tw>:
>>
>>
>>> Hi, all,
>>>
>>> I have a problem to install bioperl-run-1.4. Because I want to  
>>> use EMBOSS
>>> program within
>>>
>>> my bioperl script, I installed bioperl 1.4 and EMBOSS (
>>> /home/tools/EMBOSS-2.10.0 ) in
>>>
>>> my debian linux system. When I tried to type in command line  
>>> "make test", it
>>> said
>>>
>>> t/EMBOSS..................ok
>>>
>>>         28/30 skipped: EMBOSS not installed locally or XML::Twig not
>>> installed
>>>
>>>
>>> I also installed XML::Twig and tried symbolic link from /usr/ 
>>> local to
>>>
>>> /home/tools/EMBOSS-2.10.0 but still get the same message. Can  
>>> anyone kindly
>>> tell me how to
>>>
>>> solve this problem or where to find solution ?
>>>
>>> Thank you very much,
>>>
>>> Vincent,
>>>
>>>
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From ylin9 at gel.ym.edu.tw  Mon Jul  4 01:06:41 2005
From: ylin9 at gel.ym.edu.tw (Yu-Hsuan Lin(???))
Date: Mon Jul  4 01:30:27 2005
Subject: [Bioperl-l] Problem of installing bioperl-run-1.4 
References: <002501c57d7e$68c66170$7a4e818c@sandy>
	<1120146046.42c4127e11197@webmail.duke.edu>
	<000501c57fcf$32025a60$7a4e818c@sandy>
	<BA1AE152-6073-4563-BCB0-03E872227D2D@duke.edu>
Message-ID: <002c01c58056$2a305300$7a4e818c@sandy>

$ perl -I. -w t/EMBOSS.t
1..30
# Running under perl version 5.008006 for linux
# Current time local: Mon Jul  4 13:01:26 2005
# Current time GMT:   Mon Jul  4 05:01:26 2005
# Using Test.pm version 1.25
ok 1
ok 2
ok 3 # skip EMBOSS not installed locally or XML::Twig not installed
ok 4 # skip EMBOSS not installed locally or XML::Twig not installed
ok 5 # skip EMBOSS not installed locally or XML::Twig not installed
ok 6 # skip EMBOSS not installed locally or XML::Twig not installed
ok 7 # skip EMBOSS not installed locally or XML::Twig not installed
ok 8 # skip EMBOSS not installed locally or XML::Twig not installed
ok 9 # skip EMBOSS not installed locally or XML::Twig not installed
ok 10 # skip EMBOSS not installed locally or XML::Twig not installed
ok 11 # skip EMBOSS not installed locally or XML::Twig not installed
ok 12 # skip EMBOSS not installed locally or XML::Twig not installed
ok 13 # skip EMBOSS not installed locally or XML::Twig not installed
ok 14 # skip EMBOSS not installed locally or XML::Twig not installed
ok 15 # skip EMBOSS not installed locally or XML::Twig not installed
ok 16 # skip EMBOSS not installed locally or XML::Twig not installed
ok 17 # skip EMBOSS not installed locally or XML::Twig not installed
ok 18 # skip EMBOSS not installed locally or XML::Twig not installed
ok 19 # skip EMBOSS not installed locally or XML::Twig not installed
ok 20 # skip EMBOSS not installed locally or XML::Twig not installed
ok 21 # skip EMBOSS not installed locally or XML::Twig not installed
ok 22 # skip EMBOSS not installed locally or XML::Twig not installed
ok 23 # skip EMBOSS not installed locally or XML::Twig not installed
ok 24 # skip EMBOSS not installed locally or XML::Twig not installed
ok 25 # skip EMBOSS not installed locally or XML::Twig not installed
ok 26 # skip EMBOSS not installed locally or XML::Twig not installed
ok 27 # skip EMBOSS not installed locally or XML::Twig not installed
ok 28 # skip EMBOSS not installed locally or XML::Twig not installed
ok 29 # skip EMBOSS not installed locally or XML::Twig not installed
ok 30 # skip EMBOSS not installed locally or XML::Twig not installed

I don't think I need to set $PATH for XML::Twig, do I?
I installed XML::Twig with CPAN, and it is up to date.

Vincent.


From michael.watson at bbsrc.ac.uk  Mon Jul  4 03:59:18 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon Jul  4 03:50:34 2005
Subject: [Bioperl-l] BLAST scores
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D847@iahce2knas1.iah.bbsrc.reserved>

On another note, I was parsing BLAST output using Bio::SearchIO and
found it took ages - so I switched to BPLite and my parsing took about a
tenth of the time - you may want to try it :-)

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Josh
Lauricha
Sent: 30 June 2005 22:03
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] BLAST scores


Not really a BioPerl question, but... 

I ran a bunch of blasts using the tabular output. However, I need the
score reported and it apparently doesn't do that. The reason I'm using
the tabular format is to speed parsing, since that was taking more than
half the CPU time... Anyhow, is there any way to compute the score from
the e-value and/or bit scores? Or am I stuck rerunning all those blasts?
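
For reference, the bit score is defined from the raw score S as
S' = (lambda * S - ln K) / ln 2, so the raw score can be recovered once
lambda and K are known for the matrix and gap penalties used. A minimal
sketch in Python; the function names are made up, and the lambda and K values
below are illustrative only and should be replaced with the ones from the
statistics block of a full report:

```python
import math

def raw_score_from_bits(bit_score, lam, k):
    """Invert S' = (lam * S - ln K) / ln 2 to recover the raw score S."""
    return (bit_score * math.log(2) + math.log(k)) / lam

def bits_from_raw_score(raw_score, lam, k):
    """Forward direction, used here as a round-trip check."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Illustrative parameters only: take lam and K from one of your own reports
# run with the same matrix and gap penalties.
LAMBDA, K = 0.267, 0.041

raw = raw_score_from_bits(50.1, LAMBDA, K)
```

Since the tabular bit score is rounded, the result would need rounding to the
nearest integer, and any score adjustments applied by the search program could
make the recovered value approximate.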

Thanks

-- 

------------------------------------------------------
| Josh Lauricha            | Ford, you're turning    |
| laurichj@bioinfo.ucr.edu | into a penguin. Stop    |
| Bioinformatics, UCR      | it                      |
|----------------------------------------------------|
| OpenPG:                                            |
|  4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 |
|----------------------------------------------------|

From Marc.Logghe at devgen.com  Mon Jul  4 05:43:08 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jul  4 05:34:29 2005
Subject: [Bioperl-l] SeqWithQuality and biosql
Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F53C7@ANTARESIA.be.devgen.com>

Hi all,
I am currently exploring the possibility to store a
Bio::Seq::SeqWithQuality object in biosql.
Has anyone ever tried this ?
One possibility would be to 
1) split up the Bio::Seq::SeqWithQuality object into a plain
Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
2) store them separately in biosql; different namespaces
3) link them with a relation term.
4) make a custom adaptor to fetch the persistent objects from biosql and
reconstruct the Bio::Seq::SeqWithQuality

Does that make sense ? Any other suggestions/possibilities ?
As a test I tried to load a Bio::Seq::PrimaryQual in biosql using the
load_seqdatabase.pl but it fails because Bio::Seq::PrimaryQual does not
have a namespace method.
I hope I'm wrong but I have the impression there is a long way to go ;-)

Marc


From heikki at ebi.ac.uk  Mon Jul  4 12:15:20 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Jul  4 12:06:42 2005
Subject: [Bioperl-l] SeqWithQuality and biosql
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA62F53C7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA62F53C7@ANTARESIA.be.devgen.com>
Message-ID: <200507041715.20558.heikki@ebi.ac.uk>


Marc,

I have not actually talked about this with Chad, but I've long had a plan
to refactor Bio::Seq::SeqWithQuality to inherit from Bio::Seq::MetaI. It does 
not at the moment because Chad was there first. Some time later there were 
some other needs to attach meta information to residues and to avoid having 
several implementations in bioperl I wrote Bio::Seq::MetaI and its 
implementation classes.


I do not know if there are any reasons why Bio::Seq::SeqWithQuality could not
be Bio::Seq::MetaI, but it would be a good thing to explore that, and implement
only one very generic way to store residue-based meta values in biosql.

 -Heikki


On Monday 04 July 2005 10:43, Marc Logghe wrote:
> Hi all,
> I am currently exploring the possibility to store a
> Bio::Seq::SeqWithQuality object in biosql.
> Has anyone ever tried this ?
> One possibility would be to
> 1) split up the Bio::Seq::SeqWithQuality object into a plain
> Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
> 2) store them separately in biosql; different namespaces
> 3) link them with a relation term.
> 4) make a custom adaptor to fetch the persistent objects from biosql and
> reconstruct the Bio::Seq::SeqWithQuality
>
> Does that make sense ? Any other suggestions/possibilities ?
> As a test I tried to load a Bio::Seq::PrimaryQual in biosql using the
> load_seqdatabase.pl but it fails because Bio::Seq::PrimaryQual does not
> have a namespace method.
> I hope I'm wrong but I have the impression there is a long way to go ;-)
>
> Marc
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From heikki at ebi.ac.uk  Tue Jul  5 10:32:47 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Jul  5 10:26:01 2005
Subject: [Bioperl-l] Bio::Tree::Compatible, Bio::Tree::Draw::Cladogram
Message-ID: <200507051532.48179.heikki@ebi.ac.uk>


Gabriel,

While testing bioperl module SYNOPSIS sections for runnability I found out  
that there are two modules in bioperl-live that have external dependencies 
that are not in Makefile.PL:

Bio::Tree::Compatible 
  Testing compatibility of phylogenetic trees with nested taxa.
  depends on Set::Scalar

Bio::Tree::Draw::Cladogram
  Drawing phylogenetic trees in Encapsulated PostScript (EPS) format.
  depends on PostScript::TextBlock.pm

They both are yours. 

I have not been that active on the mailing list lately, so I searched the list 
for a discussion on these new modules. I started getting a bit alarmed that 
there were none, no emails ever to the bioperl mailing list from you. 
Finally, I checked the t (test) directory and there were no tests for these 
modules.

Could we have that discussion now and, hopefully, at the end of it 
add the dependencies to the Makefile.PL? In a project this big, we have to 
keep each other informed so that we can keep all parts of bioperl functional 
and avoid confusing and alienating users.


Where do these modules come from?
What functionality do they add?
Is the name space used correct?
Could we see test code that demonstrates the functionality?
Is there something else that you are planning to do?


Yours,

 -Heikki, who feels that he is probably overreacting ;-)
                 ... so do not take it personally


-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From jinsun at indiana.edu  Tue Jul  5 10:05:15 2005
From: jinsun at indiana.edu (jinsun@indiana.edu)
Date: Tue Jul  5 10:47:48 2005
Subject: [Bioperl-l] [Bioperl-guts-l] a question of retrieval information
Message-ID: <89DD54D7-C124-4BB2-8476-472AF90EC92F@indiana.edu>


To whom it may concern:

I am trying to write a perl program using bioperl to retrieve information
from the NCBI website. The purpose of this program is to get a protein's
annotation from a gi number. For example, given the gi number 16128448, I
would get: acrAB operon repressor [Escherichia coli K12]
gi|16128448|ref|NP_414997.1|[16128448].

I wrote bioperl like this:

use Bio::DB::GenBank;

$gb = new Bio::DB::GenBank;
$seqobj = $gb->get_Seq_by_gi('16128448');
$ann_coll = $seqobj->annotation;

for $ann ($ann_coll->get_Annotations) {
     print "Features: ",$ann->as_text if ($ann->tagname eq "features");
     print "Comment: ",$ann->as_text if ($ann->tagname eq "comment");
     print "Title: ",$ann->as_text if ($ann->tagname eq "title");
     print "Organism: ",$ann->as_text if ($ann->tagname eq "organism");
     print "Definition: ",$ann->as_text if ($ann->tagname eq "definition");
}

It does not work. For some gi numbers I cannot get $seqobj; for other gi
numbers I can get $seqobj but not any annotation.

Could you please help me get information from NCBI with a program?
Thank you.

Jingjun Sun


----- End forwarded message -----

_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l

From johan.viklund at gmail.com  Tue Jul  5 11:01:40 2005
From: johan.viklund at gmail.com (Johan Viklund)
Date: Tue Jul  5 10:52:43 2005
Subject: [Bioperl-l] bioperl-db: exporting data
Message-ID: <5e924f0a05070508012bbb63d3@mail.gmail.com>

Hi

I'm trying to add COG annotations from Entrez Gene to sequences (from
RefSeq, in GenBank format) that I have in a biosql database (on MySQL). The
problem is that I can't get them out again with the bioentry2flat.pl script
(the bioentries appear without what I've added).

I don't use bioperl for this (I've got ~40000 COG annotations, linked
to GeneIDs). Instead I add them to the seqfeature_qualifier_value table,
similar to the way GeneIDs are represented (as far as I've figured out):
term_id corresponds to db_xref and seqfeature_id is the same one the
GeneID had. For rank I've tried a few different variations (the first
free one that's larger than GeneID and 1), but none seem to work.

How should I add this annotation to the database so it gets exported
when I use bioperl?

I've also got another question: What is rank for?

-- 
Johan Viklund
E-mail: <johan.viklund.0705@student.uu.se>
        <johan.viklund@gmail.com>

From heikki at ebi.ac.uk  Tue Jul  5 11:06:10 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Jul  5 10:56:53 2005
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] a question of retrieval
	information
Message-ID: <200507051606.10387.heikki@ebi.ac.uk>


----------  Forwarded Message  ----------

Subject: [Bioperl-guts-l] a question of retrieval information
Date: Tuesday 05 July 2005 15:05
From: jinsun@indiana.edu
To: bioperl-guts-l@bioperl.org

To whom it is concerned:

I try to write a perl program using bioperl and want to retrieve information
from ncbi website. The purpose of this program is to get protein's annotation
with a gi number. For example if given a gi number 16128448, I would get
 acrAB operon repressor [Escherichia coli K12]
 gi|16128448|ref|NP_414997.1|[16128448].

I wrote bioperl like this:

use Bio::DB::GenBank;

$gb = new Bio::DB::GenBank;
$seqobj = $gb->get_Seq_by_gi('16128448');
$ann_coll = $seqobj->annotation;

for $ann ($ann_coll->get_Annotations) {
    print "Features: ",$ann->as_text if ($ann->tagname eq "features");
    print "Comment: ",$ann->as_text if ($ann->tagname eq "comment");
    print "Title: ",$ann->as_text if ($ann->tagname eq "title");
    print "Organism: ",$ann->as_text if ($ann->tagname eq "organism");
    print "Definition: ",$ann->as_text if ($ann->tagname eq "definition");
}

It does not work. For some gi numbers I can not get $seqobj, for other gi
 numbers I can get $seqobj but not any annotation.

Could you please help me how to get information from ncbi with a program?
Thank you.

Jingjun Sun


----- End forwarded message -----

_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l

-------------------------------------------------------

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From wackattack at gmail.com  Tue Jul  5 03:23:04 2005
From: wackattack at gmail.com (Wacki)
Date: Tue Jul  5 10:59:21 2005
Subject: [Bioperl-l] Problems with Bioperl graphics
Message-ID: <2b8a4eeb05070500235eb437f8@mail.gmail.com>

I followed the tutorial here:

http://bioperl.org/HOWTOs/Graphics-HOWTO/gettingStarted.html

And ran this exact code:

http://biokdd.informatics.indiana.edu/jnowacki/render_blast1.txt

The image produced is shown here:

http://biokdd.informatics.indiana.edu/jnowacki/test.png

It doesn't have the name of the hits.  What is wrong?  The code is
exactly the same as the tutorial is it not?


code:

#!/usr/bin/perl

# This is code example 2 in the Graphics-HOWTO
use strict;
use lib '/home/lstein/projects/bioperl-live';
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $panel = Bio::Graphics::Panel->new(-length => 1000,
                                      -width  => 800,
                                      -pad_left => 10,
                                      -pad_right => 10,
                                      );
my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
$panel->add_track($full_length,
                  -glyph   => 'arrow',
                  -tick    => 2,
                  -fgcolor => 'black',
                  -double  => 1,
                  );

my $track = $panel->add_track(-glyph => 'graded_segments',
                              -label  => 4,
                              -bgcolor => 'blue',
                              -min_score => 0,
                              -max_score => 1000);

while (<>) { # read blast file
    chomp;
    next if /^\#/;  # ignore comments
    my($name,$score,$start,$end) = split /\t+/;
    my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,
                                                -start=>$start,-end=>$end);
    $track->add_feature($feature);
}

binmode(STDOUT);
print $panel->png;

From crabtree at tigr.ORG  Tue Jul  5 11:23:22 2005
From: crabtree at tigr.ORG (Crabtree, Jonathan)
Date: Tue Jul  5 11:16:26 2005
Subject: [Bioperl-l] Problems with Bioperl graphics
Message-ID: <CAAF27359A31D44FA9A90AF7E299C36F8A057D@EXCHANGE.TIGR.ORG>


One difference between your code and the tutorial is that you've set
-label to 4 in your call to add_track(); in the tutorial this parameter
is set to 1.  Try changing the 4 to 1 and see what happens.  Another
difference is that you're using the 'graded_segments' glyph instead of
the 'generic' glyph (I don't think this should matter, but you were
asking whether your code differs from that in the tutorial ;)

Jonathan


> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org 
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Wacki
> Sent: Tuesday, July 05, 2005 3:23 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] Problems with Bioperl graphics
> 
> 
> I followed the tutorial here:
> 
> http://bioperl.org/HOWTOs/Graphics-HOWTO/gettingStarted.html
> 
> And ran this exact code:
> 
> http://biokdd.informatics.indiana.edu/jnowacki/render_blast1.txt
> 
> The image produced is shown here:
> 
> http://biokdd.informatics.indiana.edu/jnowacki/test.png
> 
> It doesn't have the name of the hits.  What is wrong?  The 
> code is exactly the same as the tutorial is it not?
> 
> 
> 
> 
> 
> 
> code:
> 
> #!/usr/bin/perl
> 
> # This is code example 2 in the Graphics-HOWTO
> use strict;
> use lib '/home/lstein/projects/bioperl-live';
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
> 
> my $panel = Bio::Graphics::Panel->new(-length => 1000,
>                                       -width  => 800,
>                                       -pad_left => 10,
>                                       -pad_right => 10,
>                                       );
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000);
> $panel->add_track($full_length,
>                   -glyph   => 'arrow',
>                   -tick    => 2,
>                   -fgcolor => 'black',
>                   -double  => 1,
>                   );
> 
> my $track = $panel->add_track(-glyph => 'graded_segments',
>                               -label  => 4,
>                               -bgcolor => 'blue',
>                               -min_score => 0,
>                               -max_score => 1000);
> 
> while (<>) { # read blast file
> chomp;
> next if /^\#/;  # ignore comments
> my($name,$score,$start,$end) = split /\t+/;
> my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,
>                                             -start=>$start,-end=>$end);
> $track->add_feature($feature);
> }
> 
> binmode(STDOUT);
> print $panel->png;
> 

From heikki at ebi.ac.uk  Tue Jul  5 11:32:33 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Jul  5 11:24:55 2005
Subject: [Bioperl-l] [Bioperl-guts-l] a question of retrieval information
In-Reply-To: <89DD54D7-C124-4BB2-8476-472AF90EC92F@indiana.edu>
References: <89DD54D7-C124-4BB2-8476-472AF90EC92F@indiana.edu>
Message-ID: <200507051632.33248.heikki@ebi.ac.uk>

Jinsun,

Your code works for me. It retrieves the sequence text file, creates the 
objects and prints out the comment. Are you sure the other gi numbers are 
valid? Try them on Entrez first to check that each entry exists.

There are no annotations with names like 'features', 'title' or 'definition'. 
These are attributes of the sequence object itself.

Try
 $seqobj->id
 $seqobj->desc
 $seqobj->species
 $seqobj->all_SeqFeatures


Some of these return strings; others return objects or arrays of objects.

These are good places to start learning how bioperl works:
 http://bio.perl.org/HOWTOs/
 http://bio.perl.org/Core/Latest/faq.html

Yours,

 -Heikki


P.S. Do not post to the guts mailing list. It is only for automatically 
generated reports.

 
On Tuesday 05 July 2005 15:05, jinsun@indiana.edu wrote:
> To whom it is concerned:
>
> I try to write a perl program using bioperl and want to retrieve information
> from ncbi website. The purpose of this program is to get protein's annotation
> with a gi number. For example if given a gi number 16128448, I would get
> acrAB operon repressor [Escherichia coli K12]
> gi|16128448|ref|NP_414997.1|[16128448].
>
> I wrote bioperl like this:
>
> use Bio::DB::GenBank;
>
> $gb = new Bio::DB::GenBank;
> $seqobj = $gb->get_Seq_by_gi('16128448');
> $ann_coll = $seqobj->annotation;
>
> for $ann ($ann_coll->get_Annotations) {
>      print "Features: ",$ann->as_text if ($ann->tagname eq "features");
>      print "Comment: ",$ann->as_text if ($ann->tagname eq "comment");
>      print "Title: ",$ann->as_text if ($ann->tagname eq "title");
>      print "Organism: ",$ann->as_text if ($ann->tagname eq "organism");
>      print "Definition: ",$ann->as_text if ($ann->tagname eq "definition");
> }
>
> It does not work. For some gi numbers I can not get $seqobj, for other gi
> numbers I can get $seqobj but not any annotation.
>
> Could you please help me how to get information from ncbi with a program?
> Thank you.
>
> Jingjun Sun
>
>
>
> ----- End forwarded message -----
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From chad at dieselwurks.com  Tue Jul  5 13:41:27 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Tue Jul  5 13:32:35 2005
Subject: [Bioperl-l] SeqWithQuality and biosql
In-Reply-To: <200507041715.20558.heikki@ebi.ac.uk>
References: <0C528E3670D8CE4B8E013F6749231AA62F53C7@ANTARESIA.be.devgen.com>
	<200507041715.20558.heikki@ebi.ac.uk>
Message-ID: <Pine.LNX.4.62.0507051139180.3487@sausage.usask.ca>


On Mon, 4 Jul 2005, Heikki Lehvaslaiho wrote:
> I have not actually talked about this with Chad, but I've had a long time plan
> to refactor Bio::Seq::SeqWithQuality to inherit from Bio::Seq::MetaI. It does
> not at the moment because Chad was there first.

Ha! I won! I remember doing a victory dance.

> Some time later there were some other needs to attach meta information
> to residues and to avoid having several implementations in bioperl I
> wrote Bio::Seq::MetaI and its implementation classes.

So how can I help the retrofit? Is my help necessary?

> I do not know if there are any issues why Bio::Seq::SeqWithQuality could not
> be Bio::Seq::MetaI, but it would be good thing to explore that, and implement
> only one very generic way to store residue-based meta values in biosql.

This sounds good. I'm willing to help as much as I can.

Chad Matsalla


From hlapp at gnf.org  Tue Jul  5 14:55:10 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jul  5 14:43:40 2005
Subject: [Bioperl-l] Re: SeqWithQuality and biosql
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA62F53D1@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA62F53D1@ANTARESIA.be.devgen.com>
Message-ID: <4672e7ad470df9973b998dd1188db923@gnf.org>

(I don't think posting to bioperl was a mistake, so I'm including it 
here again)

I think I like Mark's proposal best, i.e., the fundamental model of at 
most one sequence for each bioentry (e.g., Bio::SeqI object) is left 
intact, and the problem is reformulated as how to encode/decode 
sequences from alphabet cross-products as strings.

Encoding/decoding wouldn't be difficult to implement, even such that 
the encoded string is still human-readable. Biojava has a natural 
provision for doing this (SymbolTokenizer?), but Bioperl does not: 
in Bioperl the object model assumes that the sequence is a flat 
string, and the alphabet is also a flat string; there is no object you 
could ask to provide you with an encoder/decoder appropriate for either 
the alphabet or the type of sequence object.

I'd like to hear some feedback from the Bioperl folks as to whether 
you'd consider this capability a generally useful addition to Bioperl. 
(It could be designed in a number of ways ranging from more intrusive 
to completely neutral - e.g., adding this as a method to SeqI [like 
$seq->seq_encoder()], or making $seq->alphabet() return an object with 
this and other capabilities, or creating a separate factory class that 
would return the appropriate encoder known to [or registered with] it 
based on a given alphabet and type of sequence object.)

As for Bio::Seq::MetaI, this could certainly be the interface for 
SeqWithQuality, but wouldn't solve the de/serialization problem. Also, 
at least conceptually MetaI-derived classes could represent 
multi-dimensional meta-information, right? That is, the problem of how 
to encode/decode the meta-information isn't trivial or restricted to 
two dimensions here either.

As for creating a specialized adaptor in Bioperl-db, that would 
certainly work too and would most likely be the fastest way to get 
something that works. However, long-term it would solve the problem 
only for SeqWithQuality and not the more general problem of how to 
store sequences that are based on cross-product alphabets. BTW if you 
do implement a specialized adaptor, then instead of storing two 
bioentries and connecting them you might as well implement the sequence 
encoding/decoding for this particular object in the adaptor. You'd 
gain speed because, instead of increasing the number of database 
operations, you'd spend a couple more CPU cycles in Perl code, and you 
wouldn't be burdened with two bioentries that aren't coupled by a foreign 
key constraint.

As for consensus for how to encode sequence with quality values, I'd 
include a delimiter between the alphabet operands in the cross-product. 
I.e., using e.g. slash as the delimiter: 'A/22 T/30 A/32 G/35 C/35'. 
This can be easily extended to multi-dimensional cross-products so long 
as the delimiter between them isn't a symbol in any of the alphabets.
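
To make the delimiter idea concrete, here is a minimal sketch in plain Perl
(no Bioperl classes involved). The helper names encode_seq_qual and
decode_seq_qual are made up for this illustration and are not part of any
existing module. The sketch assumes single-character residues and a
delimiter that does not occur in either alphabet.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl API): pair each residue with its
# quality value using a delimiter, and separate positions with spaces.
sub encode_seq_qual {
    my ($seq, $quals, $delim) = @_;
    $delim = '/' unless defined $delim;
    my @residues = split //, $seq;
    die "sequence and quality lengths differ\n" unless @residues == @$quals;
    return join ' ', map { $residues[$_] . $delim . $quals->[$_] } 0 .. $#residues;
}

# Hypothetical helper: reverse of encode_seq_qual. Returns the residue
# string and a reference to the list of quality values.
sub decode_seq_qual {
    my ($encoded, $delim) = @_;
    $delim = '/' unless defined $delim;
    my $seq = '';
    my @quals;
    for my $token (split ' ', $encoded) {
        my ($residue, $qual) = split /\Q$delim\E/, $token, 2;
        $seq .= $residue;
        push @quals, $qual;
    }
    return ($seq, \@quals);
}

my $encoded = encode_seq_qual('ATAGC', [22, 30, 32, 35, 35]);
print "$encoded\n";                 # A/22 T/30 A/32 G/35 C/35

my ($seq, $quals) = decode_seq_qual($encoded);
print "$seq @$quals\n";             # ATAGC 22 30 32 35 35
```

Extending this to more dimensions would only mean joining more than two
fields per position, provided the delimiter stays outside every alphabet.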

	-hilmar


On Jul 5, 2005, at 12:39 AM, Marc Logghe wrote:

> Thanks for the feedback.
> Good to know I am not alone in this ;-)
> I totally agree with Mark that there should be a kind of consensus on
> how to store this in Bio*.
> Yesterday I mistakenly posted my original mail to the bioperl list.
> Heikki responded to that; it might be a good starting point but I am 
> not
> familiar with it:
> http://portal.open-bio.org/pipermail/bioperl-l/2005-July/019271.html
> So much for the long-term solution.
> In the short term, to have at least something that works, I'll experiment a
> little with storing separate objects. I remember one of the
> presentations of Hilmar, where he gave the example of making an adaptor
> and storing 2 sequence objects that interacted with each other as a
> result of a Two Hybrid experiment in yeast.
> Cheers,
> Marc
>
>
>>
>> I'd think storing it in BioSQL as 2-byte pairs would be good.
>> First byte is the base (an ASCII character), second byte is
>> the quality (an 8-bit integer). Sure it wastes a few bits but
>> so does normal DNA...
>>
>>
>> Richard Holland
>> Bioinformatics Specialist
>> GIS extension 8199
>> ---------------------------------------------
>> This email is confidential and may be privileged. If you are
>> not the intended recipient, please delete it and notify us
>> immediately. Please do not copy or use it for any purpose, or
>> disclose its content to any other person. Thank you.
>> ---------------------------------------------
>>
>>
>>> -----Original Message-----
>>> From: biosql-l-bounces@portal.open-bio.org
>>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of
>>> mark.schreiber@novartis.com
>>> Sent: Tuesday, July 05, 2005 1:44 PM
>>> To: Marc Logghe
>>> Cc: biosql-l-bounces@portal.open-bio.org; biosql-l@open-bio.org
>>> Subject: Re: [BioSQL-l] FW: SeqWithQuality and biosql
>>>
>>>
>>> Hello -
>>>
>>> I was wondering about similar issues with biojava. As you may (or may
>>> not) know, biojava can make sequences from symbols in any alphabet; two
>>> examples are DNA and the integer alphabet (a collection of Symbols
>>> that are integers). Biojava can also make compound alphabets; one such
>>> example is the Phred alphabet, which is the multiplication of DNA x
>>> Integer (technically a subset of Integer from 0 to 99).
>>>
>>> Because sequence in BioSQL is stored in a CLOB, if you can encode your
>>> SeqWithQuality as a String of characters you can store it. With the
>>> case above (which is probably similar to yours) you would need 400
>>> distinct characters to encode it, which is too many for ASCII but could
>>> be done in Unicode. The downside is your persistence layer needs to
>>> know how to encode and decode your SeqWithQuality. I'm not familiar
>>> with how BioPerl would do this. BioJava would need to implement a
>>> SymbolTokenizer for the alphabet and then persistence would happen
>>> automatically (assuming your DB is OK with Unicode). An alternative
>>> would be to make a tokenizer that uses more than single character
>>> tokens for encoding (eg A23 G40 T34 C22 etc).
>>>
>>> The alternative you suggest of storing two sequences with a
>>> relationship is also nice (because you can retrieve each part
>>> separately) but also requires your persistence layer to know about it.
>>> However, it has big disadvantages because they are not strongly tied
>>> to each other. If you manipulate one you might invalidate the other.
>>> Also if you delete one the other will probably not be deleted in a
>>> cascade.
>>>
>>> Not sure if any of this helps but a consensus on how to store this
>>> kind of information would be good so the bio* projects do it the same
>>> way.
>>> Consensus in this case will probably mean whatever the first
>>> implementation is.
>>>
>>> - Mark
>>>
>>>
>>>
>>>
>>>
>>> "Marc Logghe" <Marc.Logghe@devgen.com> Sent by:
>>> biosql-l-bounces@portal.open-bio.org
>>> 07/04/2005 05:56 PM
>>>
>>>
>>>         To:     <biosql-l@open-bio.org>
>>>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>>>         Subject:        [BioSQL-l] FW: SeqWithQuality and biosql
>>>
>>>
>>> Apologies for cross posting, I had picked the wrong mail adress :-(
>>>
>>> -----Original Message-----
>>> From: Marc Logghe
>>> Sent: Monday, July 04, 2005 11:43 AM
>>> To: bioperl-l@portal.open-bio.org
>>> Subject: SeqWithQuality and biosql
>>>
>>> Hi all,
>>> I am currently exploring the possibility to store a
>>> Bio::Seq::SeqWithQuality object in biosql.
>>> Has anyone ever tried this ?
>>> One possibility would be to
>>> 1) split up the Bio::Seq::SeqWithQuality object into a plain
>>> Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
>>> 2) store them separately in biosql; different namespaces
>>> 3) link them with a relation term.
>>> 4) make a custom adaptor to fetch the persistent objects from biosql
>>> and reconstruct the Bio::Seq::SeqWithQuality
>>>
>>> Does that make sense ? Any other suggestions/possibilities ?
>>> As a test I tried to load a Bio::Seq::PrimaryQual in biosql using the
>>> load_seqdatabase.pl but it fails because Bio::Seq::PrimaryQual does
>>> not have a namespace method.
>>> I hope I'm wrong but I have the impression there is a long way to go
>>> ;-)
>>>
>>> Marc
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
>>>
>>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Tue Jul  5 23:29:30 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jul  5 23:18:18 2005
Subject: [Bioperl-l] Re: SeqWithQuality and biosql
In-Reply-To: <OFAA2A729F.FA88750C-ON48257036.000C8955-48257036.00101614@EU.novartis.net>
References: <OFAA2A729F.FA88750C-ON48257036.000C8955-48257036.00101614@EU.novartis.net>
Message-ID: <12d0914aa33fca6d2e5175ddf85cd0d4@gnf.org>


On Jul 5, 2005, at 7:55 PM, mark.schreiber@novartis.com wrote:

> I would propose the
> following for compound alphabets...
>
> (aca)(gtc) for codon alphabets.
> (g17)(t40) for quality type alphabets.

In your convention wouldn't this need to be
(g(17))(t(40))

Otherwise you'd have trouble representing higher-dimensional 
cross-products unless you alternate chars and digits which would be a 
useless restriction.
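
The ambiguity can be shown with a small plain-Perl sketch. It assumes a
hypothetical three-dimensional alphabet (DNA x INTEGER x INTEGER) whose
token is written without inner brackets, e.g. "(g1740)". The function name
two_integer_splits is invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical illustration: list every way a run of digits could be read
# as exactly two integers when no delimiter or bracket separates them.
sub two_integer_splits {
    my ($digits) = @_;
    my @splits;
    for my $cut (1 .. length($digits) - 1) {
        push @splits, [ substr($digits, 0, $cut), substr($digits, $cut) ];
    }
    return @splits;
}

# The digits of "(g1740)" admit several readings:
for my $split (two_integer_splits('1740')) {
    print "g, $split->[0], $split->[1]\n";
}
```

For "1740" this lists three candidate readings (1 and 740, 17 and 40, 174
and 0), whereas a fully bracketed form such as "(g(17)(40))" leaves only one.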

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net  Wed Jul  6 03:47:21 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jul  6 03:43:02 2005
Subject: [Bioperl-l] bioperl-db: exporting data
In-Reply-To: <5e924f0a05070508012bbb63d3@mail.gmail.com>
References: <5e924f0a05070508012bbb63d3@mail.gmail.com>
Message-ID: <eedb6cb2613fe06259b294a066e2d81d@gmx.net>

The way you're describing doesn't sound too far off. The rank is an 
ordering index as well as a component of the unique key constraint, 
i.e.,  you can't have two seqfeature qualifier values for the same 
feature and tag name unless the rank is different.
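
As a rough illustration of that constraint, the following plain-Perl sketch
keeps an in-memory stand-in for the table. It assumes the key is the
combination of seqfeature_id, term_id and rank, as described above. The
function names, the ids and the qualifier values are all made-up
placeholders, and no database is involved.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for qualifier value rows, keyed by
# (seqfeature_id, term_id, rank). Not a Bioperl-db or BioSQL API.
my %rows;

# Insert a value; refuse a second row with the same key, the way a
# unique key constraint would.
sub add_qualifier_value {
    my ($seqfeature_id, $term_id, $rank, $value) = @_;
    my $key = join ':', $seqfeature_id, $term_id, $rank;
    die "duplicate key ($key)\n" if exists $rows{$key};
    $rows{$key} = $value;
    return $key;
}

# Find the lowest rank (starting at 1) not yet used for this feature/term.
sub next_free_rank {
    my ($seqfeature_id, $term_id) = @_;
    my $rank = 1;
    $rank++ while exists $rows{ join ':', $seqfeature_id, $term_id, $rank };
    return $rank;
}

# Placeholder ids: feature 101, term 7 standing in for db_xref.
add_qualifier_value(101, 7, 1, 'GeneID:0000');
my $rank = next_free_rank(101, 7);
add_qualifier_value(101, 7, $rank, 'COG:0000');
print "second db_xref stored with rank $rank\n";
```

In a real database the same effect would come from choosing a rank that is
not yet taken for that feature and term before inserting the new row.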

Have you convinced yourself that you can log in to the database and 
retrieve those additions by hand (using SQL)?

Can you reduce this to a test case where you load a single sequence 
record, then issue SQL to add your custom annotation, and then retrieve 
the record again? Email me the entry you loaded, the SQL statements you 
issued, and the entry you got out.

	 -hilmar

On Jul 5, 2005, at 8:01 AM, Johan Viklund wrote:

> Hi
>
> I'm trying to add COG annotations from Entrez Gene to sequences (from
> refseq in genbank format) I have in a biosql database (on mysql). The
> problem is I can't get them out again with the bioentry2flat.pl script
> (the bioentries appears without what i've added).
>
> I don't use bioperl for this (i've got ~40000 COG annotations (linked
> to GeneIDs)). Instead I add it to the seqfeature_qualifer_value table
> similar to the way GeneID:s are represented (as far as i've figured),
> with term_id corresponding to db_xref, the same seqfeature_id as the
> GeneID had and rank i've tried a few different variations but none
> seem to work (the first free that's larger than GeneID and 1).
>
> How should I add this annotation to the database so it gets exported
> when I use bioperl?
>
> I've also got another question: What is rank for?
>
> -- 
> Johan Viklund
> E-mail: <johan.viklund.0705@student.uu.se>
>         <johan.viklund@gmail.com>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From mark.schreiber at novartis.com  Tue Jul  5 22:55:40 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jul  6 09:44:41 2005
Subject: [Bioperl-l] Re: SeqWithQuality and biosql
Message-ID: <OFAA2A729F.FA88750C-ON48257036.000C8955-48257036.00101614@EU.novartis.net>

The BioJava SymbolTokenizer can either tokenize to characters or Strings. 
Obviously not all alphabets can sensibly tokenize to characters (eg large 
compound alphabets). Currently by default it would tokenize a compound 
symbol to its compound names. For example the codon ACA would be 

(adenosine cytosine adenosine)

This is obviously not ideal for a database and it can easily be changed in 
biojava without breaking things (to be honest, tokenization of compound 
alphas in biojava is not a common task at all). I would propose the 
following for compound alphabets...

(aca)(gtc) for codon alphabets.
(g17)(t40) for quality type alphabets.

I like the use of brackets because it is possible in biojava to do 
something like this ((DNAxDNAxDNA)xPROTEIN) which would represent an 
alignment of codons with their amino acids or even 
((DNAxDNA)x(DNAxDNAxDNA)), which I'm not sure you would ever use but there 
might be a good reason for it. The brackets help to disambiguate better 
than spaces would. For example

((ctc)S) for the first example or,
((atg)(gc)) for the second example.
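
A minimal Perl sketch of how such bracketed strings could be read back into nested groups and written out again. It assumes every character outside a bracket is one atomic symbol; the function names parse_groups and format_groups are made up for illustration and are not part of BioJava or Bioperl.

```perl
use strict;
use warnings;

# Parse a bracketed string into nested array refs.
# Assumption: each non-bracket character is a single atomic symbol.
sub parse_groups {
    my ($text) = @_;
    my @stack = ([]);                  # bottom element collects the top level
    for my $ch (split //, $text) {
        if ($ch eq '(') {
            push @stack, [];           # open a new group
        }
        elsif ($ch eq ')') {
            die "unbalanced ')'\n" if @stack == 1;
            my $done = pop @stack;     # close the innermost group
            push @{ $stack[-1] }, $done;
        }
        else {
            push @{ $stack[-1] }, $ch; # plain symbol
        }
    }
    die "unbalanced '('\n" if @stack != 1;
    return $stack[0];
}

# Render a parsed structure back into the bracketed form.
sub format_groups {
    my ($node) = @_;
    return join '', map { ref $_ ? '(' . format_groups($_) . ')' : $_ } @$node;
}

my $tree = parse_groups('((ctc)S)');
print format_groups($tree), "\n";      # prints ((ctc)S)
```

Because the grouping is explicit, the round trip does not need to know the alphabet in advance; multi-character components such as quality values would still need a rule of their own.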

To make this work there also needs to be a uniform way to store the 
alphabet name in the sequence table. The above examples show how biojava 
constructs alphabet names but there may be (probably are) better ways.

For quality information you could use (DNAxINTEGER), technically the 
biojava name would be (DNAxSubIntegerAlphabet[0..99]). Of course you don't 
have to use this convention and aliasing would be nice (eg the 'official' 
name for INTEGER in BioJava would be 'Alphabet of all integers' which is a 
bit long winded!)

- Mark


Hilmar Lapp <hlapp@gnf.org>
07/06/2005 02:55 AM

 
        To:     "Marc Logghe" <Marc.Logghe@devgen.com>
        cc:     Mark Schreiber/GP/Novartis@PH, Bioperl <bioperl-l@bioperl.org>, OBDA 
BioSQL <biosql-l@open-bio.org>, Richard HOLLAND 
<hollandr@gis.a-star.edu.sg>
        Subject:        Re: SeqWithQuality and biosql


(I don't think posting to bioperl was a mistake, so I'm including it 
here again)

I think I like Mark's proposal best, i.e., the fundamental model of at 
most one sequence for each bioentry (e.g., Bio::SeqI object) is left 
intact, and the problem is reformulated as how to encode/decode 
sequences from alphabet cross-products as strings.

Encoding/decoding wouldn't be difficult to implement, even such that 
the encoded string is still humanly readable. Biojava has a natural 
provision for doing this (SymbolTokenizer?), but Bioperl does not, 
i.e., in Bioperl the object model assumes that the sequence is a flat 
string, and the alphabet is also a flat string; there is no object you 
could ask to provide you with an encoder/decoder appropriate for either 
the alphabet or the type of sequence object.

I'd like to hear some feedback from the Bioperl folks as to whether 
you'd consider this capability a generally useful addition to Bioperl. 
(It could be designed in a number of ways ranging from more intrusive 
to completely neutral - e.g., adding this as a method to SeqI [like 
$seq->seq_encoder()], or making $seq->alphabet() return an object with 
this and other capabilities, or creating a separate factory class that 
would return the appropriate encoder known to [or registered with] it 
based on a given alphabet and type of sequence object.)

As for Bio::Seq::MetaI, this could certainly be the interface for 
SeqWithQuality, but wouldn't solve the de/serialization problem. Also, 
at least conceptually MetaI-derived classes could represent 
multi-dimensional meta-information, right? That is, the problem of how 
to encode/decode the meta-information isn't trivial or restricted to 
two dimensions here either.

As for creating a specialized adaptor in Bioperl-db, that would 
certainly work too and would most likely be the fastest way to get 
something that works. However, long-term it would solve the problem 
only for SeqWithQuality and not for the more general problem of how to 
store sequences that are based on cross-product alphabets. BTW if you 
do implement a specialized adaptor, then instead of storing two 
bioentries and connecting them you might as well implement the sequence 
encoding/decoding for this particular object in the adaptor - you'd 
gain speed because instead of increasing the number of database 
operations you'd spend a couple more CPU cycles in Perl code, and you 
wouldn't be burdened with two bioentries that aren't coupled by foreign 
key constraint.

As for consensus for how to encode sequence with quality values, I'd 
include a delimiter between the alphabet operands in the cross-product. 
I.e., using e.g. slash as the delimiter: 'A/22 T/30 A/32 G/35 C/35'. 
This can be easily extended to multi-dimensional cross-products so long 
as the delimiter between them isn't a symbol in any of the alphabets.
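
A rough Perl sketch of the slash-delimited encoding described above, for the two-dimensional (base, quality) case only. The function names encode_seq_qual and decode_seq_qual are hypothetical; nothing like them is claimed to exist in Bioperl or Bioperl-db.

```perl
use strict;
use warnings;

# Join each base with its quality value, symbols separated by spaces.
sub encode_seq_qual {
    my ($bases, $quals, $delim) = @_;
    $delim = '/' unless defined $delim;
    my @b = split //, $bases;
    die "sequence and quality lengths differ\n" unless @b == @$quals;
    return join ' ', map { $b[$_] . $delim . $quals->[$_] } 0 .. $#b;
}

# Split the encoded string back into a base string and a list of qualities.
sub decode_seq_qual {
    my ($encoded, $delim) = @_;
    $delim = '/' unless defined $delim;
    my $bases = '';
    my @quals;
    for my $token (split ' ', $encoded) {
        my ($base, $qual) = split /\Q$delim\E/, $token, 2;
        $bases .= $base;
        push @quals, $qual;
    }
    return ($bases, \@quals);
}

my $encoded = encode_seq_qual('ATAGC', [22, 30, 32, 35, 35]);
print "$encoded\n";                    # prints A/22 T/30 A/32 G/35 C/35
```

The delimiter is a parameter so that a different character can be chosen when '/' is itself a symbol in one of the alphabets.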

                 -hilmar


On Jul 5, 2005, at 12:39 AM, Marc Logghe wrote:

> Thanks for the feedback.
> Good to know I am not alone in this ;-)
> I totally agree with Mark that there should be a kind of consensus on
> how to store this in Bio*.
> Yesterday I mistakenly posted my original mail to the bioperl list.
> Heikki responded to that; it might be a good starting point but I am 
> not
> familiar with it:
> http://portal.open-bio.org/pipermail/bioperl-l/2005-July/019271.html
> So much for the long-term solution.
> In short term, to have at least something that works, I'll experiment a
> little with storing separate objects. I remember one of the
> presentations of Hilmar, where he gave the example of making an adaptor
> and storing 2 sequence objects that interacted with each other as a
> result of a Two Hybrid experiment in yeast.
> Cheers,
> Marc
>
>
>>
>> I'd think storing it in BioSQL as 2-byte pairs would be good.
>> First byte is the base (an ASCII character), second byte is
>> the quality (an 8-bit integer). Sure it wastes a few bits but
>> so does normal DNA...
>>
>>
>> Richard Holland
>> Bioinformatics Specialist
>> GIS extension 8199
>> ---------------------------------------------
>> This email is confidential and may be privileged. If you are
>> not the intended recipient, please delete it and notify us
>> immediately. Please do not copy or use it for any purpose, or
>> disclose its content to any other person. Thank you.
>> ---------------------------------------------
>>
>>
>>> -----Original Message-----
>>> From: biosql-l-bounces@portal.open-bio.org
>>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of
>>> mark.schreiber@novartis.com
>>> Sent: Tuesday, July 05, 2005 1:44 PM
>>> To: Marc Logghe
>>> Cc: biosql-l-bounces@portal.open-bio.org; biosql-l@open-bio.org
>>> Subject: Re: [BioSQL-l] FW: SeqWithQuality and biosql
>>>
>>>
>>> Hello -
>>>
>>> I was wondering about similar issues with biojava. As you
>> may (or may
>>> not) know biojava can make sequences from symbols in any
>> alphabet, two
>>> examples are DNA and the integer alphabet (a collection of Symbols
>>> that are integers). Biojava can also make compound
>> alphabets, one such
>>> example is the Phred alphabet which is the multiplication of DNA x
>>> Integer (technically a subset of Integer from 0 to 99).
>>>
>>> Because sequence in BioSQL is stored in a CLOB if you can
>> encode your
>>> SeqWithQuality as a String of characters you can store it.
>>> With the case
>>> above (which is probably similar to yours) you would need 400
>>> characters to store it which is too large for ASCII but
>> could be done
>>> in Unicode. The downside is your persistence layer needs to
>> know how to
>>> encode and decode your SeqWithQuality. I'm not familiar how BioPerl
>>> would do this. BioJava would need to Implement a
>> SymbolTokenizer for
>>> the alphabet and then persistence would happen
>> automatically (assuming
>>> your DB is OK with Unicode). An alternative would be to make a
>>> tokenizer that uses more than single character tokens for
>> encoding (eg
>>> A23 G40 T34 C22 etc).
>>>
>>> The alternative you suggest of storing two sequences with a
>>> relationship is also nice (because you can retrieve each part
>>> separately) but also requires your persistence layer to know
>> about it.
>>> However, it has big disadvantages because they are not
>> strongly tied
>>> to each other. If you manipulate one you might invalidate
>> the other.
>>> Also if you delete one the other will probably not be deleted in a
>>> cascade.
>>>
>>> Not sure if any of this helps but a consensus on how to store this
>>> kind of information would be good so the bio* projects do
>> it the same
>>> way.
>>> Consensus in this case will probably mean whatever the first
>>> implementation is.
>>>
>>> - Mark
>>>
>>>
>>>
>>>
>>>
>>> "Marc Logghe" <Marc.Logghe@devgen.com> Sent by:
>>> biosql-l-bounces@portal.open-bio.org
>>> 07/04/2005 05:56 PM
>>>
>>>
>>>         To:     <biosql-l@open-bio.org>
>>>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>>>         Subject:        [BioSQL-l] FW: SeqWithQuality and biosql
>>>
>>>
>>> Apologies for cross posting, I had picked the wrong mail address :-(
>>>
>>> -----Original Message-----
>>> From: Marc Logghe
>>> Sent: Monday, July 04, 2005 11:43 AM
>>> To: bioperl-l@portal.open-bio.org
>>> Subject: SeqWithQuality and biosql
>>>
>>> Hi all,
>>> I am currently exploring the possibility to store a
>>> Bio::Seq::SeqWithQuality object in biosql.
>>> Has anyone ever tried this ?
>>> One possibility would be to
>>> 1) split up the Bio::Seq::SeqWithQuality object into a plain
>>> Bio::Seq::RichSeq and a Bio::Seq::PrimaryQual
>>> 2) store them separately in biosql; different namespaces
>>> 3) link them with a relation term.
>>> 4) make a custom adaptor to fetch the persistent objects
>> from biosql
>>> and reconstruct the Bio::Seq::SeqWithQuality
>>>
>>> Does that make sense ? Any other suggestions/possibilities ?
>>> As a test I tried to load a Bio::Seq::PrimaryQual in biosql
>> using the
>>> load_seqdatabase.pl but it fails because Bio::Seq::PrimaryQual does
>>> not have a namespace method.
>>> I hope I'm wrong but I have the impression there is a long
>> way to go
>>> ;-)
>>>
>>> Marc
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
>>>
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>
>>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hollandr at gis.a-star.edu.sg  Tue Jul  5 23:38:51 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Jul  6 09:44:42 2005
Subject: [Bioperl-l] RE: SeqWithQuality and biosql
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB226@BIONIC.biopolis.one-north.com>

Good point.

To correctly represent compound alphabets in a consistent manner would
require extra tables in BioSQL (version 1.1?). Some kind of alphabet
table with a name and a related table with alphabet ids and ranks to
construct cross products etc.

Why not store the delimiter as an attribute of the alphabet in this
table? That way we can use whatever delimiters we like. I don't think
grouping is necessary - after all we know from the alphabet definition
that there are a fixed number of tokens per symbol and what order they
come in, so we just read the first three tokens to build the first
symbol, and so on.
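
As a sketch of this idea in Perl: the alphabet record below stands in for a row of the proposed table, and its fields (name, delimiter, arity) are assumptions made for illustration, not an existing BioSQL schema. Decoding then means splitting on the stored delimiter and taking a fixed number of tokens per symbol.

```perl
use strict;
use warnings;

# Split on the alphabet's delimiter and group tokens by its fixed arity.
sub decode_symbols {
    my ($alphabet, $encoded) = @_;
    my @tokens = split /\Q$alphabet->{delimiter}\E/, $encoded;
    die "token count is not a multiple of $alphabet->{arity}\n"
        if @tokens % $alphabet->{arity};
    my @symbols;
    push @symbols, [ splice @tokens, 0, $alphabet->{arity} ] while @tokens;
    return \@symbols;
}

# Hypothetical alphabet records, as they might be read from the database.
my %codon   = (name => 'DNAxDNAxDNA', delimiter => ' ', arity => 3);
my %quality = (name => 'DNAxINTEGER', delimiter => ',', arity => 2);

my $codons = decode_symbols(\%codon, 'a c a g t c');
print join(' | ', map { join '', @$_ } @$codons), "\n";    # prints aca | gtc

my $phred = decode_symbols(\%quality, 'g,17,t,40');
print join(' | ', map { join ':', @$_ } @$phred), "\n";    # prints g:17 | t:40
```

This works without grouping brackets only as long as every symbol has the same number of components, which is the premise stated above.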


Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gnf.org] 
> Sent: Wednesday, July 06, 2005 11:30 AM
> To: mark.schreiber@novartis.com
> Cc: Bioperl; Richard HOLLAND
> Subject: Re: SeqWithQuality and biosql
> 
> 
> 
> On Jul 5, 2005, at 7:55 PM, mark.schreiber@novartis.com wrote:
> 
> > I would propose the
> > following for compound alphabets...
> >
> > (aca)(gtc) for codon alphabets.
> > (g17)(t40) for quality type alphabets.
> 
> In your convention wouldn't this need to be
> (g(17))(t(40))
> 
> Otherwise you'd have trouble representing higher-dimensional 
> cross-products unless you alternate chars and digits which would be a 
> useless restriction.
> 
> 	-hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 

From mark.schreiber at novartis.com  Wed Jul  6 01:37:21 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jul  6 09:44:43 2005
Subject: [Bioperl-l] RE: SeqWithQuality and biosql
Message-ID: <OF4F0555F1.941C3730-ON48257036.001EA669-48257036.001EE388@EU.novartis.net>

Actually under my proposal

(a(17)) would imply (DNAx(SubInteger[0..9]xSubInteger[0..9]))


"Richard HOLLAND" <hollandr@gis.a-star.edu.sg>
07/06/2005 11:38 AM

 
        To:     "Hilmar Lapp" <hlapp@gnf.org>, Mark Schreiber/GP/Novartis@PH
        cc:     "Bioperl" <bioperl-l@bioperl.org>, <biosql-l@open-bio.org>
        Subject:        RE: SeqWithQuality and biosql


Good point.

To correctly represent compound alphabets in a consistent manner would
require extra tables in BioSQL (version 1.1?). Some kind of alphabet
table with a name and a related table with alphabet ids and ranks to
construct cross products etc.

Why not store the delimiter as an attribute of the alphabet in this
table. That way we can use whatever delimiters we like. I don't think
grouping is necessary - after all we know from the alphabet definition
that there are a fixed number of tokens per symbol and what order they
come in, so we just read the first three tokens to build the first
symbol, and so on.


Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gnf.org] 
> Sent: Wednesday, July 06, 2005 11:30 AM
> To: mark.schreiber@novartis.com
> Cc: Bioperl; Richard HOLLAND
> Subject: Re: SeqWithQuality and biosql
> 
> 
> 
> On Jul 5, 2005, at 7:55 PM, mark.schreiber@novartis.com wrote:
> 
> > I would propose the
> > following for compound alphabets...
> >
> > (aca)(gtc) for codon alphabets.
> > (g17)(t40) for quality type alphabets.
> 
> In your convention wouldn't this need to be
> (g(17))(t(40))
> 
> Otherwise you'd have trouble representing higher-dimensional 
> cross-products unless you alternate chars and digits which would be a 
> useless restriction.
> 
>                -hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 


From heikki at ebi.ac.uk  Wed Jul  6 12:28:43 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Jul  6 12:20:00 2005
Subject: [Bioperl-l] Re: Bio::Tree::Compatible, Bio::Tree::Draw::Cladogram
In-Reply-To: <42CC0C6A.80001@jaist.ac.jp>
References: <200507051532.48179.heikki@ebi.ac.uk> <42CC0C6A.80001@jaist.ac.jp>
Message-ID: <200507061728.43482.heikki@ebi.ac.uk>

Hi Gabriel,

I thought this must have been through Jason ;-)  He is the most active 
contributor to bioperl but this just demonstrates how complex bioperl has 
become. One person just cannot monitor everything. We can easily put blame on 
him. He can take it like a man!


On Wednesday 06 July 2005 17:52, Gabriel Valiente wrote:
> Dear Heikki,
>
> I've been discussing all about these modules with Jason Stajich, I just
> didn't know of the need for moving the discussion to any mailing list.
> Sorry about that. Please tell me how to proceed, I haven't yet
> subscribed to any BioPerl mailing list.

To keep everyone informed about commits it is customary to 
- be a member of the bioperl mailing list
  http://bio.perl.org/MailList.shtml
- announce plans and major code commits to the list
- commit tests, preferably at the same time as code

(There is no more than a paragraph in biodesign.pod, so here is a beginning of 
a new tutorial... I just remembered that I wrote something about this in 
docbook format more than a year ago, but I can not find it now. If I ever 
gave the text to anyone, I'd love to see it again!)

Take a look at a few example test files in the t directory, e.g. t/Spidey.t.
They all contain a BEGIN statement of variable complexity that uses the Test 
module and declares how many tests there will be. Test (see 'man Test') 
exports the function ok() that takes care of printing the output. That output is 
all this file should write out when run using 'perl -w t/Spidey.t' or using 
make to run the perl test harness (see 'man Test::Harness').

Running these tests periodically enables maintainers to see if a change somewhere 
has broken some other feature.

You normally start with testing if you can 'use' your new module, then that 
you can create an object and then proceed by testing at least all the public 
methods. Data files can be put into t/data. You can have output from your 
test script, but it should clean up all new files at exit (do that within END 
block).
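
A minimal sketch of a test file following that outline. To keep it runnable without Bioperl, the module under test is a stand-in package (My::Counter, purely hypothetical) defined inline; a real test in t/ would instead 'use' the Bioperl module being tested. The scratch file name is also an assumption.

```perl
use strict;
use warnings;

# Stand-in for the module under test (hypothetical), defined inline so the
# sketch runs on its own.
package My::Counter;
sub new { my ($class) = @_; return bless { count => 0 }, $class }
sub add { my ($self, $n) = @_; $self->{count} += $n; return $self->{count} }

package main;

use Test;
BEGIN { plan tests => 4 }    # declare the number of tests up front

# Scratch file written by the test; removed again in the END block.
my $scratch = "counter_test_$$.tmp";
END { unlink $scratch if defined $scratch && -e $scratch }

# 1. the constructor is reachable (real tests usually start by loading the module)
ok(My::Counter->can('new'));

# 2. an object can be created
my $obj = My::Counter->new;
ok(ref($obj), 'My::Counter');

# 3. exercise a public method
ok($obj->add(2), 2);

# 4. output produced by the test itself
open my $fh, '>', $scratch or die "cannot write $scratch: $!";
print {$fh} $obj->add(3), "\n";
close $fh;
ok(-s $scratch);
```

With one argument ok() checks for a true value; with two it compares the result against the expected value.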

> In a nutshell, I've written these modules to support my research on
> algorithms in bioinformatics. Bio::Tree::Draw::Cladogram is in an early
> stage, I'm still working on the optimal tanglegram layout problem (to
> minimize the number of edge crossings among the taxa of the two trees).

If there is more interest, it would be cool to have an abstraction layer and be 
able to output more formats.

I guess we cannot do anything about the PostScript::TextBlock dependency here.

> Bio::Tree::Compatible is perhaps in much better shape, I've tested it
> over all pairs of trees from the TreeBASE database. There's a paper
> (still under review) about it, the preprint is available from any of:
>
>     http://www.lsi.upc.es/dept/techreps/listado_concreto.php?id=766
>     http://arxiv.org/abs/cs.DM/0505086


Is the use of Set::Scalar really necessary? It is yet another dependency, 
although I do like it myself, and it might turn out to be useful to other 
modules, too, in the future.

> I don't know much about including test code in the distribution. Please
> give me some guidelines, I definitely want to see these modules (and
> whatever else I may write in the future) included in the whole BioPerl
> distribution. I'm on vacation now, but will try to keep along the
> discussion anyway.

They are in. No real hurry with the tests.

Yours,
 -Heikki

> Thanks,
>
> Gabriel
>
> >Gabriel,
> >
> >While testing bioperl module SYNOPSIS sections for runnability I found out
> >that there are two modules in bioperl-live that have external dependencies
> >that are not in Makefile.PL:
> >
> >Bio::Tree::Compatible
> >  Testing compatibility of phylogenetic trees with nested taxa.
> >  depends on Set::Scalar
> >
> >Bio::Tree::Draw::Cladogram
> >  Drawing phylogenetic trees in Encapsulated PostScript (EPS) format.
> >  depends on PostScript::TextBlock.pm
> >
> >They both are yours.
> >
> >I have not been that active on the mailing list lately, so I searched the
> > list for a discussion on these new modules. I started getting a bit
> > alarmed that there were none, no emails ever to the bioperl mailing list
> > from you. Finally, I checked the t (test) directory and there were no
> > tests for these modules.
> >
> >Could we have that discussion now and hopefully at the end of the
> > discussion add the dependencies to the Makefile.PL? In a project this
> > big, we have to keep each others informed so that we can keep all parts
> > of bioperl functional and to avoid confusing and alienating users.
> >
> >
> >Where do these modules come from?
> >What functionality do they add?
> >Is the name space used correct?
> >Could we see test code that demonstrates the functionality?
> >Is there something else that you are planning to do?
> >
> >
> >Yours,
> >
> > -Heikki, who feels that he is probably overreacting ;-)
> >                 ... so do not take it personally

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From hlapp at gnf.org  Wed Jul  6 12:30:06 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Jul  6 12:22:08 2005
Subject: [Bioperl-l] RE: SeqWithQuality and biosql
In-Reply-To: <OF4F0555F1.941C3730-ON48257036.001EA669-48257036.001EE388@EU.novartis.net>
References: <OF4F0555F1.941C3730-ON48257036.001EA669-48257036.001EE388@EU.novartis.net>
Message-ID: <aa7f992af71e51f4ecce18469d13370c@gnf.org>


On Jul 5, 2005, at 10:37 PM, mark.schreiber@novartis.com wrote:

> Actually under my proposal
>
> (a(17)) would imply (DNAx(SubInteger[0..9]xSubInteger[0..9]))
>

That's why I didn't like it - how would you encode 
(DNAx(SubInteger[0..99]xSubInteger[0..99])) in this proposal? Require 
each component to be two-digit? There ought to be delimiters between 
the operands, no?
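
To illustrate the ambiguity with a small Perl sketch: given a run of digits that is supposed to hold two components, each in 0..99, the function below lists every possible cut. The function name is made up, and treating a leading zero as valid is an assumption.

```perl
use strict;
use warnings;

# List every way to cut a digit string into two integers of at most two digits.
sub two_component_splits {
    my ($digits) = @_;
    my @splits;
    for my $cut (1 .. length($digits) - 1) {
        my $left  = substr($digits, 0, $cut);
        my $right = substr($digits, $cut);
        next if length($left) > 2 || length($right) > 2;
        push @splits, "$left/$right";
    }
    return @splits;
}

print join(', ', two_component_splits('175')),  "\n";   # prints 1/75, 17/5
print join(', ', two_component_splits('1740')), "\n";   # prints 17/40
```

Three digits can be read two ways, so without a delimiter the only unambiguous form is the fixed two-digit width.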

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From sm_middha at yahoo.com  Wed Jul  6 12:37:05 2005
From: sm_middha at yahoo.com (sumit middha)
Date: Wed Jul  6 12:28:06 2005
Subject: [Bioperl-l] FASTA.pm issue
In-Reply-To: <BEE079CE.2043%brian_osborne@cognia.com>
Message-ID: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com>


Well, here's a small test script I made to demonstrate my
problem. Please let me know your suggestions.
Thanks.

--------------code-----------------
#!/usr/bin/perl -w
use strict;
use Bio::DB::Fasta;
use Bio::DB::Flat;
use Bio::Index::Fasta;
use Bio::Seq;

my $db = Bio::DB::Fasta->new("f1");
#my $db = Bio::Index::Fasta->new("f1");
my $seqobj = $db->get_Seq_by_id("abc"); 
my $str = $seqobj->seq();
print $str;

exit; 
-----------end of code ------------

And here is the error I get (which I did not get a few
months back)

> perl -w test.pl
AnyDBM_File doesn't define an EXISTS method at
/usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
line 577

and f1 fasta file is 
> cat f1
>abc
AGCATCG


--- Brian Osborne <brian_osborne@cognia.com> wrote:

> Sumit,
> 
> You'll have to show us the code that gives you the
> error, I think.
> 
> 
> Brian O.
> 
> 
> On 6/23/05 1:07 PM, "sumit middha"
> <sm_middha@yahoo.com> wrote:
> 
> > 
> > Thanks for the reply Brian.
> > Changing it to Bio::Index::Fasta helped, but gave
> > another problem in my script, which I dont have a
> > clue.
> > 
> > ------------- EXCEPTION  -------------
> > MSG: Can't open 'SDBM_File' dbm file
> > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or
> > directory
> > STACK Bio::Index::Abstract::open_dbm
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392
> > STACK Bio::Index::Abstract::new
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
> > STACK Bio::Index::AbstractSeq::new
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
> > STACK toplevel get_ortho.pl:31
> > 
> > I know that the file exists, and has been
> formatted as
> > a database to use BLAST search.
> > 
> > sumit
> > 
> > --- Brian Osborne <brian_osborne@cognia.com>
> wrote:
> > 
> >> Sumit,
> >> 
> >> In perl 5.8 a module that's using a tied hash is
> >> supposed to have an EXISTS
> >> method, but it appears that AnyDBM_File doesn't.
> You
> >> could try using
> >> Bio::Index::Fasta instead, or Bio::DB::Flat.
> >> 
> >> Brian O.
> >> 
> >> 
> >> On 6/22/05 6:24 PM, "sumit middha"
> >> <sm_middha@yahoo.com> wrote:
> >> 
> >>> 
> >>> Hello,
> >>> 
> >>> I have a trouble with using fasta module
> >>> 
> >>> I use the required statements
> >>> 
> >>> use Bio::DB::Fasta;
> >>> use Bio::Seq;
> >>> 
> >>> The error was:
> >>> 
> >>> AnyDBM_File doesn't define an EXISTS method at
> >>> 
> >>
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> >>> line 577
> >>> 
> >>> thanks,
> >>> sm


__________________________________ 
Do you Yahoo!? 
Make Yahoo! your home page 
http://www.yahoo.com/r/hs
From lehvasla at ebi.ac.uk  Wed Jul  6 17:40:49 2005
From: lehvasla at ebi.ac.uk (lehvasla@ebi.ac.uk)
Date: Wed Jul  6 18:09:21 2005
Subject: [Bioperl-l] FASTA.pm issue
In-Reply-To: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com>
References: <BEE079CE.2043%brian_osborne@cognia.com>
	<20050706163705.67979.qmail@web30711.mail.mud.yahoo.com>
Message-ID: <49934.84.12.20.100.1120686049.squirrel@webmail.ebi.ac.uk>


Sumit,

Your code works under perl v5.8.4. I do not get any errors or warnings.
There has to be some change between perl releases. What is the version of
your AnyDBM_File? Mine is

perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;'
1.00
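
It may also help to see which DBM module AnyDBM_File actually selected on each machine, since it picks the first one that loads. A small sketch follows; whether the selected backend is what triggers the EXISTS error is only a guess here, not something confirmed in this thread.

```perl
use strict;
use warnings;
use AnyDBM_File;

# After loading, @AnyDBM_File::ISA holds the single DBM module that was picked.
my ($backend) = @AnyDBM_File::ISA;
print "AnyDBM_File backend: $backend\n";

# Check whether that module provides the EXISTS method used by tied hashes.
print "defines EXISTS: ", ($backend->can('EXISTS') ? 'yes' : 'no'), "\n";
```

Comparing this output between a working and a failing installation would show whether they use the same DBM library.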

     -Heikki


>
> Well heres a small test code I made to explain my
> problem. Please let me know your suggestions.
> Thanks.
>
> --------------code-----------------
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Fasta;
> use Bio::DB::Flat;
> use Bio::Index::Fasta;
> use Bio::Seq;
>
> my $db = Bio::DB::Fasta->new("f1");
> #my $db = Bio::Index::Fasta->new("f1");
> my $seqobj = $db->get_Seq_by_id("abc");
> my $str = $seqobj->seq();
> print $str;
>
> exit;
> -----------end of code ------------
>
> And here is the error I get (which I did not a few
> months back)
>
>> perl -w test.pl
> AnyDBM_File doesn't define an EXISTS method at
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> line 577
>
> and f1 fasta file is
>> cat f1
>>abc
> AGCATCG
>
>
> --- Brian Osborne <brian_osborne@cognia.com> wrote:
>
>> Sumit,
>>
>> You'll have to show us the code that gives you the
>> error, I think.
>>
>>
>> Brian O.
>>
>>
>> On 6/23/05 1:07 PM, "sumit middha"
>> <sm_middha@yahoo.com> wrote:
>>
>> >
>> > Thanks for the reply Brian.
>> > Changing it to Bio::Index::Fasta helped, but gave
>> > another problem in my script, which I dont have a
>> > clue.
>> >
>> > ------------- EXCEPTION  -------------
>> > MSG: Can't open 'SDBM_File' dbm file
>> > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or
>> > directory
>> > STACK Bio::Index::Abstract::open_dbm
>> >
>>
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392
>> > STACK Bio::Index::Abstract::new
>> >
>>
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
>> > STACK Bio::Index::AbstractSeq::new
>> >
>>
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
>> > STACK toplevel get_ortho.pl:31
>> >
>> > I know that the file exists, and has been
>> formatted as
>> > a database to use BLAST search.
>> >
>> > sumit
>> >
>> > --- Brian Osborne <brian_osborne@cognia.com>
>> wrote:
>> >
>> >> Sumit,
>> >>
>> >> In perl 5.8 a module that's using a tied hash is
>> >> supposed to have an EXISTS
>> >> method, but it appears that AnyDBM_File doesn't.
>> You
>> >> could try using
>> >> Bio::Index::Fasta instead, or Bio::DB::Flat.
>> >>
>> >> Brian O.
>> >>
>> >>
>> >> On 6/22/05 6:24 PM, "sumit middha"
>> >> <sm_middha@yahoo.com> wrote:
>> >>
>> >>>
>> >>> Hello,
>> >>>
>> >>> I have a trouble with using fasta module
>> >>>
>> >>> I use the required statements
>> >>>
>> >>> use Bio::DB::Fasta;
>> >>> use Bio::Seq;
>> >>>
>> >>> The error was:
>> >>>
>> >>> AnyDBM_File doesn't define an EXISTS method at
>> >>>
>> >>
>> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
>> >>> line 577
>> >>>
>> >>> thanks,
>> >>> sm
>
>
>
> __________________________________
> Do you Yahoo!?
> Make Yahoo! your home page
> http://www.yahoo.com/r/hs
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>


From lehvasla at ebi.ac.uk  Wed Jul  6 17:40:57 2005
From: lehvasla at ebi.ac.uk (lehvasla@ebi.ac.uk)
Date: Wed Jul  6 18:09:27 2005
Subject: [Bioperl-l] FASTA.pm issue
Message-ID: <49935.84.12.20.100.1120686057.squirrel@webmail.ebi.ac.uk>


Sumit,

Your code works under perl v5.8.4. I do not get any errors or warnings.
There has to be some change between perl releases. What is the version of
your AnyDBM_File? Mine is

perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;'
1.00

     -Heikki


>
> Well heres a small test code I made to explain my
> problem. Please let me know your suggestions.
> Thanks.
>
> --------------code-----------------
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Fasta;
> use Bio::DB::Flat;
> use Bio::Index::Fasta;
> use Bio::Seq;
>
> my $db = Bio::DB::Fasta->new("f1");
> #my $db = Bio::Index::Fasta->new("f1");
> my $seqobj = $db->get_Seq_by_id("abc");
> my $str = $seqobj->seq();
> print $str;
>
> exit;
> -----------end of code ------------
>
> And here is the error I get (which I did not a few
> months back)
>
>> perl -w test.pl
> AnyDBM_File doesn't define an EXISTS method at
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> line 577
>
> and f1 fasta file is
>> cat f1
>>abc
> AGCATCG
>
>
> --- Brian Osborne <brian_osborne@cognia.com> wrote:
>
>> Sumit,
>>
>> You'll have to show us the code that gives you the
>> error, I think.
>>
>>
>> Brian O.
>>
>>
>> On 6/23/05 1:07 PM, "sumit middha"
>> <sm_middha@yahoo.com> wrote:
>>
>> >
>> > Thanks for the reply Brian.
>> > Changing it to Bio::Index::Fasta helped, but gave
>> > another problem in my script, which I dont have a
>> > clue.
>> >
>> > ------------- EXCEPTION  -------------
>> > MSG: Can't open 'SDBM_File' dbm file
>> > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or
>> > directory
>> > STACK Bio::Index::Abstract::open_dbm
>> >
>>
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392
>> > STACK Bio::Index::Abstract::new
>> >
>>
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
>> > STACK Bio::Index::AbstractSeq::new
>> >
>>
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
>> > STACK toplevel get_ortho.pl:31
>> >
>> > I know that the file exists, and has been
>> formatted as
>> > a database to use BLAST search.
>> >
>> > sumit
>> >
>> > --- Brian Osborne <brian_osborne@cognia.com>
>> wrote:
>> >
>> >> Sumit,
>> >>
>> >> In perl 5.8 a module that's using a tied hash is
>> >> supposed to have an EXISTS
>> >> method, but it appears that AnyDBM_File doesn't.
>> You
>> >> could try using
>> >> Bio::Index::Fasta instead, or Bio::DB::Flat.
>> >>
>> >> Brian O.
>> >>
>> >>
>> >> On 6/22/05 6:24 PM, "sumit middha"
>> >> <sm_middha@yahoo.com> wrote:
>> >>
>> >>>
>> >>> Hello,
>> >>>
>> >>> I have a trouble with using fasta module
>> >>>
>> >>> I use the required statements
>> >>>
>> >>> use Bio::DB::Fasta;
>> >>> use Bio::Seq;
>> >>>
>> >>> The error was:
>> >>>
>> >>> AnyDBM_File doesn't define an EXISTS method at
>> >>>
>> >>
>> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
>> >>> line 577
>> >>>
>> >>> thanks,
>> >>> sm
>
>
>
> __________________________________
> Do you Yahoo!?
> Make Yahoo! your home page
> http://www.yahoo.com/r/hs
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>


From mark.schreiber at novartis.com  Wed Jul  6 20:59:13 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jul  6 21:23:52 2005
Subject: [Bioperl-l] RE: SeqWithQuality and biosql
Message-ID: <OF819DE46D.80C07DD4-ON48257037.00052036-48257037.00056CAD@EU.novartis.net>

Good point. I would prefer a system that only uses delimiters for 
ambiguous cases like the one you show but I guess thats pretty complex so 
maybe delimiters for every sub-alphabet.

- Mark


Hilmar Lapp <hlapp@gnf.org>
07/07/2005 12:30 AM

 
        To:     Mark Schreiber/GP/Novartis@PH
        cc:     "Richard HOLLAND" <hollandr@gis.a-star.edu.sg>, Bioperl 
<bioperl-l@bioperl.org>, biosql-l@open-bio.org
        Subject:        Re: [Bioperl-l] RE: SeqWithQuality and biosql


On Jul 5, 2005, at 10:37 PM, mark.schreiber@novartis.com wrote:

> Actually under my proposal
>
> (a(17)) would imply (DNAx(SubInteger[0..9]xSubInteger[0..9]))
>

That's why I didn't like it - how would you encode 
(DNAx(SubInteger[0..99]xSubInteger[0..99]) in this proposal? Require 
each component to be two-digit? There ought to be delimiters between 
the operands, no?

                 -hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From sm_middha at yahoo.com  Thu Jul  7 01:05:50 2005
From: sm_middha at yahoo.com (sumit middha)
Date: Thu Jul  7 00:56:56 2005
Subject: [Bioperl-l] FASTA.pm issue
In-Reply-To: <49935.84.12.20.100.1120686057.squirrel@webmail.ebi.ac.uk>
Message-ID: <20050707050551.28599.qmail@web30710.mail.mud.yahoo.com>


Mine is 

> perl -v

This is perl, v5.8.5 built for sun4-solaris

> perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;'
1.00

:( any guesses ??


--- lehvasla@ebi.ac.uk wrote:

> 
> Sumit,
> 
> Your code works under perl v5.8.4. I do not get any
> errors or warnings.
> There has to be some change between perl releases.
> What is the version of
> your AnyDBM_File? Mine is
> 
> perl -MAnyDBM_File -le 'print AnyDBM_File->VERSION;'
> 1.00
> 
>      -Heikki
> 
> 
> >
> > Well heres a small test code I made to explain my
> > problem. Please let me know your suggestions.
> > Thanks.
> >
> > --------------code-----------------
> > #!/usr/bin/perl -w
> > use strict;
> > use Bio::DB::Fasta;
> > use Bio::DB::Flat;
> > use Bio::Index::Fasta;
> > use Bio::Seq;
> >
> > my $db = Bio::DB::Fasta->new("f1");
> > #my $db = Bio::Index::Fasta->new("f1");
> > my $seqobj = $db->get_Seq_by_id("abc");
> > my $str = $seqobj->seq();
> > print $str;
> >
> > exit;
> > -----------end of code ------------
> >
> > And here is the error I get (which I did not a few
> > months back)
> >
> >> perl -w test.pl
> > AnyDBM_File doesn't define an EXISTS method at
> >
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> > line 577
> >
> > and f1 fasta file is
> >> cat f1
> >>abc
> > AGCATCG
> >
> >
> > --- Brian Osborne <brian_osborne@cognia.com>
> wrote:
> >
> >> Sumit,
> >>
> >> You'll have to show us the code that gives you
> the
> >> error, I think.
> >>
> >>
> >> Brian O.
> >>
> >>
> >> On 6/23/05 1:07 PM, "sumit middha"
> >> <sm_middha@yahoo.com> wrote:
> >>
> >> >
> >> > Thanks for the reply Brian.
> >> > Changing it to Bio::Index::Fasta helped, but
> gave
> >> > another problem in my script, which I dont have
> a
> >> > clue.
> >> >
> >> > ------------- EXCEPTION  -------------
> >> > MSG: Can't open 'SDBM_File' dbm file
> >> > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file
> or
> >> > directory
> >> > STACK Bio::Index::Abstract::open_dbm
> >> >
> >>
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392
> >> > STACK Bio::Index::Abstract::new
> >> >
> >>
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
> >> > STACK Bio::Index::AbstractSeq::new
> >> >
> >>
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
> >> > STACK toplevel get_ortho.pl:31
> >> >
> >> > I know that the file exists, and has been
> >> formatted as
> >> > a database to use BLAST search.
> >> >
> >> > sumit
> >> >
> >> > --- Brian Osborne <brian_osborne@cognia.com>
> >> wrote:
> >> >
> >> >> Sumit,
> >> >>
> >> >> In perl 5.8 a module that's using a tied hash
> is
> >> >> supposed to have an EXISTS
> >> >> method, but it appears that AnyDBM_File
> doesn't.
> >> You
> >> >> could try using
> >> >> Bio::Index::Fasta instead, or Bio::DB::Flat.
> >> >>
> >> >> Brian O.
> >> >>
> >> >>
> >> >> On 6/22/05 6:24 PM, "sumit middha"
> >> >> <sm_middha@yahoo.com> wrote:
> >> >>
> >> >>>
> >> >>> Hello,
> >> >>>
> >> >>> I have a trouble with using fasta module
> >> >>>
> >> >>> I use the required statements
> >> >>>
> >> >>> use Bio::DB::Fasta;
> >> >>> use Bio::Seq;
> >> >>>
> >> >>> The error was:
> >> >>>
> >> >>> AnyDBM_File doesn't define an EXISTS method
> at
> >> >>>
> >> >>
> >>
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> >> >>> line 577
> >> >>>
> >> >>> thanks,
> >> >>> sm
> >
> >
> >
> > __________________________________
> > Do you Yahoo!?
> > Make Yahoo! your home page
> > http://www.yahoo.com/r/hs
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> >
>
http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
> 


____________________________________________________
Sell on Yahoo! Auctions ? no fees. Bid on great items.  
http://auctions.yahoo.com/
From khoueiry at ibdm.univ-mrs.fr  Thu Jul  7 04:42:30 2005
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Thu Jul  7 04:32:43 2005
Subject: [Bioperl-l] FASTA.pm issue
In-Reply-To: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com>
References: <20050706163705.67979.qmail@web30711.mail.mud.yahoo.com>
Message-ID: <1120725750.27317.1.camel@DavidLinux>

Hi sumit,

I suggest you change your indexing method. Try this... (In fact, both your
code and the code below work well for me)

------
#!/usr/bin/perl -w
use strict;
use Bio::Index::Fasta;


#Indexing....
my $type = $ENV{'BIOPERL_INDEX_TYPE'};
if ($type) {
  $Bio::Index::Abstract::USE_DBM_TYPE = $type;
}

my $index = Bio::Index::Fasta->new( "/home/pierre/BioperlTest/f1.idx",
'WRITE' );
$index->make_index("/home/pierre/BioperlTest/f1");


my $seqobj = $index->fetch("abc");
my $str = $seqobj->seq();
print $str."\n";

exit; 

-----------

On Wednesday 06 July 2005 at 09:37 -0700, sumit middha wrote:

> Well heres a small test code I made to explain my
> problem. Please let me know your suggestions.
> Thanks.
> 
> --------------code-----------------
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Fasta;
> use Bio::DB::Flat;
> use Bio::Index::Fasta;
> use Bio::Seq;
> 
> my $db = Bio::DB::Fasta->new("f1");
> #my $db = Bio::Index::Fasta->new("f1");
> my $seqobj = $db->get_Seq_by_id("abc"); 
> my $str = $seqobj->seq();
> print $str;
> 
> exit; 
> -----------end of code ------------
> 
> And here is the error I get (which I did not a few
> months back)
> 
> > perl -w test.pl
> AnyDBM_File doesn't define an EXISTS method at
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> line 577
> 
> and f1 fasta file is 
> > cat f1
> >abc
> AGCATCG
> 
> 
> --- Brian Osborne <brian_osborne@cognia.com> wrote:
> 
> > Sumit,
> > 
> > You'll have to show us the code that gives you the
> > error, I think.
> > 
> > 
> > Brian O.
> > 
> > 
> > On 6/23/05 1:07 PM, "sumit middha"
> > <sm_middha@yahoo.com> wrote:
> > 
> > > 
> > > Thanks for the reply Brian.
> > > Changing it to Bio::Index::Fasta helped, but gave
> > > another problem in my script, which I dont have a
> > > clue.
> > > 
> > > ------------- EXCEPTION  -------------
> > > MSG: Can't open 'SDBM_File' dbm file
> > > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file or
> > > directory
> > > STACK Bio::Index::Abstract::open_dbm
> > >
> >
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392
> > > STACK Bio::Index::Abstract::new
> > >
> >
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
> > > STACK Bio::Index::AbstractSeq::new
> > >
> >
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
> > > STACK toplevel get_ortho.pl:31
> > > 
> > > I know that the file exists, and has been
> > formatted as
> > > a database to use BLAST search.
> > > 
> > > sumit
> > > 
> > > --- Brian Osborne <brian_osborne@cognia.com>
> > wrote:
> > > 
> > >> Sumit,
> > >> 
> > >> In perl 5.8 a module that's using a tied hash is
> > >> supposed to have an EXISTS
> > >> method, but it appears that AnyDBM_File doesn't.
> > You
> > >> could try using
> > >> Bio::Index::Fasta instead, or Bio::DB::Flat.
> > >> 
> > >> Brian O.
> > >> 
> > >> 
> > >> On 6/22/05 6:24 PM, "sumit middha"
> > >> <sm_middha@yahoo.com> wrote:
> > >> 
> > >>> 
> > >>> Hello,
> > >>> 
> > >>> I have a trouble with using fasta module
> > >>> 
> > >>> I use the required statements
> > >>> 
> > >>> use Bio::DB::Fasta;
> > >>> use Bio::Seq;
> > >>> 
> > >>> The error was:
> > >>> 
> > >>> AnyDBM_File doesn't define an EXISTS method at
> > >>> 
> > >>
> > /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> > >>> line 577
> > >>> 
> > >>> thanks,
> > >>> sm
> 
> 
> 		
> __________________________________ 
> Do you Yahoo!? 
> Make Yahoo! your home page 
> http://www.yahoo.com/r/hs
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
From heikki at ebi.ac.uk  Thu Jul  7 05:13:45 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Thu Jul  7 05:05:13 2005
Subject: [Bioperl-l] use cases for SeqWithQuality, please
Message-ID: <200507071013.45813.heikki@ebi.ac.uk>


I've compared Bio::Seq::SeqWithQuality with the Bio::Seq::MetaI schema and there 
do not seem to be many differences. All the functionality seems to be 
there already. The main problem is that there are many different ways to call 
the constructor. 

There are so many ways to call it, and some methods are already deprecated, 
so it would be better to write a replacement module than to try to rewrite all 
the methods.


Could I ask those who use Bio::Seq::SeqWithQuality now to send me sample code 
that shows how they call this module in practice?

With that information I could write a Bio::Seq::Quality that implements 
Bio::Seq::MetaI and we could deprecate Bio::Seq::SeqWithQuality.

Are you happy with that, Chad?

 -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From victor.ruotti at gmail.com  Thu Jul  7 11:05:24 2005
From: victor.ruotti at gmail.com (Victor)
Date: Thu Jul  7 10:56:32 2005
Subject: [Bioperl-l] Overlapping Features with GFF dbase
Message-ID: <36d7e55505070708053442bb23@mail.gmail.com>

Hello,
I was wondering if someone can point me to the best way to retrieve a set of 
overlapping features from the GFF schema. Right now I am using Bio::DB::GFF 
(GFF.pm) to do this:

use Bio::DB::GFF;
my $db = Bio::DB::GFF->new(-dsn => 'mydbase',
    -aggregators => 'gene_model{CDS,five_prime_UTR,three_prime_UTR}');

my $gene_stream = $db->get_seq_stream('gene_model:UCSC_hg16');

while (my $gene = $gene_stream->next_seq) {
    print $gene->name, "\n";
    for my $part ($gene->get_SeqFeatures) {
        print "\t", join("\t", $part->method, $part->start, $part->end), "\n";
    }
    print "\n";
}

This gets all the genes from the GFF schema. Should I be using another while 
loop to retrieve other features that overlap with these genes? Is there a 
bioperl module to retrieve overlapping features? I would like to be able to 
get all the features that overlap with a particular gene or a whole set of 
genes.

Thanks in advance.
Victor

From sm_middha at yahoo.com  Thu Jul  7 11:11:26 2005
From: sm_middha at yahoo.com (sumit middha)
Date: Thu Jul  7 11:02:52 2005
Subject: [Bioperl-l] FASTA.pm issue
In-Reply-To: <1120725750.27317.1.camel@DavidLinux>
Message-ID: <20050707151126.4199.qmail@web30705.mail.mud.yahoo.com>


Nope, that did not help either. I tried it on a
different machine and both scripts worked. My guess
is that something has gone wrong with the perl
installed on this machine, but I cannot guess what it
could be or how to correct it!

> perl test.pl
Use of uninitialized value in numeric gt (>) at
/usr/local/lib/perl5/5.8.5/sun4-solaris/DB_File.pm
line 271.
Deep recursion on subroutine "DB_File::AUTOLOAD" at
/usr/local/lib/perl5/5.8.5/sun4-solaris/DB_File.pm
line 234.

Thanks for your help.

--- khoueiry <khoueiry@ibdm.univ-mrs.fr> wrote:

> Hi sumit,
> 
> I suggest you to change your index method. Try
> that... (In fact your
> code and the below one works well for me)
> 
> ------
> #!/usr/bin/perl -w
> use strict;
> use Bio::Index::Fasta;
> 
> 
> #Indexing....
> my $type = $ENV{'BIOPER_INDEX_TYPE'};
> if ($type) {
>   $Bio::Index::Abstract::USE_DBM_TYPE = $type;
> }
> 
> my $index = Bio::Index::Fasta->new(
> "/home/pierre/BioperlTest/f1.idx",
> 'WRITE' );
> $index->make_index("/home/pierre/BioperlTest/f1");
> 
> 
> my $seqobj = $index->fetch("abc");
> my $str = $seqobj->seq();
> print $str."\n";
> 
> exit; 
> 
> -----------
> 
> On Wednesday 06 July 2005 at 09:37 -0700, sumit
> middha wrote:
> 
> > Well heres a small test code I made to explain my
> > problem. Please let me know your suggestions.
> > Thanks.
> > 
> > --------------code-----------------
> > #!/usr/bin/perl -w
> > use strict;
> > use Bio::DB::Fasta;
> > use Bio::DB::Flat;
> > use Bio::Index::Fasta;
> > use Bio::Seq;
> > 
> > my $db = Bio::DB::Fasta->new("f1");
> > #my $db = Bio::Index::Fasta->new("f1");
> > my $seqobj = $db->get_Seq_by_id("abc"); 
> > my $str = $seqobj->seq();
> > print $str;
> > 
> > exit; 
> > -----------end of code ------------
> > 
> > And here is the error I get (which I did not a few
> > months back)
> > 
> > > perl -w test.pl
> > AnyDBM_File doesn't define an EXISTS method at
> >
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> > line 577
> > 
> > and f1 fasta file is 
> > > cat f1
> > >abc
> > AGCATCG
> > 
> > 
> > --- Brian Osborne <brian_osborne@cognia.com>
> wrote:
> > 
> > > Sumit,
> > > 
> > > You'll have to show us the code that gives you
> the
> > > error, I think.
> > > 
> > > 
> > > Brian O.
> > > 
> > > 
> > > On 6/23/05 1:07 PM, "sumit middha"
> > > <sm_middha@yahoo.com> wrote:
> > > 
> > > > 
> > > > Thanks for the reply Brian.
> > > > Changing it to Bio::Index::Fasta helped, but
> gave
> > > > another problem in my script, which I dont
> have a
> > > > clue.
> > > > 
> > > > ------------- EXCEPTION  -------------
> > > > MSG: Can't open 'SDBM_File' dbm file
> > > > '../Dyak/dyak_chr_ucsc.fa.rev' : No such file
> or
> > > > directory
> > > > STACK Bio::Index::Abstract::open_dbm
> > > >
> > >
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:392
> > > > STACK Bio::Index::Abstract::new
> > > >
> > >
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
> > > > STACK Bio::Index::AbstractSeq::new
> > > >
> > >
> >
>
/usr/local/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
> > > > STACK toplevel get_ortho.pl:31
> > > > 
> > > > I know that the file exists, and has been
> > > formatted as
> > > > a database to use BLAST search.
> > > > 
> > > > sumit
> > > > 
> > > > --- Brian Osborne <brian_osborne@cognia.com>
> > > wrote:
> > > > 
> > > >> Sumit,
> > > >> 
> > > >> In perl 5.8 a module that's using a tied hash
> is
> > > >> supposed to have an EXISTS
> > > >> method, but it appears that AnyDBM_File
> doesn't.
> > > You
> > > >> could try using
> > > >> Bio::Index::Fasta instead, or Bio::DB::Flat.
> > > >> 
> > > >> Brian O.
> > > >> 
> > > >> 
> > > >> On 6/22/05 6:24 PM, "sumit middha"
> > > >> <sm_middha@yahoo.com> wrote:
> > > >> 
> > > >>> 
> > > >>> Hello,
> > > >>> 
> > > >>> I have a trouble with using fasta module
> > > >>> 
> > > >>> I use the required statements
> > > >>> 
> > > >>> use Bio::DB::Fasta;
> > > >>> use Bio::Seq;
> > > >>> 
> > > >>> The error was:
> > > >>> 
> > > >>> AnyDBM_File doesn't define an EXISTS method
> at
> > > >>> 
> > > >>
> > >
> /usr/local/lib/perl5/site_perl/5.8.5/Bio/DB/Fasta.pm
> > > >>> line 577
> > > >>> 
> > > >>> thanks,
> > > >>> sm
> > 
> > 
> > 		
> > __________________________________ 
> > Do you Yahoo!? 
> > Make Yahoo! your home page 
> > http://www.yahoo.com/r/hs
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> >
>
http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 


From lstein at cshl.edu  Thu Jul  7 12:28:39 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu Jul  7 12:21:29 2005
Subject: [Bioperl-l] Overlapping Features with GFF dbase
In-Reply-To: <36d7e55505070708053442bb23@mail.gmail.com>
References: <36d7e55505070708053442bb23@mail.gmail.com>
Message-ID: <200507071228.40400.lstein@cshl.edu>

Hi Victor,

Once you get a gene, you can do this:

	my @overlapping_features = $gene->features;

The same filtering syntax that you use, as well as the get_seq_stream() method 
call, works with features as well as segments.

Lincoln

On Thursday 07 July 2005 11:05 am, Victor wrote:
> Hello,
> I was wondering if someone can point me out on how best to retrieve a set
> of overlapping features from the GFF schema. Right now I am looking at the
> GFF.pm <http://GFF.pm> to do this by:
>
> use Bio::DB::GFF;
> my $db = Bio::DB:GFF->new(-dsn =>'mydbase',
> -aggregators =>'gene_model{CDS ,five_prime_UTR,three_prime_UTR'});
>
> my $gene_stream = $db=>get_seq_stream('gene_model:UCSC_hg16');
>
> while (my $gene = $gene_stream->next_seq) {
> print $gene->name, "\n";
> for my $part ($gene->get_SeqFeatures) {
> print "\t", join("\t", $part->method,$part->start,$part->end), "\n";
> }
> print "\n";
> }
>
> This gets all the genes from the GFF schema. Should I be using another
> while loop to retrieve other features that overlap with these genes? It
> there a bioperl module to retrieve overlapping features? I would like to be
> able to get all the features that overlap with a particular gene or a whole
> set of genes.
>
> Thanks in advance.
> Victor
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
From F.Zhang at surrey.ac.uk  Thu Jul  7 12:09:43 2005
From: F.Zhang at surrey.ac.uk (F.Zhang)
Date: Thu Jul  7 23:16:31 2005
Subject: [Bioperl-l] about features extraction from PDB files
Message-ID: <FB87FF23EB14174C9AF6E259DE2F5A430B0F0B@EVS-EC1-NODE1.surrey.ac.uk>

[Message body not archived: content of type multipart/alternative was skipped,
and a non-text attachment (image001.jpg, image/jpeg, 7957 bytes) was scrubbed.]
From fgarret at ub.edu  Fri Jul  8 10:39:21 2005
From: fgarret at ub.edu (Filipe Garrett)
Date: Fri Jul  8 10:31:26 2005
Subject: [Bioperl-l] How to get the intron phase
Message-ID: <42CE9019.40502@ub.edu>

Hi all,

I'm new to bioperl and I was looking for a way to obtain the intron 
phases from genes in a FASTA format like this:

 >CG3427-RA type=transcript; 
loc=2R:complement(2273725..2274587,2274647..2274996,2275280..2275413,2275634..2275804,2275864..2276117,2276188..2276549,2277349..2277510,2277748..2277924,2278864..2279008,2279228..2279373,2279935..2280127,2280182..2280323,2280392..2280478,2280739..2280836,2281121..2281172,2285453..2285599,2300275..2300819); 
ID=CG3427-RA; name=Epac-RA; 
db_xref=FlyBase:FBtr0086132,FlyBase:FBgn0033102,Gadfly:CG3427-RA; 
release=r4.1; species=dmel; len=4028
CTCTCCAGCGGCGCACAACTCGATCGCTGGCCCAGAGGTTCAGTTCGGTT
TGGTTCGGTTCGGTTTGAATCTCTGCCTCTGTTTACGCCTCTATATC...

I've looked at the script directory and found the phase method inside 
the Bio::SeqFeature::Gene::Intron object, but the examples are from data 
parsed from a GFF file.

Can I bypass the GFF stuff and use the FASTA header information directly?

Thanks in advance,

Bests
From jason.stajich at duke.edu  Fri Jul  8 11:12:04 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jul  8 11:03:26 2005
Subject: [Bioperl-l] How to get the intron phase
In-Reply-To: <42CE9019.40502@ub.edu>
References: <42CE9019.40502@ub.edu>
Message-ID: <AF9DAEE0-4005-49F2-B94C-D5A7A2A7A6F9@duke.edu>

You can calculate it pretty easily, just build the split-location  
object from the location string.

use strict;
use warnings;
use Bio::Factory::FTLocationFactory;

my $file = shift @ARGV;
# Read only the FASTA header lines; the trailing "|" makes this a pipe
open(my $fh, "grep '^>' $file |") || die "cannot run grep on $file: $!";

while (<$fh>) {
    if ( /loc=(\S+):(\S+);/ ) {
        my ($seqid, $locationstr) = ($1, $2);
        my $location = Bio::Factory::FTLocationFactory->from_string($locationstr);
        my $runninglength = 0;
        my $i = 0;
        my @exons = $location->each_Location;
        my $last  = scalar @exons;
        for my $exon (@exons) {
            # I may be sloppy here, pls check that this is working the way you expect
            # defining A^TG is phase 1 and AT^G is phase 2
            my $phase = ( $runninglength += $exon->length ) % 3;
            $i++;
            # there is an intron after every exon except the last one
            if ( $i < $last ) {
                print "phase of intron $i is $phase\n";
            }
        }
    }
}
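
To make the arithmetic above easier to check by hand, here is a small stand-alone sketch that needs no BioPerl. It is only an illustration under assumptions: intron_phases is a hypothetical helper name, the exon lengths are made up, the exons are assumed to be listed in transcript order (5' to 3'), and the first exon is assumed to start in frame.

```perl
use strict;
use warnings;

# Hypothetical helper: takes exon lengths in transcript order and returns
# the phase of the intron that follows each exon except the last one.
sub intron_phases {
    my @exon_lengths = @_;
    my @phases;
    my $running = 0;
    for my $i ( 0 .. $#exon_lengths - 1 ) {
        $running += $exon_lengths[$i];   # bases seen before this intron
        push @phases, $running % 3;      # 0, 1 or 2 bases into a codon
    }
    return @phases;
}

# Made-up exon lengths, e.g. from segments 1..4, 10..14 and 20..22
my @phases = intron_phases( 4, 5, 3 );
print "phase of intron $_ is $phases[ $_ - 1 ]\n" for 1 .. @phases;
```

With these lengths the running totals are 4 and 9, so the two introns come out as phase 1 and phase 0. For a complement() location the exons would first have to be put in transcript order before the lengths are summed.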

On Jul 8, 2005, at 10:39 AM, Filipe Garrett wrote:

> Hi all,
>
> I'm new to bioperl and I was looking for a way to obtain the intron  
> phases from genes in a FASTA format like this:
>
> >CG3427-RA type=transcript; loc=2R:complement 
> (2273725..2274587,2274647..2274996,2275280..2275413,2275634..2275804,2 
> 275864..2276117,2276188..2276549,2277349..2277510,2277748..2277924,227 
> 8864..2279008,2279228..2279373,2279935..2280127,2280182..2280323,22803 
> 92..2280478,2280739..2280836,2281121..2281172,2285453..2285599,2300275 
> ..2300819); ID=CG3427-RA; name=Epac-RA;  
> db_xref=FlyBase:FBtr0086132,FlyBase:FBgn0033102,Gadfly:CG3427-RA;  
> release=r4.1; species=dmel; len=4028
> CTCTCCAGCGGCGCACAACTCGATCGCTGGCCCAGAGGTTCAGTTCGGTT
> TGGTTCGGTTCGGTTTGAATCTCTGCCTCTGTTTACGCCTCTATATC...
>
> I've looked at the script directory and found the phase method  
> inside the Bio::SeqFeature::Gene::Intron object, but the examples  
> are from data parsed from a GFF file.
>
> Can I bypass the GFF stuff and use the FASTA header information  
> directly?
>
> Thanks in advance,
>
> Bests
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From chandan.kr.singh at gmail.com  Fri Jul  8 15:05:23 2005
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Fri Jul  8 14:56:16 2005
Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through proxy
Message-ID: <2d4f32050708120534f59f1a@mail.gmail.com>

Hi everybody,
Those of you who have problems blasting sequences from the Bio::Perl module through 
a proxy and get 
                              "time out" or "no route to host" errors 
need to set the proxy environment variable (hello smarty, we all know that) 
and also pass the following argument 
                                                     ( env_proxy => 1 ) 
to the LWP::UserAgent constructor, changing 
                  $self->{'_ua'} = new LWP::UserAgent(  );
to 
              $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 );

in the following sub in Bio/Tools/Run/RemoteBlast.pm:
sub ua {
    my ($self, $value) = @_;    
    if( ! defined $self->{'_ua'} ) {
	$self->{'_ua'} = new LWP::UserAgent(  );
	my $nm = ref($self);
	$nm =~ s/::/_/g;
	$self->{'_ua'}->agent("bioperl-$nm/$MODVERSION");
    }
    return $self->{'_ua'};
}
I saw this bug in the stable version and also in the one downloaded
from CVS yesterday.
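
The effect of env_proxy can be seen with LWP::UserAgent on its own. The following is a minimal sketch, not part of RemoteBlast.pm; the proxy URL is an assumed placeholder, and the variable names $plain and $proxied are only for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumed proxy URL, for illustration only; normally this is set in the shell.
local $ENV{http_proxy} = 'http://proxy.example.org:3128/';

my $plain   = LWP::UserAgent->new;                     # does not read *_proxy variables
my $proxied = LWP::UserAgent->new( env_proxy => 1 );   # reads them at construction

printf "without env_proxy: %s\n",
    defined $plain->proxy('http') ? $plain->proxy('http') : 'no proxy set';
printf "with env_proxy:    %s\n", $proxied->proxy('http');
```

Since the ua() sub shown above returns the LWP::UserAgent object, calling env_proxy on it from your own script (for example $factory->ua->env_proxy, where $factory is your RemoteBlast object) should have the same effect without editing the module. I have not checked this against every BioPerl release.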

From chandan.kr.singh at gmail.com  Fri Jul  8 15:37:13 2005
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Fri Jul  8 15:28:29 2005
Subject: [Bioperl-l] Remote::Blast
In-Reply-To: <425D8AAF.6040903@ime.usp.br>
References: <425D8AAF.6040903@ime.usp.br>
Message-ID: <2d4f32050708123744e99cfe@mail.gmail.com>

Dear Thiago,
I was out of touch with bioperl for quite some time, and today I solved
my problem. From your last email it seems that your problem was a slow proxy
connection and hence the timeout, while in my case the RemoteBlast.pm module
was not reading the proxy environment variable, so I never got any output.
You can see the solution in my recent mail to the group.
Regards Chandan


On 4/14/05, Thiago Motta Venancio <venancio@ime.usp.br> wrote:
> Hi all.
> I am using the Remote::Blast module.
> The script was running ok, but it become out because of a 500 error and
> gaves a timeout www.ncbi.nih.go:80.
> Later, it came back, but the vast majority of sequences returned no
> matches, some of them are not really no matches.
> Any lights?
> Thanks in advance
> Thiago
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From chandan.kr.singh at gmail.com  Fri Jul  8 16:00:15 2005
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Fri Jul  8 15:51:09 2005
Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through proxy..cont
Message-ID: <2d4f32050708130030b7a0dd@mail.gmail.com>

Sorry for the duplicate mail, but I forgot to mention
the more puzzling question: how come others don't get this problem, or
have I misunderstood something?
Do reply.

Hi everybody,
Those of you who have problems blasting sequences with the Bio::Perl module through
a proxy and are getting
                             "time out" or "no route to host" errors
do need to set the proxy environment variable (hello smarty, we all know it)
and then just add the following argument
                                                    ( env_proxy => 1 )
to
                 $self->{'_ua'} = new LWP::UserAgent(  );
so that it reads
             $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 );

in the following sub in Bio::Tools::Run::RemoteBlast.pm:
sub ua {
   my ($self, $value) = @_;
   if( ! defined $self->{'_ua'} ) {
       $self->{'_ua'} = new LWP::UserAgent(  );
       my $nm = ref($self);
       $nm =~ s/::/_/g;
       $self->{'_ua'}->agent("bioperl-$nm/$MODVERSION");
   }
   return $self->{'_ua'};
}
I saw this bug in the stable version and also in the one downloaded
from CVS yesterday.

From chad at dieselwurks.com  Fri Jul  8 18:15:13 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Fri Jul  8 18:06:15 2005
Subject: [Bioperl-l] Re: use cases for SeqWithQuality, please
In-Reply-To: <200507071013.45813.heikki@ebi.ac.uk>
References: <200507071013.45813.heikki@ebi.ac.uk>
Message-ID: <Pine.LNX.4.62.0507081614050.11490@sausage.usask.ca>


On Thu, 7 Jul 2005, Heikki Lehvaslaiho wrote:
> There are so many ways to call it and some methods are already deprecated
> that it would be better to write a replacement module than try to rewrite all
> methods.

That sounds ok.

> With that information I could write a Bio::Seq::Quality that
> implements Bio::Seq::MetaI and we could deprecate
> Bio::Seq::SeqWithQuality.
>
> Are you happy with that, Chad?

Absolutely. I'll dig through our code to find use cases. After that
you'll let me know how I can help?

Chad


-- 
George Orwell was an optimist.
From J.A.Page at newcastle.ac.uk  Sat Jul  9 18:26:00 2005
From: J.A.Page at newcastle.ac.uk (Jaqueline Ann Page)
Date: Sun Jul 10 08:08:32 2005
Subject: [Bioperl-l] Advice on using  bioperl
Message-ID: <E13D711AE2CCE5479792D0C1B0DF27C5011B51F2@moonraker.campus.ncl.ac.uk>


Hi Everyone

I had trouble using the remote blast in bioperl as I couldn't set the proxy. So I used the NCBI webblasst.pl code from their web site. This sends queries to qblast and gets the result back into a variable called $content (containing the blast report). I don't know how
 to pass this to bioperl code. How do I create a $blast_report object to pass it to? Then I would be able to use my $result = $blast_report->next_result;


        while(  $result = $in->next_result )


etc

Thanks in advance

Jackie
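
One possible way to do this, sketched under assumptions: that $content
holds a plain-text BLAST report (an XML report would need a different
-format value) and that BioPerl is installed. The helper name
searchio_from_string is made up for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Wrap a BLAST report held in a string in a Bio::SearchIO parser.
sub searchio_from_string {
    my ($content) = @_;

    # Perl 5.8 and later can open a read-only filehandle on a scalar ref.
    open( my $fh, '<', \$content )
        or die "Cannot open in-memory handle: $!";
    return Bio::SearchIO->new( -format => 'blast', -fh => $fh );
}

# Usage: give the path of a saved text BLAST report on the command line;
# its contents stand in for the $content variable from the question.
if (@ARGV) {
    open( my $in_fh, '<', $ARGV[0] ) or die "Cannot read $ARGV[0]: $!";
    my $content = do { local $/; <$in_fh> };    # slurp the whole file
    close $in_fh;

    my $in = searchio_from_string($content);
    while ( my $result = $in->next_result ) {
        print $result->query_name, "\n";
        while ( my $hit = $result->next_hit ) {
            print "  ", $hit->name, "\n";
        }
    }
}
```

The parser only reads from the handle when next_result is called, so
building the object from a string costs nothing until the loop starts.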

From chandan.kr.singh at gmail.com  Sun Jul 10 09:14:14 2005
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Sun Jul 10 09:06:22 2005
Subject: [Bioperl-l] Advice on using bioperl
In-Reply-To: <E13D711AE2CCE5479792D0C1B0DF27C5011B51F2@moonraker.campus.ncl.ac.uk>
References: <E13D711AE2CCE5479792D0C1B0DF27C5011B51F2@moonraker.campus.ncl.ac.uk>
Message-ID: <2d4f32050710061426db929@mail.gmail.com>

Hi JAP,
I don't understand why you can't set the proxy. If the proxy environment
variable is set,
you can easily use RemoteBlast.pm. I posted a mail about this
two days ago.
It might help you. Do reply if it helps.
See you,
Chandan


On 7/10/05, Jaqueline Ann Page <J.A.Page@newcastle.ac.uk> wrote:
> 
> Hi Everyone
> 
> I had trouble using the remote blast bioperl as I could't set the proxy. So I used NCBI webblasst.pl code on their web site. This sends queries to qblast gets back the result into a variable called $content ( containing the blast report).  I dont know
>  to pass this to bioperl code. How do I create a $blast_report object to pass it to.  Then I would be able to use my $result = $blast_report->next_result;
> 
> 
>         while(  $result = $in->next_result )
> 
> 
> etc
> 
> Thanks in advance
> 
> Jackie
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From oxcorder at cs.uu.nl  Mon Jul 11 05:56:50 2005
From: oxcorder at cs.uu.nl (Otto X. Cordero)
Date: Mon Jul 11 05:48:12 2005
Subject: [Bioperl-l] MSG: Replacing one sequence 
Message-ID: <55393.131.211.52.202.1121075810.squirrel@mail.students.cs.uu.nl>

Dear all,

I have a simple script that converts my alignments from fasta to phylip
format. It is mostly a copy-paste from the code in the module
documentation, very simple stuff. I noticed that some sequences were
replaced:

-------------------- WARNING ---------------------
MSG: Replacing one sequence [305.Q8XFS3.NR/1-1275]

Can anyone explain why this happens?

Thanks very much,

Otto.

=======================================
Otto X. Cordero
Theoretical Biology and Bioinformatics
Utrecht University
+31 30 2539043
Room Z508, Padualaan 8, 3584 CH Utrecht
The Netherlands
From n.haigh at sheffield.ac.uk  Mon Jul 11 11:50:13 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Mon Jul 11 11:41:04 2005
Subject: [Bioperl-l] Gene Features
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAveKwuRjV6UOkOQJiZQLWFwEAAAAA@sheffield.ac.uk>

I'm working on Arabidopsis thaliana and I'd like to identify candidate genes
based on their gene features. In particular I'd like to identify genes with
introns within a specific range.

I have obtained a file from TIGR describing gene features:

ftp://ftp.arabidopsis.org/home/tair/Maps/seqviewer_data/sv_gene_feature.data

 
I wondered if anyone might have some code for doing this type of thing?
Would the use of Bio::SeqFeature be overkill and can it be used without
actually having the gene sequences?
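
As a rough sketch of the coordinate-only approach: intron lengths follow
from exon start and end positions alone, so no sequence data is needed.
The gene names and coordinates here are invented, and the layout of the
sv_gene_feature.data file is deliberately not assumed; parsing it into
[start, end] pairs is left out:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given exons as [start, end] pairs (1-based, inclusive), return the
# lengths of the gaps between consecutive exons, i.e. the introns.
sub intron_lengths {
    my @exons = sort { $a->[0] <=> $b->[0] } @_;
    my @lengths;
    for my $i ( 1 .. $#exons ) {
        my $gap = $exons[$i][0] - $exons[ $i - 1 ][1] - 1;
        push @lengths, $gap if $gap > 0;
    }
    return @lengths;
}

# Count the introns of one gene whose length lies within [min, max].
sub count_introns_in_range {
    my ( $min, $max, @exons ) = @_;
    my @hits = grep { $_ >= $min && $_ <= $max } intron_lengths(@exons);
    return scalar @hits;
}

# Hypothetical genes with made-up exon coordinates.
my %genes = (
    gene_a => [ [ 1, 100 ], [ 201, 300 ], [ 1301, 1400 ] ],
    gene_b => [ [ 50, 400 ] ],
);

# Report genes with at least one intron of 500 to 2000 bases.
for my $id ( sort keys %genes ) {
    print "$id\n" if count_introns_in_range( 500, 2000, @{ $genes{$id} } );
}
```

Bio::SeqFeature objects would carry the same start and end values, so
they are an option but not a requirement for this kind of filter.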

Thanks

Nathan

 
----------------------------------
Nathan Haigh
Bioinformatics PostDoctoral Research Associate

Room B2 211
Department of Animal and Plant Sciences
University of Sheffield
Western Bank
Sheffield
S10 2TN

Tel: +44 (0)114 22 20112
Mob: +44 (0)7742 533 569
Fax: +44 (0)114 22 20002
 
From heikki at ebi.ac.uk  Mon Jul 11 12:02:52 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon Jul 11 11:53:23 2005
Subject: [Bioperl-l] Announce: Bio::Seq::Quality
Message-ID: <200507111702.52264.heikki@ebi.ac.uk>


Bio::Seq::Quality is a new module that allows you to store per-residue quality 
and trace index values using Bio::Seq::MetaI interface. It replaces 
Bio::Seq::SeqWithQuality which is now deprecated.

Solutions to persistence should focus on storing Bio::Seq::Meta and 
Bio::Seq::Meta::Array objects. It should be easy to stringify most real world 
meta values. Then the persistence could be implemented by storing the 
sequence object and N number of meta strings.

All the functional code is in Bio::Seq::Meta::Array, Bio::Seq::Quality merely 
adds a convenient interface.

The POD contains a discussion of differences from Bio::Seq::SeqWithQuality. 
If the following, or anything else,  is a problem let me know as soon as 
possible:

  The greatest difference from Bio::Seq::SeqWithQuality is that in this
  implementation every sequence residue is automatically assigned a
  quality value of '0' (zero) unless you set it to something
  else. The length of the quality array always equals the length of the
  sequence. Therefore, length() never returns "DIFFERENT".
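
A short usage sketch, assuming a BioPerl installation that includes
Bio::Seq::Quality. The identifier and the quality values are invented
for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq::Quality;

# Qualities can be given as a space-delimited string, one per residue.
my $seq = Bio::Seq::Quality->new(
    -id   => 'example_read',               # made-up identifier
    -seq  => 'atcgatcg',
    -qual => '10 20 30 40 40 30 20 10',
);

my $quals = $seq->qual;                    # array reference
printf "%s: %d residues, %d quality values\n",
    $seq->id, $seq->length, scalar @$quals;
print "qualities: ", $seq->qual_text, "\n";    # space-delimited string
```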


Enjoy,
 -Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From jason.stajich at duke.edu  Mon Jul 11 14:29:26 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jul 11 14:23:23 2005
Subject: [Bioperl-l] MSG: Replacing one sequence 
In-Reply-To: <55393.131.211.52.202.1121075810.squirrel@mail.students.cs.uu.nl>
References: <55393.131.211.52.202.1121075810.squirrel@mail.students.cs.uu.nl>
Message-ID: <11CE92FD-AEF1-47CA-816A-C2D271087F81@duke.edu>

sequence names are probably not unique.
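
A quick way to check this, as a sketch in core Perl only: it treats the
first word after '>' in each FASTA header as the sequence name, which is
an assumption about how the names are compared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names that occur more than once among the FASTA headers
# read from the given filehandle, in order of first appearance.
sub duplicate_ids {
    my ($fh) = @_;
    my ( %count, @order );
    while ( my $line = <$fh> ) {
        next unless $line =~ /^>(\S+)/;
        push @order, $1 unless $count{$1}++;
    }
    return grep { $count{$_} > 1 } @order;
}

# Usage: perl find_dups.pl alignment.fasta
if (@ARGV) {
    open( my $fh, '<', $ARGV[0] ) or die "Cannot read $ARGV[0]: $!";
    print "$_\n" for duplicate_ids($fh);
    close $fh;
}
```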

-jason
On Jul 11, 2005, at 5:56 AM, Otto X. Cordero wrote:

> Dear all,
>
> I have a simple script that converts my alignments from fasta to  
> phylip
> format. It is mostly a copy-paste from the code in the module
> documentation, very simple stuff. I noticed that some sequences where
> replaced:
>
> -------------------- WARNING ---------------------
> MSG: Replacing one sequence [305.Q8XFS3.NR/1-1275]
>
> Can anyone explain why this happens?
>
> Thanks very much,
>
> Otto.
>
> =======================================
> Otto X. Cordero
> Theoretical Biology and Bioinformatics
> Utrecht University
> +31 30 2539043
> Room Z508, Padualaan 8, 3584 CH Utrecht
> The Netherlands
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From jason.stajich at duke.edu  Mon Jul 11 14:32:25 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jul 11 14:25:23 2005
Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through
	proxy..cont
In-Reply-To: <2d4f32050708130030b7a0dd@mail.gmail.com>
References: <2d4f32050708130030b7a0dd@mail.gmail.com>
Message-ID: <E8F33A87-675C-4098-8FE6-748EB6379547@duke.edu>

Thanks - I think you can just reset the LWP object directly if you  
like in your script code w/o modifying the module:
  $remoteblast->ua(LWP::UserAgent->new(env_proxy =>1));

We can certainly update the module to add this default initialization  
though.

You should submit it as a feature request at http://bugzilla.open-bio.org/
so we can track whether or not someone has done it.

On Jul 8, 2005, at 4:00 PM, CHANDAN SINGH wrote:

> sorry for the reduplication of the mail but i had forgot to mention
> the  more  bugging bug which is How come ,others dont get this  
> problem or
> have i misunderstood something .
> do reply
>
> Hi eveybody
> Those of u ,having problem in blasting sequences from Bio::Perl  
> module through
> proxy  and getting
>                              "time  out "  or  " no route to host "  
> errors
> do  need to set the  environment proxy variable ( hello smarty we  
> all know it )
> and just give the following argument
>                                                     ( env_proxy => 1 )
> to
>                  $self->{'_ua'} = new LWP::UserAgent(  );
> as
>              $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 );
>
> in  the following sub in Bio::Tools::Run::RemoteBlast.pm
> sub ua {
>    my ($self, $value) = @_;
>    if( ! defined $self->{'_ua'} ) {
>        $self->{'_ua'} = new LWP::UserAgent(  );
>        my $nm = ref($self);
>        $nm =~ s/::/_/g;
>        $self->{'_ua'}->agent("bioperl-$nm/$MODVERSION");
>    }
>    return $self->{'_ua'};
> }
> I saw this bug in the stable version and also in the one downloaded
> from CVS yesterday .
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From chandan.kr.singh at gmail.com  Mon Jul 11 14:45:58 2005
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Mon Jul 11 14:36:45 2005
Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through
	proxy..cont
In-Reply-To: <E8F33A87-675C-4098-8FE6-748EB6379547@duke.edu>
References: <2d4f32050708130030b7a0dd@mail.gmail.com>
	<E8F33A87-675C-4098-8FE6-748EB6379547@duke.edu>
Message-ID: <2d4f3205071111456c0d2103@mail.gmail.com>

Hi Jason,
I am not sure if it can be done the way you describe. Anyway, I'll try it.
I submitted the problem to Bugzilla back in April, and I
installed bioperl
from CVS recently. It is quite possible that the fix is not included yet.
See you,
Chandan

On 7/12/05, Jason Stajich <jason.stajich@duke.edu> wrote:
> Thanks - I think you can just reset the LWP object directly if you like in
> your script code w/o modifying the module: 
>  $remoteblast->ua(LWP::UserAgent->new(env_proxy =>1));
> 
> We can certainly update the module to add this default initialization
> though.
> 
> You should submit it as feature request at http://bugzilla.open-bio.org/  so
> we can track whether or not someone has done it.
> 
> 
> On Jul 8, 2005, at 4:00 PM, CHANDAN SINGH wrote:
> 
> sorry for the reduplication of the mail but i had forgot to mention 
> the  more  bugging bug which is How come ,others dont get this problem or 
> have i misunderstood something .
> do reply  
> 
> Hi eveybody
> Those of u ,having problem in blasting sequences from Bio::Perl module
> through
> proxy  and getting
>                              "time  out "  or  " no route to host " errors
> do  need to set the  environment proxy variable ( hello smarty we all know
> it )
> and just give the following argument
>                                                     ( env_proxy => 1 )
> to
>                  $self->{'_ua'} = new LWP::UserAgent(  );
> as
>              $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 );
> 
> in  the following sub in Bio::Tools::Run::RemoteBlast.pm
> sub ua {
>    my ($self, $value) = @_;
>    if( ! defined $self->{'_ua'} ) {
>        $self->{'_ua'} = new LWP::UserAgent(  );
>        my $nm = ref($self);
>        $nm =~ s/::/_/g;
>        $self->{'_ua'}->agent("bioperl-$nm/$MODVERSION");
>    }
>    return $self->{'_ua'};
> }
> I saw this bug in the stable version and also in the one downloaded
> from CVS yesterday .
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l 
> 
>  
> 
> -- 
> 
> Jason Stajich 
> 
> jason.stajich at duke.edu 
> 
> http://www.duke.edu/~jes12/ 
>  
>

From jason.stajich at duke.edu  Mon Jul 11 14:56:21 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jul 11 14:47:16 2005
Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through
	proxy..cont
In-Reply-To: <E8F33A87-675C-4098-8FE6-748EB6379547@duke.edu>
References: <2d4f32050708130030b7a0dd@mail.gmail.com>
	<E8F33A87-675C-4098-8FE6-748EB6379547@duke.edu>
Message-ID: <19A5C4D1-80D5-44A3-9687-2A985D75EEE1@duke.edu>

Sorry I meant just do this.
$remoteblast->ua->env_proxy;

The ua function is not currently written to accept and store a new ua
object, but of course you can just do:
  $remoteblast->{'_ua'} = LWP::UserAgent->new(env_proxy => 1);
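
In script form this might look roughly as follows. It is a sketch that
assumes BioPerl with Bio::Tools::Run::RemoteBlast is installed; the
program, database and e-value are placeholder choices, and no query is
submitted:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;

# Placeholder search settings; adjust to the actual analysis.
my $remoteblast = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastp',
    -data       => 'swissprot',
    -expect     => '1e-10',
    -readmethod => 'SearchIO',
);

# Ask the factory's existing LWP::UserAgent to read the *_proxy
# environment variables, without editing the module itself.
$remoteblast->ua->env_proxy;

my $proxy = $remoteblast->ua->proxy('http');
print 'http proxy in use: ', ( defined $proxy ? $proxy : '(none)' ), "\n";
```

Going through the ua accessor keeps the agent string that the module
sets, which a freshly constructed LWP::UserAgent would not have.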

-jason

On Jul 11, 2005, at 2:32 PM, Jason Stajich wrote:

> Thanks - I think you can just reset the LWP object directly if you  
> like in your script code w/o modifying the module:
>  $remoteblast->ua(LWP::UserAgent->new(env_proxy =>1));
>
> We can certainly update the module to add this default  
> initialization though.
>
> You should submit it as feature request at http://bugzilla.open- 
> bio.org/  so we can track whether or not someone has done it.
>
> On Jul 8, 2005, at 4:00 PM, CHANDAN SINGH wrote:
>
>
>> sorry for the reduplication of the mail but i had forgot to mention
>> the  more  bugging bug which is How come ,others dont get this  
>> problem or
>> have i misunderstood something .
>> do reply
>>
>> Hi eveybody
>> Those of u ,having problem in blasting sequences from Bio::Perl  
>> module through
>> proxy  and getting
>>                              "time  out "  or  " no route to host  
>> " errors
>> do  need to set the  environment proxy variable ( hello smarty we  
>> all know it )
>> and just give the following argument
>>                                                     ( env_proxy =>  
>> 1 )
>> to
>>                  $self->{'_ua'} = new LWP::UserAgent(  );
>> as
>>              $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 );
>>
>> in  the following sub in Bio::Tools::Run::RemoteBlast.pm
>> sub ua {
>>    my ($self, $value) = @_;
>>    if( ! defined $self->{'_ua'} ) {
>>        $self->{'_ua'} = new LWP::UserAgent(  );
>>        my $nm = ref($self);
>>        $nm =~ s/::/_/g;
>>        $self->{'_ua'}->agent("bioperl-$nm/$MODVERSION");
>>    }
>>    return $self->{'_ua'};
>> }
>> I saw this bug in the stable version and also in the one downloaded
>> from CVS yesterday .
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From chandan.kr.singh at gmail.com  Mon Jul 11 15:12:58 2005
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Mon Jul 11 15:04:16 2005
Subject: [Bioperl-l] DEBUGGED Remoteblast from Bio::Perl through
	proxy..cont
In-Reply-To: <19A5C4D1-80D5-44A3-9687-2A985D75EEE1@duke.edu>
References: <2d4f32050708130030b7a0dd@mail.gmail.com>
	<E8F33A87-675C-4098-8FE6-748EB6379547@duke.edu>
	<19A5C4D1-80D5-44A3-9687-2A985D75EEE1@duke.edu>
Message-ID: <2d4f32050711121236c3186a@mail.gmail.com>

I know there are ways to do it, but if you remember, my program was nothing but
the second example in the bptutorial on the net, and no such one-liner can help there.
You seem to be referring to your own script, which might be a different one.
That example is disheartening enough for a newbie. It seems there are options
to include a proxy if we use RemoteBlast.pm directly.

chandan 

On 7/12/05, Jason Stajich <jason.stajich@duke.edu> wrote:
> Sorry I meant just do this.
> $remoteblast->ua->env_proxy;
> 
> The ua function is not currently written to accept storing a new ua object
> but of course you can just do:
>  $remoteblast->{'_ua'} = LWP::UserAgent->new(env_proxy => 1).
> 
> -jason
> 
> 
> 
> On Jul 11, 2005, at 2:32 PM, Jason Stajich wrote:
> 
> Thanks - I think you can just reset the LWP object directly if you like in
> your script code w/o modifying the module:
>  $remoteblast->ua(LWP::UserAgent->new(env_proxy =>1));
> 
> We can certainly update the module to add this default initialization
> though.
> 
> You should submit it as feature request at http://bugzilla.open-bio.org/  so
> we can track whether or not someone has done it.
> 
> On Jul 8, 2005, at 4:00 PM, CHANDAN SINGH wrote:
> 
>  
> 
> sorry for the reduplication of the mail but i had forgot to mention
> the  more  bugging bug which is How come ,others dont get this problem or
> have i misunderstood something .
> do reply
> 
> Hi eveybody
> Those of u ,having problem in blasting sequences from Bio::Perl module
> through
> proxy  and getting
>                              "time  out "  or  " no route to host " errors
> do  need to set the  environment proxy variable ( hello smarty we all know
> it )
> and just give the following argument
>                                                     ( env_proxy => 1 )
> to
>                  $self->{'_ua'} = new LWP::UserAgent(  );
> as
>              $self->{'_ua'} = new LWP::UserAgent( env_proxy => 1 );
> 
> in  the following sub in Bio::Tools::Run::RemoteBlast.pm
> sub ua {
>    my ($self, $value) = @_;
>    if( ! defined $self->{'_ua'} ) {
>        $self->{'_ua'} = new LWP::UserAgent(  );
>        my $nm = ref($self);
>        $nm =~ s/::/_/g;
>        $self->{'_ua'}->agent("bioperl-$nm/$MODVERSION");
>    }
>    return $self->{'_ua'};
> }
> I saw this bug in the stable version and also in the one downloaded
> from CVS yesterday .
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
>  
> 
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l 
> 
>  
> 
> -- 
> 
> Jason Stajich 
> 
> jason.stajich at duke.edu 
> 
> http://www.duke.edu/~jes12/ 
>  
>

From mckays at cshl.edu  Mon Jul 11 12:29:54 2005
From: mckays at cshl.edu (Sheldon McKay)
Date: Mon Jul 11 18:44:55 2005
Subject: [Bioperl-l] extract info from .game.xml
In-Reply-To: <cc7ec2a050711090570c9417b@mail.gmail.com>
References: <cc7ec2a05061417032adeed4a@mail.gmail.com>
	<497101aad05f378c5e1805c206b1cfd8@cshl.edu>
	<cc7ec2a050711090570c9417b@mail.gmail.com>
Message-ID: <a3d51a90abed5795ffefbf5971598837@cshl.edu>


Hi Tuan,

Your game XML file contains only sequence and computational_analysis 
elements, with no annotation elements.  Unfortunately lack of 
annotations is fatal and computational analysis features are not 
supported in the bioperl parser.  Lack of annotations does not 
necessarily need to be fatal, though.  I will see what I can do about 
that.

Sheldon

On Jul 11, 2005, at 12:05 PM, Tuan A. Tran wrote:

> Hi Sheldon,
>
> Thanks very much for your email. Yes, I am still interested in doing 
> that.
> It is quite a while ago so I don't remember what I might have done
> wrong. Anyway, I seem to recall that in attached data file, there is
> not any 'annotation' anywhere. I have not checked since then. I just
> downloaded the attached file from flybase.org.
>
>  I hope that you can help me to figure out.
>
>  Sincerely,
>  Tuan
>
>
> On 7/7/05, Sheldon McKay <mckays@cshl.edu> wrote:
>> Hi,
>>
>> Sorry for taking so long to reply.  If you are still interested in
>> doing this, could you send me the file you are trying to parse and i
>> will see if I can figure out what is wrong?
>>
>> Thanks,
>> Sheldon
>>
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Sheldon McKay, PhD
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> On Jun 14, 2005, at 8:03 PM, Tuan A. Tran wrote:
>>
>>> Hi,
>>>
>>> I am trying to extract some information from a file filename.game.xml
>>> (I got this file from flybase.org). I wrote a simple script to test
>>> it. However, I keep getting the following message
>>>
>>> ------------- EXCEPTION  -------------
>>> MSG: No annotations
>>> STACK Bio::SeqIO::game::gameHandler::load
>>> /usr/local/share/perl/5.8.4/Bio/SeqIO/game/gameHandler.pm:121
>>> STACK Bio::SeqIO::game::_getseqs
>>> /usr/local/share/perl/5.8.4/Bio/SeqIO/game.pm:156
>>> STACK Bio::SeqIO::game::next_seq
>>> /usr/local/share/perl/5.8.4/Bio/SeqIO/game.pm:101
>>> STACK toplevel fetchseq_game_xml.pl:64
>>>
>>> I have no idea why. Can anyone help?
>>> Thanks in advance,
>>> TAT
>>>
>>> ---------------------------------
>>> My simple script is
>>>
>>> #!/usr/local/lib/perl
>>>
>>> use strict;
>>>
>>> sub NULL () {0};
>>>
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>> #use Bio::SeqIO::game;
>>> #use Bio::Annotation;
>>> use Bio::SearchIO;
>>> use Bio::AlignIO;
>>> use Bio::SimpleAlign;
>>> use Bio::LocatableSeq;
>>> use Bio::Tools::Run::StandAloneBlast;
>>> use Bio::Tools::Run::Alignment::Clustalw;
>>> use Getopt::Long;
>>> use Bio::DB::GenBank;
>>> use Bio::DB::Flat::BDB;
>>> #use Bio::Index::GenBank;
>>> use Bio::Index::Fasta;
>>> use Bio::SeqFeature::Generic;
>>> use DBI;
>>>
>>>
>>> my $infile = shift;
>>> my $in = Bio::SeqIO->new( -file=> $infile, -format=>'game');
>>>
>>> while (my $query = $in->next_seq() ) {
>>>
>>>       print $query->id,"\n";
>>> }
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> <3R_27900000_28200000.game.xml.gz>

From avilella at gmail.com  Tue Jul 12 11:19:45 2005
From: avilella at gmail.com (Albert Vilella)
Date: Tue Jul 12 11:12:41 2005
Subject: [Bioperl-l] bioperl-run Codeml.pm fix_blength
Message-ID: <1121181586.8167.13.camel@localhost.localdomain>

Hi,

I noticed that the valid values for fix_blength in Codeml.pm do not
include option "fix_blength 1: initial",

If agreed, I would add it myself in cvs:

bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm

		     'fix_blength'   => [0,-1,2],
change to:
		     'fix_blength'   => [0,-1,1,2],
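
For readers unfamiliar with this kind of table, here is a sketch of how
a list of permitted values can be checked. It is an illustration in
plain Perl, not the actual code in Codeml.pm; the hash name and the sub
are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table of permitted values, keyed by parameter name.
my %VALID_VALUES = (
    fix_blength => [ 0, -1, 1, 2 ],
);

# True if $value is listed for $param; unknown parameters are rejected.
sub is_valid_value {
    my ( $param, $value ) = @_;
    my $allowed = $VALID_VALUES{$param};
    return 0 unless $allowed;
    my @matches = grep { $_ eq $value } @$allowed;
    return scalar @matches;
}

for my $value ( -1, 0, 1, 2, 3 ) {
    printf "fix_blength = %2d is %s\n", $value,
        is_valid_value( 'fix_blength', $value ) ? 'accepted' : 'rejected';
}
```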

Jason?

Bests,

    Albert.

From jason.stajich at duke.edu  Tue Jul 12 11:28:10 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jul 12 11:19:26 2005
Subject: [Bioperl-l] Re: bioperl-run Codeml.pm fix_blength
In-Reply-To: <1121181586.8167.13.camel@localhost.localdomain>
References: <1121181586.8167.13.camel@localhost.localdomain>
Message-ID: <FEB3152A-E40A-4C63-B0DC-EADB3C91CABA@duke.edu>

sure - fix away.

I think it was a bit misguided on my part to think we could really  
capture all the valid values in this init hash - possibly could  
remove the whole system of checking and just establish default  
values.  Anyways, feel free to check that in.

-jason
On Jul 12, 2005, at 11:19 AM, Albert Vilella wrote:

> Hi,
>
> I noticed that the valid values for fix_blength in Codeml.pm do not
> include option "fix_blength 1: initial",
>
> I agreed, I would add it myself in cvs:
>
> bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm
>
>              'fix_blength'   => [0,-1,2],
> change to:
>              'fix_blength'   => [0,-1,1,2],
>
> Jason?
>
> Bests,
>
>     Albert.
>
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From valiente at jaist.ac.jp  Mon Jul 11 22:37:12 2005
From: valiente at jaist.ac.jp (Gabriel Valiente)
Date: Tue Jul 12 11:25:00 2005
Subject: [Bioperl-l] Announce: Bio::Tree::Draw::Cladogram
In-Reply-To: <200507111702.52264.heikki@ebi.ac.uk>
References: <200507111702.52264.heikki@ebi.ac.uk>
Message-ID: <42D32CD8.6070409@jaist.ac.jp>

Bio::Tree::Draw::Cladogram is a new module for drawing Bio::Tree::Tree 
objects in Encapsulated PostScript (EPS) format. It can be utilized both 
for displaying a single phylogenetic tree (a cladogram) and for the 
comparative display of two phylogenetic trees (a tanglegram) such as a 
gene tree and a species tree, a host tree and a parasite tree, two 
alternative trees for the same set of taxa, or two alternative trees for 
overlapping sets of taxa.

The POD contains a detailed description of the way in which cladograms 
and tanglegrams are built. However, tests are still missing and I'm 
afraid I won't be able to work on this until August. Many extensions are 
possible, such as using branch lengths and producing output in other 
graphic formats. Any suggestions are welcome.

Enjoy,

Gabriel
From valiente at jaist.ac.jp  Mon Jul 11 22:45:17 2005
From: valiente at jaist.ac.jp (Gabriel Valiente)
Date: Tue Jul 12 11:25:05 2005
Subject: [Bioperl-l] Announce: Bio::Tree::Compatible
In-Reply-To: <42D32CD8.6070409@jaist.ac.jp>
References: <200507111702.52264.heikki@ebi.ac.uk>
	<42D32CD8.6070409@jaist.ac.jp>
Message-ID: <42D32EBD.6070909@jaist.ac.jp>

Bio::Tree::Compatible is a new module for testing compatibility of 
phylogenetic trees with nested taxa represented as Bio::Tree::Tree 
objects. It is based on a recent characterization of ancestral 
compatibility of semi-labeled trees in terms of their cluster 
representations.

The POD is now complete but tests are still missing and I'm afraid I 
won't be able to work on this until August. However, I've tested this 
module on all pairs of trees from TreeBASE. Any suggestions are welcome.

The theory behind this module can be found at:

http://www.lsi.upc.es/dept/techreps/listado_concreto.php?id=766
http://arxiv.org/abs/cs.DM/0505086

Enjoy,

Gabriel
From avilella at gmail.com  Tue Jul 12 11:40:41 2005
From: avilella at gmail.com (Albert Vilella)
Date: Tue Jul 12 11:32:38 2005
Subject: [Bioperl-l] Re: bioperl-run Codeml.pm fix_blength
In-Reply-To: <FEB3152A-E40A-4C63-B0DC-EADB3C91CABA@duke.edu>
References: <1121181586.8167.13.camel@localhost.localdomain>
	<FEB3152A-E40A-4C63-B0DC-EADB3C91CABA@duke.edu>
Message-ID: <1121182841.8167.22.camel@localhost.localdomain>

On Tue, 12 Jul 2005 at 11:28 -0400, Jason Stajich wrote:
> sure - fix away.

done.

Also, in my pipeline it would be interesting to call Codeml.pm via
bioperl keeping the tempfiles in a specified directory:

I understand that save_tempfiles will save the generated tempfiles in
the temp directory, the dir will remain in $tempdir.
An $outdir could be specified so that the codeml run is saved where the
user specifies.

What do you think?

    Albert.


From jason.stajich at duke.edu  Tue Jul 12 11:47:19 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jul 12 11:38:15 2005
Subject: [Bioperl-l] Re: bioperl-run Codeml.pm fix_blength
In-Reply-To: <1121182841.8167.22.camel@localhost.localdomain>
References: <1121181586.8167.13.camel@localhost.localdomain>
	<FEB3152A-E40A-4C63-B0DC-EADB3C91CABA@duke.edu>
	<1121182841.8167.22.camel@localhost.localdomain>
Message-ID: <A15376B2-779D-4F25-8153-6B3417A18CCD@duke.edu>

Sounds good - would you just copy the dir to the user-specified outdir?
    Another way to go is to make tempdir a settable value (see  
Bio::Tools::Run::WrapperBase -- in the bioperl-live repository) - but   
it may not be as clear how to use it?
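
The copy option could be sketched like this with core modules only. The
directory that would normally come from the wrapper is replaced by a
scratch directory here, and the file name and the copy_run_files helper
are invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Copy every plain file from $from into $to, creating $to if needed.
# Returns the names of the copied files.
sub copy_run_files {
    my ( $from, $to ) = @_;
    -d $to or mkdir $to or die "Cannot create $to: $!";
    opendir( my $dh, $from ) or die "Cannot read $from: $!";
    my @copied;
    for my $name ( sort readdir $dh ) {
        my $src = File::Spec->catfile( $from, $name );
        next unless -f $src;    # skips '.', '..' and subdirectories
        copy( $src, File::Spec->catfile( $to, $name ) )
            or die "Copy of $name failed: $!";
        push @copied, $name;
    }
    closedir $dh;
    return @copied;
}

# Stand-in for the wrapper's temporary run directory.
my $tempdir = tempdir( CLEANUP => 1 );
my $ctl     = File::Spec->catfile( $tempdir, 'codeml.ctl' );
open( my $fh, '>', $ctl ) or die "Cannot write $ctl: $!";
print {$fh} "fix_blength = 1\n";
close $fh;

# Stand-in for the user-specified output directory.
my $outdir = File::Spec->catdir( tempdir( CLEANUP => 1 ), 'codeml_run' );
my @copied = copy_run_files( $tempdir, $outdir );
print "copied to $outdir: @copied\n";
```

Making tempdir settable would avoid this step entirely, at the price of
the user having to manage the directory before the run.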

-jason
On Jul 12, 2005, at 11:40 AM, Albert Vilella wrote:

> On Tue, 12 Jul 2005 at 11:28 -0400, Jason Stajich wrote:
>
>> sure - fix away.
>>
>
> done.
>
> Also, in my pipeline it would be interesting to call Codeml.pm via
> bioperl keeping the tempfiles in a specified directory:
>
> I understand that save_tempfiles will save the generated tempfiles in
> the temp directory, the dir will remain in $tempdir.
> An $outdir could be specified so that the codeml run is saved where  
> the
> user specifies.
>
> What do you think?
>
>     Albert.
>
>
>

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From avilella at gmail.com  Tue Jul 12 12:02:57 2005
From: avilella at gmail.com (Albert Vilella)
Date: Tue Jul 12 11:54:55 2005
Subject: [Bioperl-l] Re: bioperl-run Codeml.pm fix_blength
In-Reply-To: <A15376B2-779D-4F25-8153-6B3417A18CCD@duke.edu>
References: <1121181586.8167.13.camel@localhost.localdomain>
	<FEB3152A-E40A-4C63-B0DC-EADB3C91CABA@duke.edu>
	<1121182841.8167.22.camel@localhost.localdomain>
	<A15376B2-779D-4F25-8153-6B3417A18CCD@duke.edu>
Message-ID: <1121184178.8167.28.camel@localhost.localdomain>

On Tue, 12 Jul 2005 at 11:47 -0400, Jason Stajich wrote:
> Sounds good - would you just copy the dir to the users specified
> outdir?

yes

>    Another way to go is make tempdir a settable value (see
> Bio::Tools::Run::WrapperBase -- in bioperl-live repository) - but
> this may not be as clear on how to use it?

Well, it is not as direct as the other way, but maybe it is cleaner in
the sense that the analysis would run directly in $tempdir and no extra
cp or mv would be needed...
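
For comparison, the "copy afterwards" option could look roughly like the
sketch below. It uses only core Perl modules. save_run_to() and the file
name 'mlc' are hypothetical stand-ins for illustration, not part of
bioperl-run, and the temp directory here only simulates a finished
codeml run.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper (not a bioperl-run method): copy every plain file
# from a finished run's temp directory into a user-chosen directory.
sub save_run_to {
    my ($tempdir, $outdir) = @_;
    make_path($outdir) unless -d $outdir;
    opendir(my $dh, $tempdir) or die "Cannot open $tempdir: $!";
    my @names = readdir $dh;
    closedir $dh;
    my @copied;
    for my $name (sort @names) {
        my $src = File::Spec->catfile($tempdir, $name);
        next unless -f $src;    # skips '.', '..' and subdirectories
        copy($src, File::Spec->catfile($outdir, $name))
            or die "Cannot copy $src: $!";
        push @copied, $name;
    }
    return @copied;
}

# Stand-in for a codeml run: a temp directory holding one result file.
my $tempdir = tempdir(CLEANUP => 1);
open(my $fh, '>', File::Spec->catfile($tempdir, 'mlc')) or die $!;
print {$fh} "placeholder codeml output\n";
close $fh;

my $outdir = File::Spec->catdir(tempdir(CLEANUP => 1), 'codeml_run');
my @saved  = save_run_to($tempdir, $outdir);
print "saved: @saved\n";
```

Making the temp directory itself settable avoids this extra copy step,
at the cost of a less obvious interface.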

   Albert.


From wrp at virginia.edu  Tue Jul 12 12:41:41 2005
From: wrp at virginia.edu (William R. Pearson)
Date: Tue Jul 12 12:35:05 2005
Subject: [Bioperl-l] Computational and Comparative Genomics Course - July 15
	Deadline
In-Reply-To: <200507121533.j6CFXha6021002@portal.open-bio.org>
References: <200507121533.j6CFXha6021002@portal.open-bio.org>
Message-ID: <56C4C92F-B286-4C0F-8EC9-B094BA9A7528@virginia.edu>


Course announcement - Application deadline, July 15, 2005

================================================================

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
November 2 - 8, 2005
Application Deadline: July 15, 2005

INSTRUCTORS:

Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of
Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes - This
course presents a comprehensive overview of the theory and practice of
computational methods for extracting the maximum amount of information
from protein and DNA sequence similarity through sequence database
searches, statistical analysis, multiple sequence alignment, and
genome-scale alignment. Additional topics include gene finding,
identifying signals in unaligned sequences, and integration of genetic
and sequence information in biological databases.

The course combines lectures with hands-on exercises; students are
encouraged to pose challenging sequence analysis problems using their
own data. The course makes extensive use of local WWW pages to present
problem sets and the computing tools to solve them. Students use
Windows and Mac workstations attached to a UNIX server; participants
should be comfortable using the Unix operating system and a Unix text
editor.

The course is designed for biologists seeking advanced training in
biological sequence analysis, computational biology core resource
directors and staff, and for scientists in other disciplines, such as
computer science, who wish to survey current research problems in
biological sequence analysis and comparative genomics.

The primary focus of the Computational and Comparative Genomics Course
is the theory and practice of algorithms used in computational
biology, with the goal of using current methods more effectively and
developing new algorithms. Cold Spring Harbor also offers an Advanced
Bioinformatics Programming course, which focuses more on software
development.

Over the past few years, the course has been expanded to cover more
algorithms and exercises on comparative genomics and genome databases.

For additional information and the lecture schedule and problem sets
for the 2004 course, see:

         http://fasta.bioch.virginia.edu/cshl04

================================================================

To apply to the course, fill out the form at:

         http://meetings.cshl.edu/courses/courseapplication.asp

================================================================

From heikki at ebi.ac.uk  Wed Jul 13 09:06:24 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Jul 13 08:56:51 2005
Subject: [Bioperl-l] EMBL ID line parsing error
Message-ID: <200507131406.24939.heikki@ebi.ac.uk>


I noticed that one BioFetch test was failing because an EMBL entry 
object had no display ID. The cause was a regular expression in the EMBL 
parser that did not allow spaces in the molecule substring of the ID 
line:


ID   BUM        standard; genomic RNA; VRL; 200 BP.
                   was:   (\S+);
                   fix:   ([\S ]+);     now in bioperl-live


The affected Bio::Seq::RichSeq methods are:
 display_id(), id(), molecule(), division()
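
A minimal sketch of the failure, in plain Perl. The two patterns are
simplified stand-ins whose overall shape is assumed; only the molecule
group, (\S+) versus ([\S ]+), is taken from the note above.

```perl
use strict;
use warnings;

my $line = 'ID   BUM        standard; genomic RNA; VRL; 200 BP.';

# Simplified stand-ins for the parser's pattern (assumed shape; only the
# molecule group differs between the two).
my $was = qr/^ID\s+(\S+)\s+\S+;\s+(\S+);\s+(\S+);/;
my $fix = qr/^ID\s+(\S+)\s+\S+;\s+([\S ]+);\s+(\S+);/;

for my $case (['was', $was], ['fix', $fix]) {
    my ($label, $re) = @$case;
    if (my ($name, $mol, $div) = $line =~ $re) {
        print "$label: id=$name molecule='$mol' division=$div\n";
    }
    else {
        print "$label: no match\n";
    }
}
```

With the old group the whole pattern fails on "genomic RNA", so none of
the captures (ID, molecule, division) get set.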

Here is a breakdown of all molecule values in current EMBL release:

circular genomic dna     7427
circular genomic rna      687
circular mrna              23
circular other dna        915
circular other rna          9
circular trna               1
circular unassigned dna   266
circular unassigned rna     2
genomic dna          14573961
genomic rna            152219
mrna                 28138477
other dna                6956
other rna                1827
pre-rna                   898
rrna                     5999
scrna                      95
snorna                    981
snrna                     455
trna                      667
unassigned dna        1941868
unassigned rna         102162


Roughly one third of the EMBL entries (about 37%, counting every
molecule value that contains a space) are affected.
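
The share of affected entries can be checked from the table. The sketch
below copies the counts as listed and treats an entry as affected when
its molecule value contains a space, which is the case the old (\S+)
group could not match.

```perl
use strict;
use warnings;

# Molecule value => entry count, copied from the breakdown above.
my %count = (
    'circular genomic dna'    => 7427,
    'circular genomic rna'    => 687,
    'circular mrna'           => 23,
    'circular other dna'      => 915,
    'circular other rna'      => 9,
    'circular trna'           => 1,
    'circular unassigned dna' => 266,
    'circular unassigned rna' => 2,
    'genomic dna'             => 14573961,
    'genomic rna'             => 152219,
    'mrna'                    => 28138477,
    'other dna'               => 6956,
    'other rna'               => 1827,
    'pre-rna'                 => 898,
    'rrna'                    => 5999,
    'scrna'                   => 95,
    'snorna'                  => 981,
    'snrna'                   => 455,
    'trna'                    => 667,
    'unassigned dna'          => 1941868,
    'unassigned rna'          => 102162,
);

my ($affected, $total) = (0, 0);
for my $mol (keys %count) {
    $total    += $count{$mol};
    $affected += $count{$mol} if $mol =~ / /;   # value contains a space
}
printf "%d of %d entries (%.1f%%)\n",
    $affected, $total, 100 * $affected / $total;
# prints: 16788323 of 44935895 entries (37.4%)
```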

This error does not affect GenBank entries, which use a different syntax.

I wonder how long this error has been there!


 -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From heikki at ebi.ac.uk  Wed Jul 13 10:49:09 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Jul 13 10:46:53 2005
Subject: [Bioperl-l] Announce: Bio::Seq::Quality
In-Reply-To: <200507111702.52264.heikki@ebi.ac.uk>
References: <200507111702.52264.heikki@ebi.ac.uk>
Message-ID: <200507131549.09525.heikki@ebi.ac.uk>

I've been cleaning Bio::Seq::SeqWithQuality usage from bioperl-live modules 
and replacing it with Bio::Seq::Quality. Everything seems to work.

I've left Bio::Seq::PrimaryQual for the next rewrite. Its functionality is all 
in the Quality class (get and set id and quality values), but you cannot get 
the quality values from a Bio::Seq::Quality object if the sequence is not 
set. Usually qualities without residues do not make much sense, but 
something in the Bio::Assembly code, or at least in its tests, needs 
plain qualities.

 -Heikki 

On Monday 11 July 2005 17:02, Heikki Lehvaslaiho wrote:
> Bio::Seq::Quality is a new module that allows you to store per-residue
> quality and trace index values using Bio::Seq::MetaI interface. It replaces
> Bio::Seq::SeqWithQuality which is now deprecated.
>
> Solutions to persistence should focus on storing Bio::Seq::Meta and
> Bio::Seq::Meta::Array objects. It should be easy to stringify most real
> world meta values. Then the persistence could be implemented by storing the
> sequence object and N number of meta strings.
>
> All the functional code is in Bio::Seq::Meta::Array, Bio::Seq::Quality
> merely adds a convenient interface.
>
> The POD contains a discussion of differences from Bio::Seq::SeqWithQuality.
> If the following, or anything else,  is a problem let me know as soon as
> possible:
>
>   The greatest difference to Bio::Seq::SeqWithQuality is that in this
>   implementation quality for all sequence residues are automatically
>   assigned a value of '0' (zero) unless you set it to something
>   else. Length of the quality array always equals the length of the
>   sequence. Therefore, length() never returns "DIFFERENT".
>
>
> Enjoy,
>  -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From michael.watson at bbsrc.ac.uk  Wed Jul 13 11:04:59 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed Jul 13 10:57:27 2005
Subject: [Bioperl-l] Getting hit or subject length in BPlite
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9502067A87@iahce2knas1.iah.bbsrc.reserved>

Hi

Maybe I'm just being dumb, but I can't see a way to get the hit length
(note: NOT hsp length) using BPlite to parse a blast report....

Any help?

Mick

 
From pierre_rioux at yahoo.com  Wed Jul 13 13:51:56 2005
From: pierre_rioux at yahoo.com (Pierre Rioux)
Date: Wed Jul 13 13:42:41 2005
Subject: [Bioperl-l] EMBL ID line parsing error
In-Reply-To: <200507131406.24939.heikki@ebi.ac.uk>
Message-ID: <20050713175156.77649.qmail@web53003.mail.yahoo.com>

Hi,

> I noticed that one BioFetch test was failing. It was caused by an EMBL entry 
> object not having a display ID. The failure was caused by regular expression 
> in the EMBL parser not allowing spaces in the molecule substring of the ID 
> line:
> 
> 
> ID   BUM        standard; genomic RNA; VRL; 200 BP.
>                    was:   (\S+);
>                    fix:   ([\S ]+);     now in bioperl-live

Because regular expressions are greedy, and because \S
also matches the semicolon ";", I think maybe a better
fix would be 

                            ([^;]);

That way, if the EMBL line format ever gets extended to include
more semicolon-separated fields, it will still work.

(Personally, when I write regexes, I always try to make sure
the specific character that is used as delimiter cannot
be matched by the parenthesized regex for the fields...
otherwise you're putting too much trust on the NUMBER of
fields in the line for the whole line-matching regex
to succeed as planned).
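
A small sketch of that point, using only the tail of the ID line. The
negated class is written with a + quantifier here so that it can match a
whole field; taken in isolation, the class that also accepts ";" runs
past the first delimiter, while the class that excludes it stops there.

```perl
use strict;
use warnings;

# Tail of an ID line, starting at the molecule field.
my $fields = 'genomic RNA; VRL; 200 BP.';

my ($greedy)    = $fields =~ /([\S ]+);/;   # class can also match ';'
my ($delimited) = $fields =~ /([^;]+);/;    # class excludes the delimiter

print "greedy:    '$greedy'\n";      # 'genomic RNA; VRL'
print "delimited: '$delimited'\n";   # 'genomic RNA'
```

In a full line-matching pattern the greedy group only comes out right
because the groups that follow force it to backtrack, which is the
dependence on the number of fields described above.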

Pierre


From pierre_rioux at yahoo.com  Wed Jul 13 14:08:09 2005
From: pierre_rioux at yahoo.com (Pierre Rioux)
Date: Wed Jul 13 13:58:55 2005
Subject: [Bioperl-l] EMBL ID line parsing error
In-Reply-To: <20050713175156.77649.qmail@web53003.mail.yahoo.com>
Message-ID: <20050713180810.81238.qmail@web53001.mail.yahoo.com>

Small correction.

I wrote:

>                             ([^;]);
 
But it should be:

                              ([^;]*);

I hope mail readers out there won't turn this into
some kind of weird smiley. :-)

Pierre


From reneehalbrook74 at yahoo.com  Wed Jul 13 17:19:57 2005
From: reneehalbrook74 at yahoo.com (Renee Halbrook)
Date: Wed Jul 13 17:10:47 2005
Subject: [Bioperl-l] COG parsing ?
Message-ID: <20050713211957.6373.qmail@web40513.mail.yahoo.com>

Hi,

Does BioPerl have a parser for the Clusters of
Orthologous Groups of proteins (COGs) from NCBI ?


Thanks for any help,
Renee Halbrook



From heikki at ebi.ac.uk  Wed Jul 13 17:49:39 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Wed Jul 13 17:40:02 2005
Subject: [Bioperl-l] EMBL ID line parsing error
In-Reply-To: <20050713180810.81238.qmail@web53001.mail.yahoo.com>
References: <20050713180810.81238.qmail@web53001.mail.yahoo.com>
Message-ID: <200507132249.40102.heikki@ebi.ac.uk>

Hi Pierre,

You are quite right, ([^;]*); or ([^;]+); really is a much better way of 
writing it, and that is how I committed the fix:

 ($name, $mol, $div) = 
   ($line =~ /^ID\s+(\S+).*;\s+([^;]+);\s+(\S+);/);
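
As a quick check, the committed pattern can be run over a few ID lines.
The first line is the one from this thread. The other two are made-up
entries (hypothetical IDs, divisions and lengths) that reuse molecule
values from the EMBL breakdown.

```perl
use strict;
use warnings;

# Sample ID lines: the first is from the thread, the others are invented
# for illustration.
my @lines = (
    'ID   BUM        standard; genomic RNA; VRL; 200 BP.',
    'ID   XX000001   standard; circular genomic DNA; PRO; 5000 BP.',
    'ID   XX000002   standard; mRNA; HUM; 1200 BP.',
);

my @parsed;
for my $line (@lines) {
    my ($name, $mol, $div) =
        ($line =~ /^ID\s+(\S+).*;\s+([^;]+);\s+(\S+);/);
    push @parsed, defined $name ? "$name | $mol | $div" : "no match";
}
print "$_\n" for @parsed;
```

Molecule values with zero, one or two spaces are all captured whole.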


I started writing the email as a note to myself when I first verified the bug, 
and then forgot to update the text before sending. 

Sorry to waste your time,

     -Heikki



-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From Marc.Logghe at devgen.com  Thu Jul 14 05:53:30 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Thu Jul 14 05:44:18 2005
Subject: [Bioperl-l] Announce: Bio::Seq::Quality
Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F5439@ANTARESIA.be.devgen.com>

Hi Heikki,

> I've left Bio::Seq::PrimaryQual for the next rewrite. Its 
> functionality is all in the Quality class (get and set id and 
> quality values), but you can not get the quality values from 
> a Bio::Seq::Quality object if you do not have the sequence 
> set. Usually qualities without residues do not make such 
> sense, but there is something in Bio::Assembly code or at 
> least in its tests that need plain qualities.
 
> >   The greatest difference to Bio::Seq::SeqWithQuality is that in this
> >   implementation quality for all sequence residues are automatically
> >   assigned a value of '0' (zero) unless you set it to something
> >   else. Length of the quality array always equals the length of the
> >   sequence. Therefore, length() never returns "DIFFERENT".


When these two extracts of your mail are considered together, am I correct
in thinking that the lengths of the sequence and the quality array are
only identical when you pass a sequence to the constructor together with
the quality string? In all other cases, how can one be sure that
the lengths are equal?
E.g. you can first create the Bio::Seq::Quality object, passing it the
quality, and assign the sequence afterwards by calling $qual->seq($seq).
As you indicated, it is even possible not to set the sequence at all, so
the seq length is zero while the quality length is not.
Does that mean a user should check the lengths explicitly?
BTW I am currently editing Bio::Assembly::IO::ace so that it also parses
CAP3-generated ACE files. Do you think it is OK to also set the contig
sequence for the quality object, i.e. not leave the seq attribute
empty? I'll check the Bio::Assembly code to see why plain quality is
needed to pass the tests.
Cheers,
Marc

From michael.watson at bbsrc.ac.uk  Thu Jul 14 06:53:17 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu Jul 14 06:44:16 2005
Subject: [Bioperl-l] Blast features added to wrong strand???
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9502067A90@iahce2knas1.iah.bbsrc.reserved>

Hi

I'm using bioperl-1.4.

I have a genomic region.  I am first using bl2seq and blastn to align it
with some custom sequences, then blastall and blastx to blast it against
uniprot.

I'm using SearchIO and pretty standard code to add the blast hits as
features to the sequence e.g.:

my $feature = Bio::SeqFeature::Generic->new(
    -primary_tag  => 'CDS',
    -score        => $hit->raw_score,
    -display_name => $hit->name,
    -tag          => {
        locus_tag => $name,
        note      => $note,
    },
);

# @hsps is a filtered list of HSPs obtained from $hit->next_hsp
foreach my $hsp (@hsps) {
    $feature->add_sub_SeqFeature($hsp, 'EXPAND');
}

$genome->add_SeqFeature($feature); # $genome is a Bio::Seq object

Now, the bl2seq hits all have the strand reported as "Plus / Minus" and
the blastx hits all have the strand reported as -1 i.e. there is a gene
on the other strand of my sequence.

HOWEVER, using the above code for both the bl2seq results and the blastx
results, ONLY the blastx results get annotated on the reverse strand -
the bl2seq results, which report the strand as "Plus / Minus", get
annotated on the forward strand and hence point the wrong way when I
draw them :-(

So my question is, what am I doing wrong in the above code (which is
pretty much ripped off from the bioperl HOWTOs) that makes the bl2seq
"Plus / Minus" hits get annotated on the plus strand on my sequence??

Many thanks
Mick 

From jason.stajich at duke.edu  Thu Jul 14 08:40:47 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jul 14 08:31:56 2005
Subject: [Bioperl-l] Blast features added to wrong strand???
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9502067A90@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E9502067A90@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <86A43AEE-AC14-479E-B396-901D3C25C906@duke.edu>

Why use bl2seq when you can use fasta....

 From the Bio::SearchIO::blast documentation

=head2 bl2seq parsing

Since I cannot differentiate between BLASTX and TBLASTN since bl2seq
doesn't report the algorithm used - I assume it is BLASTX by default -
you can supply the program type with -report_type in the SearchIO
constructor i.e.

   my $parser = new Bio::SearchIO(-format => 'blast',
                                  -file => 'bl2seq.tblastn.report',
                                  -report_type => 'tblastn');

This only really affects where the frame and strand information are
put - they will always be on the $hsp->query instead of on the
$hsp->hit part of the feature pair for blastx and tblastn bl2seq
produced reports.  Hope that's clear...



--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From michael.watson at bbsrc.ac.uk  Thu Jul 14 09:09:45 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu Jul 14 09:00:35 2005
Subject: [Bioperl-l] Blast features added to wrong strand???
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D9EF@iahce2knas1.iah.bbsrc.reserved>

No, see something definitely weird is going on, but I am happy to accept
it may be my misuse.

To the bl2seq hits I have added the code "-report_type => 'blastn'" and
I get the same results.

The *really* weird thing is that after I have created these features, if
I write them to an EMBL file, they are annotated as being on the -1
strand e.g. complement(1..230) etc etc.  However, when I pass those very
same features to $panel->add_track, they are drawn on the + strand.

If I iterate through them and "print $feat->strand", they all say -1.
Write them out as EMBL, and they say "complement(1..23)" etc.  Draw them,
and they point --------------------> that way.

If I write them out as EMBL, then read them back in using Bio::SeqIO,
then pass them to $panel->add_track, they point in the right direction.
So something is getting set wrong - could it be "frame"?

????


From jbedell at oriongenomics.com  Thu Jul 14 10:24:39 2005
From: jbedell at oriongenomics.com (Joseph Bedell)
Date: Thu Jul 14 10:16:08 2005
Subject: [Bioperl-l] Getting hit or subject length in BPlite
Message-ID: <434AF352F9D03C4C896782B8CC78BC7687F922@VADER.oriongenomics.com>

Hey Mick,

Here's how to get the queryLength and the sbjct Length. Is this what
you're looking for?


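my $report = new BPlite(\*STDIN);
print "query length: ", $report->queryLength, "\n";
while (my $sbjct = $report->nextSbjct) {
    # length of the whole subject sequence, not of an individual HSP
    print $sbjct->name, "\t", $sbjct->length, "\n";
}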

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Joseph A Bedell, Ph.D.         office: 314-615-6979 
Director, Bioinformatics         fax:    314-615-6975 
Orion Genomics                   cell:   314-518-1343
4041 Forest Park Ave
St. Louis, MO 63108
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


From michael.watson at bbsrc.ac.uk  Thu Jul 14 10:32:11 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu Jul 14 10:23:57 2005
Subject: [Bioperl-l] Getting hit or subject length in BPlite
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950172D9F8@iahce2knas1.iah.bbsrc.reserved>

Error message:

Can't locate object method "length" via package
"Bio::Tools::Bplite::Sbjct"

I managed to access it by hacking, ie I call 

$sbjct->{'LENGTH'}

But it seems a little bit of an oversight to store it and yet not
provide an accessor method?
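
For illustration, such an accessor is a one-line method. The sketch below
uses a mock class, My::Sbjct, invented for this example. It only mimics
an object that keeps the length under a 'LENGTH' hash key and is not the
BPlite code itself.

```perl
use strict;
use warnings;

# Mock class for illustration only; it mimics an object that stores the
# subject length under a 'LENGTH' hash key, as described above.
package My::Sbjct;

sub new {
    my ($class, %args) = @_;
    return bless { NAME => $args{name}, LENGTH => $args{length} }, $class;
}

# The kind of read-only accessor the message asks for.
sub length { return $_[0]->{LENGTH} }

package main;

my $sbjct = My::Sbjct->new(name => 'hit1', length => 342);
print $sbjct->length, "\n";      # via the accessor
print $sbjct->{LENGTH}, "\n";    # direct hash access (the workaround)
```

Going through a method keeps calling code working if the internal key is
ever renamed, which direct hash access does not.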

Mick


From heikki at ebi.ac.uk  Thu Jul 14 11:19:56 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Thu Jul 14 11:10:27 2005
Subject: [Bioperl-l] Announce: Bio::Seq::Quality
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA62F5439@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA62F5439@ANTARESIA.be.devgen.com>
Message-ID: <200507141619.56288.heikki@ebi.ac.uk>


Marc,

The way I wrote the Bio::Seq::Meta modules, you can set meta sets and the 
sequence completely independently and everything is stored within the object, 
but only the part of the meta arrays that has residues is returned.

I tried this out now and realised that it does not work for padding the 
quality values:

e.g. 
$s = Bio::Seq::Quality->new(-qual => "6 6 7");
$s->qual();        # returns []
$s->qual_text();   # returns ''
$s->seq('atcg');
$s->qual_text();   # should return '6 6 7 0' but returns '6 6 7'


I have to tweak the code now. So, what do you think? Is the automatic padding 
a good or bad thing? Should I get rid of it or make sure it works as I 
planned? 

In other words, do you think it is better to let users make their own mistakes 
and offer ways to check for inconsistencies, or offer a "padded" fool proof 
system? (If this fool gets it right in the first place.)
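
The two options can be sketched on plain strings, independent of
Bio::Seq::Quality. pad_quality() and lengths_match() are hypothetical
helper names used only for this illustration. The first pads (or trims)
the quality values to the sequence length, the second reports a mismatch
and leaves the fix to the user.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Seq::Quality: pad (or trim) a
# space-separated list of quality values to the length of the sequence.
sub pad_quality {
    my ($seq, $qual_text, $default) = @_;
    $default = 0 unless defined $default;
    my @qual = split ' ', $qual_text;
    my $len  = length $seq;
    push @qual, ($default) x ($len - @qual) if @qual < $len;
    $#qual = $len - 1 if @qual > $len;
    return join ' ', @qual;
}

# Hypothetical strict alternative: report a mismatch instead of hiding it.
sub lengths_match {
    my ($seq, $qual_text) = @_;
    my @qual = split ' ', $qual_text;
    return length($seq) == @qual;
}

my $padded = pad_quality('atcg', '6 6 7');
my $ok     = lengths_match('atcg', '6 6 7');
print "padded: $padded\n";                      # padded: 6 6 7 0
print "consistent: ", ($ok ? 'yes' : 'no'), "\n";   # consistent: no
```

Padding never fails but can hide a genuine length mismatch. The check
leaves the decision to the caller.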


 -Heikki



-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From Marc.Logghe at devgen.com  Thu Jul 14 11:54:10 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Thu Jul 14 11:44:58 2005
Subject: [Bioperl-l] Announce: Bio::Seq::Quality
Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F5440@ANTARESIA.be.devgen.com>

> 
> The way I wrote Bio::Seq::Meta modules is that you can set 
> meta sets and sequence completely independently and 
> everything is stored within the object, but only that part of 
> the meta arrays are returned that have residues.
> 
> I tried this out now and realised that it does not work for 
> padding the quality values:
> 
> e.g. 
> $s = new Bio::Seq::Quality(-qual=> "6 6 7");
> $s->qual(); # returns []
> $s->qual_text(); # returns ''
> $s->seq('atcg');
> $s->qual_text(); # should return '6 6 7 0' but returns '6 6 7'

Ah, I see. The length checking + padding is only triggered when one calls
qual().

> I have to tweak the code now. So, what do you think? Is the 
> automatic padding a good or bad thing? Should I get rid of it 
> or make sure it works as I planned? 

Personally I'd use that optionally by setting/resetting a padding flag
or something. I'd more be interested in having a way to validate your
Bio::Seq::Quality one way or another. In the case padding is switched
off, I'd like to know whether my sequence length is exactly the same as
my quality array. Does that make sense ?
Thing is, I am currently struggling with the Bio::Assembly* module
because we've noticed that the contig sequence object may contain gaps
and as a consequence is larger than the quality object that can be
extracted from the ace file produced by cap3. I'd like to include into
Bio::Assembly::IO::ace the feature that a Bio::Seq::Quality is created
*including* the cleaned contig sequence (gaps removed). In that process
it would be very useful to have a way to check for inconsistencies. Not
sure however what to do when an inconsistency is actually occurring:
throw an exception, a warning, trash the contig or keep it, ???

> 
> In other words, do you think it is better to let users make 
> their own mistakes and offer ways to check for 
> inconsistencies, or offer a "padded" fool proof system? (If 
> this fool gets it right in the first place.)

In conclusion I'd opt for an inconsistency check and an optional padding
feature.
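
A minimal sketch of that combination (a consistency check plus optional
padding), written in plain Perl rather than against Bio::Seq::Quality.
The helper name check_quality and its pad option are hypothetical and
are not part of any BioPerl module; qualities are assumed to be a
space-separated string, as in the example quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: compare the sequence length
# with the number of quality values and optionally pad with zeros.
sub check_quality {
    my ($seq, $qual_text, %opt) = @_;
    my @qual = split ' ', $qual_text;
    my $diff = length($seq) - scalar(@qual);   # > 0 means qualities are missing
    if ($diff > 0 && $opt{pad}) {
        push @qual, (0) x $diff;               # pad missing qualities with zero
        $diff = 0;
    }
    # First value: 1 if lengths agree, 0 otherwise; second: the quality text
    return (($diff == 0 ? 1 : 0), join(' ', @qual));
}

my ($ok, $padded) = check_quality('atcg', '6 6 7', pad => 1);
print "$ok: $padded\n";                        # prints "1: 6 6 7 0"

my ($ok_nopad) = check_quality('atcg', '6 6 7');
print "consistent without padding? $ok_nopad\n";   # prints 0
```

Whether an inconsistency should raise an exception or only a warning is
left to the caller here, since that question is still open above.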

Cheers,
ML

From n.haigh at sheffield.ac.uk  Fri Jul 15 08:43:16 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri Jul 15 08:34:05 2005
Subject: [Bioperl-l] Bio::Graphics and primer3 pipeline
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAATaxqRfxdbk2zfKJwCx+XdgEAAAAA@sheffield.ac.uk>

I'm creating a pipeline for passing around 200 sequences to primer3 in order
to generate primers. I want to be able to use Bio::Graphics to create a png
file for each sequence with the position of the primers shown and some of
the details about each primer (e.g. Tm, %GC).

 
Here's what I have so far in pseudocode:

 
Foreach Bio::Seq object
    Add position of introns as Bio::SeqFeature::Generic features
    Run Primer3 with Bio::Seq object
    Loop through primers, returning a Bio::Seq::PrimedSeq object
        Add primer as features using:
        Bio::Seq object->add_SeqFeature(Bio::Seq::PrimedSeq);

 
Now create png using Bio::Graphics

 
This works ok, but I'm lost trying to get the Tm and GC content of the
primers as returned by Primer3

 
Does anyone have a script that can do something similar that I might try to
work out what's going on?

Thanks

Nathan

 
----------------------------------

Nathan Haigh

Bioinformatics PostDoctoral Research Associate

 
Room B2 211

Department of Animal and Plant Sciences

University of Sheffield

Western Bank

Sheffield

S10 2TN

 
Tel: +44 (0)114 22 20112

Mob: +44 (0)7742 533 569

Fax: +44 (0)114 22 20002

 
From halwaniradzi at yahoo.com  Fri Jul 15 00:47:13 2005
From: halwaniradzi at yahoo.com (halwani radzi)
Date: Fri Jul 15 10:34:09 2005
Subject: [Bioperl-l] anyone have experience on developing parallel
	smith-waterman for sequence alignment?
Message-ID: <20050715044713.10107.qmail@web90009.mail.scd.yahoo.com>

Hi everyone,
I have to do research on parallelizing the matrix calculation of the Smith-Waterman local alignment algorithm for sequence comparison. I have read some threads here discussing this. I would really appreciate it if anyone with experience in this could share some information, such as example code, a suitable framework and hardware, or papers, people, and websites to refer to. Thank you very much.

		
From mayagao1999 at yahoo.com  Fri Jul 15 13:36:48 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Fri Jul 15 13:27:29 2005
Subject: [Bioperl-l] A question about replacing a substring using Bioperl
Message-ID: <20050715173648.55102.qmail@web53508.mail.yahoo.com>

Dear all,

I have a txt file which stores 20 short DNA sequences
and the length of each is 8, let's
call it A. Meanwhile, I have another txt file which
owns 100 long DNA sequences and the length of each is
200, let's call it B.

Then, I want to replace a substring of each sequence
in B with each one in A.
The replacement starting site could be specified as
you want(such as
starting at position 1 for the first sequence in B,
10th for the 2nd sequence in
B, 20th for the 3rd, until 190th for the 20th in B )
or picked
by the program randomly. I am pretty sure
substr(string,index,length,replacement string) can
finish a part
of this work.

But I have limited experience of using Perl to
manipulate two files. Can anybody
give me some suggestions?
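
A minimal sketch of the four-argument substr() mentioned above, using
user-chosen start positions. The sequences and positions are made-up
illustration data (20 bases instead of 200), and the arrays stand in for
lines already read from the two files.

```perl
use strict;
use warnings;

# Made-up data standing in for the contents of files A and B
my @short = ('TTGACATA', 'CCCCCCCC');
my @long  = ('A' x 20, 'G' x 20);
my @start = (1, 10);    # 1-based start positions chosen by the user

for my $i (0 .. $#short) {
    # substr offsets are 0-based, so subtract 1 from the 1-based position;
    # the fourth argument replaces the selected window in place
    substr($long[$i], $start[$i] - 1, length($short[$i]), $short[$i]);
    print "$start[$i]\t$long[$i]\n";
}
```

Because the replacement has the same length as the window it
overwrites, each long sequence keeps its original length.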

Thank you very much and look forward to your reply!

Best Regards,
      Maya

From skirov at utk.edu  Fri Jul 15 14:20:42 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Jul 15 14:12:09 2005
Subject: [Bioperl-l] Re: Information regarding BioPerl Transfac module
In-Reply-To: <6.2.0.14.0.20050715113315.0216a310@anumail.anu.edu.au>
References: <6.2.0.14.0.20050715113315.0216a310@anumail.anu.edu.au>
Message-ID: <42D7FE7A.2050803@utk.edu>

Nagesh,
Please address all questions to the bioperl list (if appropriate) and 
not to me personally.
Bio::Matrix::PSM::IO can parse only the matrix.dat file at the moment 
and I have no intention of extending it. As for TFBS::DB::LocalTRANSFAC, 
it is not part of BioPerl at the moment and I believe questions should 
be addressed to Boris Lehnard or Leonardo Ramirez.  But it is clear to 
me from the documentation that it also parses only matrix.dat (see NAME 
section).
Stefan

Nagesh Chakka wrote:

> Dear Stefan Kirov,
> I am Nagesh doing my PhD in Medical Sciences at Australian National 
> University. I have read some of your posting in the BioPerl mailing 
> list and thought of writing to you as I was not able to get the 
> information I wanted. I wanted to know whether we can achieve a simple 
> task of getting the information (about the name of the TF, Cell 
> specificity, tissue expression and other information) from the .dat 
> file if we know the matrix ID information using the 
> TFBS::DB::LocalTRANSFAC. I haven't seen any method that returns what I 
> am interested in. So are there any other modules to achieve this task. 
> I also had a look at the Bio::Matrix::PSM::IO (could not find what I 
> wanted).
> Thanks very much for your attention. Any information related to this 
> would be highly appreciated.
> Regards
>
> Nagesh Chakka
> PhD Student
> John Curtin School of Medical Research
> Australian National University
> PO Box 334, Canberra ACT 2601
> Phone: +61-2-6125-8303
> Fax: +61-2-6125-0415


From chad at dieselwurks.com  Fri Jul 15 15:51:48 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Fri Jul 15 15:42:33 2005
Subject: [Bioperl-l] genbank2gff3.PLS and the unflatenner - Inconsistent
	order?
Message-ID: <Pine.LNX.4.62.0507151200180.20311@sausage.usask.ca>


Greetings,

I posted to bioperl-l, hmm, back in June reporting issues with the
genbank2gff* scripts.

Time moved on but I returned to this project where I'm searching through
the Arabidopsis mitochondria for things. I want to gff-i-fy this:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide&cmd=search&term=nc_001284

I downloaded the genbank record and ran this command:
$BIOPERL_LIVE/scripts/scripts/Bio-DB-GFF/genbank2gff3.PLS genbank/nc_001284.gb 
Unordered features [on strand:-1]:
NC_001284 Unflattening error:
Details:
------------- EXCEPTION  -------------
MSG: ASSERTION ERROR: inconsistent order

Note that I put a BEGIN clause on top of genbank2gff3.PLS script this
morning and cvs committed it. This BEGIN clause ensures that the script
is using the cvs-versions of Bio::* modules which I've compiled into
./blib/lib .

The issue is for more than my mitochondria - it breaks even against test
data:
$BIOPERL/scripts/Bio-DB-GFF/genbank2gff3.PLS $BIOPERL/t/data/test.genbank -o .
...stuff...
L26462 Unflattening error:
...big exception...

Unfortunately, I don't know much about the unflattener but I'll help if
someone can point me in the right direction.

Oh, and just for an additional datapoint, the file
t/data/AE003644_Adh-genomic.gb works.


Thank you,

Chad Matsalla

From lstein at cshl.edu  Fri Jul 15 16:45:28 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri Jul 15 16:36:37 2005
Subject: [Bioperl-l] GFF3 and Gbrowse
In-Reply-To: <200506281654.35206.lstein@cshl.edu>
References: <BEE6D526.6953%anunberg@oriongenomics.com>
	<1119987851.3365.46.camel@localhost.localdomain>
	<200506281654.35206.lstein@cshl.edu>
Message-ID: <200507151645.29893.lstein@cshl.edu>

My memory is failing. The glyph and aggregator are named 
"processed_transcript". I've also just now created a pair called 
"so_transcript" that do exactly the same thing. They should work with the 
GFF3 "canonical gene." Let me know if they don't.

Lincoln

On Tuesday 28 June 2005 04:54 pm, Lincoln Stein wrote:
> It's in bioperl CVS. A copy is also in the gbrowse CVS which will be
> installed if it detects an old version of bioperl.
>
> Lincoln
>
> On Tuesday 28 June 2005 03:44 pm, Scott Cain wrote:
> > Lincoln,
> >
> > This is the first I've heard of the so_transcript aggregator; have you
> > committed it anywhere?
> >
> > Scott
> >
> > On Tue, 2005-06-28 at 13:24 -0400, Lincoln Stein wrote:
> > > The bioperl GFF database (both the inmemory and relational database
> > > versions) need to be brought up to date to handle the full expressive
> > > power of GFF3. So for the time being ID trumps Name. Also you must use
> > > the so_transcript aggregator instead of the processed_transcript
> > > aggregator.
> > >
> > > Lincoln
> > >
> > > On Tuesday 28 June 2005 11:21 am, Andrew Nunberg wrote:
> > > > I was wondering if there is any documentation about using GFF3 format
> > > > with Gbrowse.  Since this is the "new" format, I wanted to start
> > > > using it, but observing some behaviors.
> > > >
> > > > The GFF3 documentation on http://song.sourceforge.net/gff3.shtml
> > > > indicates the Name tag is the id to be displayed and the ID tag is
> > > > unique and internal, however when I use Gbrowse 1.62 it is ID that is
> > > > being displayed as the label.
> > > >
> > > > I wish to use processed_transcript aggregator, the GFF3 document
> > > > indicates you only need to  display the exons and CDS and the UTRs
> > > > will be inferred, however I did not see that when  viewed in Gbrowse.
> > > >
> > > > If there is some extra code or documentation I need please let me
> > > > know
> > > >
> > > > Thanks
> > > > Andy

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From mayagao at gmail.com  Fri Jul 15 13:32:32 2005
From: mayagao at gmail.com (Gao Zhang)
Date: Fri Jul 15 18:45:40 2005
Subject: [Bioperl-l] A question about replacing a substring using Bioperl
In-Reply-To: <F2BC17DC-9F08-4CC6-8ECD-14B29EC70C5B@duke.edu>
References: <7beac6a05071508535feea867@mail.gmail.com>
	<F2BC17DC-9F08-4CC6-8ECD-14B29EC70C5B@duke.edu>
Message-ID: <7beac6a0507151032148bd615@mail.gmail.com>

Dear all,

I have a txt file which stores 20 short DNA sequences and the length of each 
is 8, let's 
call it A. Meanwhile, I have another txt file which owns 100 long DNA 
sequences and the length of each is
200, let's call it B. 

Then, I want to replace a substring of each sequence in B with each one in 
A.
The replacement starting site could be specified as you want(such as
starting at position 1 for the first sequence in B, 10th for the 2nd 
sequence in
B, 20th for the 3rd, until 190th for the 20th in B ) or picked
by the program randomly. I am pretty sure 
substr(string,index,length,replacement string) can finish a part
of this work. 

But I have limited experience of using Perl to manipulate two files. Can 
anybody
give me some suggestions?

Thank you very much and look forward to your reply!

Best Regards,
Maya

From rob at salmonella.org  Sat Jul 16 11:07:27 2005
From: rob at salmonella.org (Rob Edwards)
Date: Sat Jul 16 10:57:59 2005
Subject: [Bioperl-l] Bio::Graphics and primer3 pipeline
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAATaxqRfxdbk2zfKJwCx+XdgEAAAAA@sheffield.ac.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAATaxqRfxdbk2zfKJwCx+XdgEAAAAA@sheffield.ac.uk>
Message-ID: <29CA8456-285F-4A66-B865-4BFC35E4730F@salmonella.org>

You should be able to get the primers Tm from the  
Bio::SeqFeature::Primer objects - there are two different methods in  
there for calculating Tm's. The GC content is not a method at the  
moment, but could be added as one.
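
As a rough sketch of what such a GC method could compute, here is a
plain-Perl percentage calculation on a primer string. The name
gc_percent is hypothetical and is not an existing
Bio::SeqFeature::Primer method.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of G and C bases in a sequence string
sub gc_percent {
    my ($seq) = @_;
    return 0 unless length($seq);
    # tr/// with an empty replacement list only counts matching characters
    my $gc = ($seq =~ tr/GCgc//);
    return 100 * $gc / length($seq);
}

printf "%.1f\n", gc_percent('TTGACATA');   # 2 of 8 bases are G or C: 25.0
```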

Rob


On Jul 15, 2005, at 5:43 AM, Nathan Haigh wrote:

> I'm creating a pipeline for passing around 200 sequences to primer3  
> in order
> to generate primers. I want to be able to use Bio::Graphics to  
> create a png
> file for each sequence with the position of the primers shown and  
> some of
> the details about each primer (e.g. Tm, %GC).
>
>
>
> Here's what I have so far in pseudocode:
>
>
>
> Foreach Bio::Seq object
>
>     Add position of introns as Bio::SeqFeature::Generic features
>
>     Run Primer3 with Bio::Seq object
>
>     Loop through primers, returning a Bio::Seq::PrimedSeq object
>
>         Add primer as features using: Bio::Seq
> object->add_SeqFeature(Bio::Seq::PrimedSeq);
>
>
>
> Now create png using Bio::Graphics
>
>
>
> This works ok, but I'm lost trying to get the Tm and GC content of the
> primers as returned by Primer3
>
>
>
> Does anyone have a script that can do something similar that I  
> might try to
> work out whats going on?
>
> Thanks
>
> Nathan
>
>
>
>
>
>
>
> ----------------------------------
>
> Nathan Haigh
>
> Bioinformatics PostDoctoral Research Associate
>
>
>
> Room B2 211
>
> Department of Animal and Plant Sciences
>
> University of Sheffield
>
> Western Bank
>
> Sheffield
>
> S10 2TN
>
>
>
> Tel: +44 (0)114 22 20112
>
> Mob: +44 (0)7742 533 569
>
> Fax: +44 (0)114 22 20002
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>

From hlapp at gmx.net  Sat Jul 16 20:02:05 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jul 16 19:52:45 2005
Subject: [Bioperl-l] COG parsing ?
In-Reply-To: <20050713211957.6373.qmail@web40513.mail.yahoo.com>
References: <20050713211957.6373.qmail@web40513.mail.yahoo.com>
Message-ID: <038871a824ed3175897809adfeac5c98@gmx.net>

Not that I'm aware of.

If you're thinking about contributing one, that'd be cool.

	-hilmar

On Jul 13, 2005, at 2:19 PM, Renee Halbrook wrote:

> Hi,
>
> Does BioPerl have a parser for the Clusters of
> Orthologous Groups of proteins (COGs) from NCBI ?
>
>
> Thanks for any help,
> Renee Halbrook
>
>
> 		
> __________________________________
> Yahoo! Mail
> Stay connected, organized, and protected. Take the tour:
> http://tour.mail.yahoo.com/mailtour.html
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From n.haigh at sheffield.ac.uk  Sun Jul 17 09:22:43 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Sun Jul 17 09:16:02 2005
Subject: [Bioperl-l] Bio::Graphics and primer3 pipeline
In-Reply-To: <29CA8456-285F-4A66-B865-4BFC35E4730F@salmonella.org>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAW082dbVJjECRVq5DcAB03wEAAAAA@sheffield.ac.uk>

Are these not parsed out of the primer3 output file at all, without the need
for actually calculating them? There seems to be quite a few useful
annotations that could be extracted from the output file.

Nath

-----Original Message-----
From: Rob Edwards [mailto:rob@salmonella.org] 
Sent: 16 July 2005 16:07
To: n.haigh@sheffield.ac.uk
Cc: 'Bioperl list'
Subject: Re: [Bioperl-l] Bio::Graphics and primer3 pipeline

You should be able to get the primers Tm from the  
Bio::SeqFeature::Primer objects - there are two different methods in  
there for calculating Tm's. The GC content is not a method at the  
moment, but could be added as one.

Rob


On Jul 15, 2005, at 5:43 AM, Nathan Haigh wrote:

> I'm creating a pipeline for passing around 200 sequences to primer3  
> in order
> to generate primers. I want to be able to use Bio::Graphics to  
> create a png
> file for each sequence with the position of the primers shown and  
> some of
> the details about each primer (e.g. Tm, %GC).
>
>
>
> Here's what I have so far in pseudocode:
>
>
>
> Foreach Bio::Seq object
>
>     Add position of introns as Bio::SeqFeature::Generic features
>
>     Run Primer3 with Bio::Seq object
>
>     Loop through primers, returning a Bio::Seq::PrimedSeq object
>
>         Add primer as features using: Bio::Seq
> object->add_SeqFeature(Bio::Seq::PrimedSeq);
>
>
>
> Now create png using Bio::Graphics
>
>
>
> This works ok, but I'm lost trying to get the Tm and GC content of the
> primers as returned by Primer3
>
>
>
> Does anyone have a script that can do something similar that I  
> might try to
> work out whats going on?
>
> Thanks
>
> Nathan
>
>
>
>
>
>
>
> ----------------------------------
>
> Nathan Haigh
>
> Bioinformatics PostDoctoral Research Associate
>
>
>
> Room B2 211
>
> Department of Animal and Plant Sciences
>
> University of Sheffield
>
> Western Bank
>
> Sheffield
>
> S10 2TN
>
>
>
> Tel: +44 (0)114 22 20112
>
> Mob: +44 (0)7742 533 569
>
> Fax: +44 (0)114 22 20002
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>


From iain.m.wallace at gmail.com  Mon Jul 18 07:54:41 2005
From: iain.m.wallace at gmail.com (Iain Wallace)
Date: Mon Jul 18 07:46:46 2005
Subject: [Bioperl-l] [Bioperl -l] Passing argument via Command line
Message-ID: <8cff3eb80507180454108946df@mail.gmail.com>

Hi all,

I am wondering if anybody can help me. I am trying to open a sequence file 
and parse it via Bio::SeqIO.

My script works fine if I pass the filename in via the commandline e.g. perl 
test_embl.pl filename.
but it doesn't work if I hard code the filename into the script, and I 
can't figure out why. 

The only two lines I change are:
#my $seqfile = $ARGV[0]; 
my $seqfile = "HBB_HUMAN.BC007075.embl";

Thanks for any help you can give me

Iain 

--The Script--
use Bio::AlignIO;
use Bio::SeqIO;
use Bio::LocatableSeq;

#my $seqfile = $ARGV[0];
my $seqfile = "HBB_HUMAN.BC007075.embl";
print $seqfile,"\n";
my $input = new Bio::SeqIO->new(
-file => $seqfile,-format=>'EMBL');

while ( my $seq = $input->next_seq() ) {
print $seq->id,"\n";
@features = $seq->get_SeqFeatures(); # just top level
foreach my $feat ( @features ) {
if($feat->primary_tag eq "CDS"){
$cds_obj= $feat->spliced_seq;
$cds_seq=$cds_obj->seq;
my @translated = $feat->each_tag_value('translation');
$translated_seq= $translated[0];
print $translated_seq,"\n";
}
}

}

From Marc.Logghe at devgen.com  Mon Jul 18 08:43:25 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jul 18 08:34:02 2005
Subject: [Bioperl-l] [Bioperl -l] Passing argument via Command line
Message-ID: <0C528E3670D8CE4B8E013F6749231AA62F545F@ANTARESIA.be.devgen.com>

Hi Iain,
It is because you call the new method twice on Bio::SeqIO:
new Bio::SeqIO->new( ... )

This is like writing:
my $input = Bio::SeqIO->new;
$input = $input->new( -file => $seqfile,-format=>'EMBL' );

When you take your script where you hardcoded $seqfile and you run as
'perl test_embl.pl HBB_HUMAN.BC007075.embl' everything works fine.
The first time new() is called, you actually do not pass any arguments.
So by default bioperl will look for the passed filename in @ARGV, which
was given. Using that filename it will try to guess the format. This
succeeds also. The 2nd call to new() also will succeed.
But when one runs it like 'perl test_embl.pl' then it fails, because the
first call to new() fails because it has no filename (@ARGV is empty),
so no chance to guess the format.
Your instantiation should obviously look like:
my $input = Bio::SeqIO->new( -file => $seqfile,-format=>'EMBL');
HTH,
Marc


> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org 
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of 
> Iain Wallace
> Sent: Monday, July 18, 2005 1:55 PM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] [Bioperl -l] Passing argument via Command line
> 
> Hi all,
> 
> I am wondering if anybody can help me. I am trying to open a 
> sequence file and parse it via Bio::SeqIO.
> 
> My script works fine if I pass the filename in via the 
> commandline e.g. perl test_embl.pl filename.
> but it doesn't work if I hard code the filename into the 
> script, and I can't figure out why. 
> 
> The only two lines I change are:
> #my $seqfile = $ARGV[0];
> my $seqfile = "HBB_HUMAN.BC007075.embl";
> 
> Thanks for any help you can give me
> 
> Iain 
> 
> --The Script--
> use Bio::AlignIO;
> use Bio::SeqIO;
> use Bio::LocatableSeq;
> 
> #my $seqfile = $ARGV[0];
> my $seqfile = "HBB_HUMAN.BC007075.embl";
> print $seqfile,"\n";
> my $input = new Bio::SeqIO->new(
> -file => $seqfile,-format=>'EMBL');
> 
> while ( my $seq = $input->next_seq() ) {
> print $seq->id,"\n";
> @features = $seq->get_SeqFeatures(); # just top level
> foreach my $feat ( @features ) {
> if($feat->primary_tag eq "CDS"){
> $cds_obj= $feat->spliced_seq;
> $cds_seq=$cds_obj->seq;
> my @translated = $feat->each_tag_value('translation');
> $translated_seq= $translated[0];
> print $translated_seq,"\n";
> }
> }
> 
> }

From mayagao1999 at yahoo.com  Mon Jul 18 17:06:10 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Mon Jul 18 16:56:46 2005
Subject: [Bioperl-l] how to work on two txt files simultaneously by handle
	corresponding lines from each file
Message-ID: <20050718210610.18944.qmail@web53509.mail.yahoo.com>

Dear All,

Sorry to bother you again.

I have two txt files to handle. One is
"short_sequences" and the other
one is "long_sequences". The "short_sequences" holds
100 short sequences (8 nucleotide long) and 100 long
sequences (200 nucleotide long) in the
"long_sequence".

For example, the first short sequence is "TTGACATA"
and the first long sequence is
"GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".

Basically, I want to generate a random position as a
starting site to replace a substring
in the long sequence with a short sequence. In this
example, we can choose a starting site
as 5th nucleotide in the long sequence, after
replacing using "TTGACATA", the replaced
long sequence is
"GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".

Then I want to replace the 2nd long sequence with the 2nd
short sequence and then repeat this over and over
again until the last long sequence is reached and
replaced. I think the only problem is that the
starting site should not be larger than 193.
Otherwise, there are
not enough nucleotides in the long sequence for
replacement.

Furthermore, I want to keep track of the starting
replacement site for each long sequence.


I am copying my code below. 
******************************************
use strict;
use warnings;

my (@short, @long, $offset); # the 'short' array will hold the short
                             # sequences while 'long' array the long sequences

open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
while(<FILE1>){
   chomp;
   push(@short, $_);
}
close FILE1; #Close the file

open(FILE2, '<', "long_sequences.txt")  || die "Can't open long_sequences.txt: $!\n";
while(<FILE2>){
   chomp;
   push(@long, $_);
}
close FILE2; #Close the file


# replacement
foreach my $short(@short){
   foreach my $long(@long){
       $offset = int(rand(length($long)%193));
       substr($long,$offset,length($short),$short);
       printf "%3d", $offset+1;
       print "\n", $long, "\n";

   }
}
********************************************

But I just realized that there is a problem for the
two
loops. The problem is that each short sequence will be
used to replace all long sequences not the
corresponding one. 

So I seek your suggestions on how to handle two files
simultaneously for my case. 

Thank you very much and look forward to your reply!

Best Regards,
    Alex

From khoueiry at ibdm.univ-mrs.fr  Mon Jul 18 17:19:47 2005
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Mon Jul 18 17:32:58 2005
Subject: [Bioperl-l] how to work on two txt files simultaneously by handle
	corresponding lines from each file
In-Reply-To: <20050718210610.18944.qmail@web53509.mail.yahoo.com>
References: <20050718210610.18944.qmail@web53509.mail.yahoo.com>
Message-ID: <20050718211306.M22651@ibdm.univ-mrs.fr>

If I understood your idea correctly, I suggest accessing the arrays by index (see the code 
below).
I didn't test this code but I think it's a fine way to solve your problem.


# replacement
 for(my $i = 0; $i < @short; $i++){
     $offset = int(rand(length($long[$i]) - length($short[$i]) + 1));
     printf "%3d", $offset+1;
     substr($long[$i],$offset,length($short[$i]),$short[$i]);
     print "\n", $long[$i], "\n";
 
    }
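
The index-based pairing can also be sketched as a small self-contained
routine that records each start site. The name replace_pairs and the
sample data are hypothetical; the sketch assumes both arrays have the
same number of entries and that every long sequence is at least as long
as its short partner.

```perl
use strict;
use warnings;

# Hypothetical helper: overwrite a random window in each long sequence
# with the short sequence at the same index; returns 1-based start sites.
sub replace_pairs {
    my ($short, $long) = @_;    # array references, equal length assumed
    my @sites;
    for my $i (0 .. $#$short) {
        # Largest 0-based offset that still leaves room for the short sequence
        my $max    = length($long->[$i]) - length($short->[$i]);
        my $offset = int(rand($max + 1));    # 0 .. $max inclusive
        substr($long->[$i], $offset, length($short->[$i]), $short->[$i]);
        push @sites, $offset + 1;
    }
    return @sites;
}

my @short = ('TTGACATA', 'CCCCCCCC');
my @long  = ('A' x 200, 'G' x 200);
my @sites = replace_pairs(\@short, \@long);
printf "%3d %s\n", $sites[$_], $long[$_] for 0 .. $#long;
```

For a 200-base sequence and an 8-base replacement, the reported start
site falls between 1 and 193, which matches the limit mentioned in the
question.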
 

On Mon, 18 Jul 2005 14:06:10 -0700 (PDT), Alex Zhang wrote
> Dear All,
> 
> Sorry to bother you again.
> 
> I have two txt files to handle. One is
> "short_sequences" and the other
> one is "long_sequences". The "short_sequences" holds
> 100 short sequences (8 nucleotide long) and 100 long
> sequences (200 nucleotide long) in the
> "long_sequence".
> 
> For example, the first short sequence is "TTGACATA"
> and the first long sequence is
> "GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
> GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
> CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
> GAACCTTGGACTAACCACTGTCTGGATA".
> 
> Basically, I want to generate a random position as a
> starting site to replace a substring
> in the long sequence with a short sequence. In this
> example, we can choose a starting site
> as 5th nucleotide in the long sequence, after
> replacing using "TTGACATA", the replaced
> long sequence is
> "GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
> GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
> CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
> GAACCTTGGACTAACCACTGTCTGGATA".
> 
> Then I want replace the 2nd long sequence with the 2nd
> short sequence and then repeat this over and over
> again until the last long sequence is reached and
> replaced. I think the only problem is that the
> starting site should not be larger than 193.
> Otherwise, there are
> not enough nucleotides in the long sequence for
> replacement.
> 
> Furthurmore, I want to keep track the starting
> replacement site for each long sequence.
> 
> I am copying my code in the below. 
> ******************************************
> use strict;
> use warnings;
> 
> my (@short, @long, $offset); # the 'short' array will hold the short
>                              # sequences while 'long' array the long sequences
> 
> open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
> while(<FILE1>){
>    chomp;
>    push(@short, $_);
> }
> close FILE1; #Close the file
> 
> open(FILE2, '<', "long_sequences.txt")
>     || die "Can't open long_sequences.txt: $!\n";
> while(<FILE2>){
>    chomp;
>    push(@long, $_);
> }
> close FILE2; #Close the file
> 
> # replacement
> foreach my $short(@short){
>    foreach my $long(@long){
>        $offset = int(rand(length($long)%193));
>        substr($long,$offset,length($short),$short);
>        printf "%3d", $offset+1;
>        print "\n", $long, "\n";
> 
>    }
> }
> ********************************************
> 
> But I just realized that there is a problem for the
> two
> loops. The problem is that each short sequence will be
> used to replace all long sequences not the
> corresponding one.
> 
> So I seek your suggestions on how to handle two files
> simultaneously for my case.
> 
> Thank you very much and look forward to your reply!
> 
> Best Regards,
>     Alex
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l


--
Open WebMail Project (http://openwebmail.org)

From tgra at ceh.ac.uk  Tue Jul 19 08:23:14 2005
From: tgra at ceh.ac.uk (Tanya Gray)
Date: Tue Jul 19 08:11:21 2005
Subject: [Bioperl-l] FeatureIO::gff.pm -- error fetching sofa.definition
Message-ID: <s2dcfdcd.003@wpo.nerc.ac.uk>

Hi, I have a simple test script to read a GFF3 file using FeatureIO::gff.pm. Unfortunately it is throwing errors relating to retrieval of sofa.definition file:
MSG: failed to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, server threw 500

I am using Bioperl-1.5.0. I just wonder if anyone might know what the problem is.  Copy of the relevant script/ error messages below.

thank you
Tanya


relevant lines of script
---------------------------

my $file_features = "gff3.test";

my $fio = Bio::FeatureIO->new( -file =>$file_features, -format =>"GFF", -validate_terms=>0, -version=>3) or print "\nError occurred: " . $! ;


gff3.test
----------

##gff-version   3
##sequence-region   ctg123 1 1497228
ctg123  .       gene    1000    9000    .       +       .       ID=gene00001;Name=EDEN

ERROR MESSAGES
---------------------
perl gff3test.pl                                                            [11:11]

-------------------- WARNING ---------------------
MSG: [1/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: [2/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: [3/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: [4/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: [5/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
---------------------------------------------------

------------- EXCEPTION  -------------
MSG: failed to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, server threw 500
STACK Bio::Root::IO::_initialize_io /usr/local/share/perl/5.8.7/Bio/Root/IO.pm:276
STACK Bio::Root::IO::new /usr/local/share/perl/5.8.7/Bio/Root/IO.pm:227
STACK Bio::OntologyIO::dagflat::defs_url /usr/local/share/perl/5.8.7/Bio/OntologyIO/dagflat.pm:361
STACK Bio::OntologyIO::dagflat::_initialize /usr/local/share/perl/5.8.7/Bio/OntologyIO/dagflat.pm:188
STACK Bio::OntologyIO::soflat::_initialize /usr/local/share/perl/5.8.7/Bio/OntologyIO/soflat.pm:145
STACK Bio::OntologyIO::new /usr/local/share/perl/5.8.7/Bio/OntologyIO.pm:169
STACK Bio::OntologyIO::new /usr/local/share/perl/5.8.7/Bio/OntologyIO.pm:178
STACK Bio::Ontology::OntologyStore::get_ontology /usr/local/share/perl/5.8.7/Bio/Ontology/OntologyStore.pm:225
STACK Bio::FeatureIO::gff::_initialize /usr/local/share/perl/5.8.7/Bio/FeatureIO/gff.pm:110
STACK Bio::FeatureIO::new /usr/local/share/perl/5.8.7/Bio/FeatureIO.pm:268
STACK Bio::FeatureIO::new /usr/local/share/perl/5.8.7/Bio/FeatureIO.pm:288
STACK toplevel gff3test.pl:15


From grassi.e at virgilio.it  Tue Jul 19 09:58:40 2005
From: grassi.e at virgilio.it (Elena Grassi)
Date: Tue Jul 19 09:49:20 2005
Subject: [Bioperl-l] Bio-perl and webpages?
Message-ID: <1121781520.4064.38.camel@localhost.localdomain>

Hi,

I've got a bunch of scripts (ok, it should be a complete program, but
that's not the point now...) written in perl (written with other people
not that much object-oriented) and now I need to make them work through
a website.
My first idea is to use a little bit of dirty php, my second one is to
translate perl in php (I have to admit that I'd prefer not to use this
idea...), the third one involves bioperl: if I decide to try to re-write
the scripts with bioperl is there any suitable and fast tool to put them
into an html based structure?

Sorry for the nearly OT question,
E.
-- 
If I were a swan, I'd be gone.
If - Pink Floyd - Atom Heart Mother

From cain at cshl.edu  Tue Jul 19 09:59:17 2005
From: cain at cshl.edu (Scott Cain)
Date: Tue Jul 19 09:49:56 2005
Subject: [Bioperl-l] FeatureIO::gff.pm -- error fetching sofa.definition
In-Reply-To: <s2dcfdcd.003@wpo.nerc.ac.uk>
Message-ID: <Pine.GSO.4.05.10507190955410.29063-100000@phage.cshl.edu>

Hi Tanya,

The version of Bio::FeatureIO::gff in bioperl-1.5 was still a little
rough.  In particular, it required the download of SOFA from a hard-coded
location to validate the types in the GFF3.  In bioperl-live, validation
has become an option you can pass to the constructor and is off by default.
I believe the location to get SOFA from is still hard-coded, though.  I
would suggest using bioperl-live if you can.
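
As the stack trace shows, the constructor reports failure by throwing an
exception, so an "or print ..." after new() never gets a chance to run.
A minimal sketch of trapping the exception instead; the helper name
try_new is hypothetical, and the commented-out call uses only the
arguments from the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: call $class->new(@args) and trap any exception.
# Returns ($object, undef) on success or (undef, $error) on failure.
sub try_new {
    my ($class, @args) = @_;
    my $obj = eval { $class->new(@args) };
    return (undef, $@ || "constructor returned nothing") unless defined $obj;
    return ($obj, undef);
}

# Intended use (requires BioPerl, so it is left commented out here):
# require Bio::FeatureIO;
# my ($fio, $err) = try_new('Bio::FeatureIO',
#     -file => 'gff3.test', -format => 'GFF',
#     -validate_terms => 0, -version => 3);
# die "Could not open GFF3 file: $err" unless $fio;
```

This does not make the SOFA download succeed; it only lets the script
report the failure on its own terms.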

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Tue, 19 Jul 2005, Tanya Gray wrote:

> Hi, I have a simple test script to read a GFF3 file using FeatureIO::gff.pm. Unfortunately it is throwing errors relating to retrieval of sofa.definition file:
> MSG: failed to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, server threw 500
> 
> I am using Bioperl-1.5.0. I just wonder if anyone might know what the problem is.  Copy of the relevant script/ error messages below.
> 
> thank you
> Tanya
> 
> 
> relevant lines of script
> ---------------------------
> 
> my $file_features = "gff3.test";
> 
> my $fio = Bio::FeatureIO->new( -file =>$file_features, -format =>"GFF", -validate_terms=>0, -version=>3) or print "\nError occurred: " . $! ;
> 
> 
> gff3.test
> ----------
> 
> ##gff-version   3
> ##sequence-region   ctg123 1 1497228
> ctg123  .       gene    1000    9000    .       +       .       ID=gene00001;Name=EDEN
> 
> ERROR MESSAGES
> ---------------------
> perl gff3test.pl                                                            [11:11]
> 
> -------------------- WARNING ---------------------
> MSG: [1/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
> ---------------------------------------------------
> 
> -------------------- WARNING ---------------------
> MSG: [2/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
> ---------------------------------------------------
> 
> -------------------- WARNING ---------------------
> MSG: [3/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
> ---------------------------------------------------
> 
> -------------------- WARNING ---------------------
> MSG: [4/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
> ---------------------------------------------------
> 
> -------------------- WARNING ---------------------
> MSG: [5/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500.  retrying...
> ---------------------------------------------------
> 
> ------------- EXCEPTION  -------------
> MSG: failed to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, server threw 500
> STACK Bio::Root::IO::_initialize_io /usr/local/share/perl/5.8.7/Bio/Root/IO.pm:276
> STACK Bio::Root::IO::new /usr/local/share/perl/5.8.7/Bio/Root/IO.pm:227
> STACK Bio::OntologyIO::dagflat::defs_url /usr/local/share/perl/5.8.7/Bio/OntologyIO/dagflat.pm:361
> STACK Bio::OntologyIO::dagflat::_initialize /usr/local/share/perl/5.8.7/Bio/OntologyIO/dagflat.pm:188
> STACK Bio::OntologyIO::soflat::_initialize /usr/local/share/perl/5.8.7/Bio/OntologyIO/soflat.pm:145
> STACK Bio::OntologyIO::new /usr/local/share/perl/5.8.7/Bio/OntologyIO.pm:169
> STACK Bio::OntologyIO::new /usr/local/share/perl/5.8.7/Bio/OntologyIO.pm:178
> STACK Bio::Ontology::OntologyStore::get_ontology /usr/local/share/perl/5.8.7/Bio/Ontology/OntologyStore.pm:225
> STACK Bio::FeatureIO::gff::_initialize /usr/local/share/perl/5.8.7/Bio/FeatureIO/gff.pm:110
> STACK Bio::FeatureIO::new /usr/local/share/perl/5.8.7/Bio/FeatureIO.pm:268
> STACK Bio::FeatureIO::new /usr/local/share/perl/5.8.7/Bio/FeatureIO.pm:288
> STACK toplevel gff3test.pl:15
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 

From palmeida at igc.gulbenkian.pt  Tue Jul 19 10:31:05 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Jul 19 10:22:00 2005
Subject: [Bioperl-l] Bio-perl and webpages?
In-Reply-To: <1121781520.4064.38.camel@localhost.localdomain>
References: <1121781520.4064.38.camel@localhost.localdomain>
Message-ID: <42DD0EA9.6000306@igc.gulbenkian.pt>

Hi Elena,

If you already have your scripts in Perl it would probably be best to 
use the Perl CGI module instead of php (you can find it at 
http://search.cpan.org/dist/CGI.pm/). You can adapt the input so it is 
read from a web form and change the print commands to print html.

Incidentally, I'm not sure this is appropriate for the list, but since 
I'm on the subject... I tried to adapt a script to run on the Web; I 
wanted to use Taint mode but I got an error saying that something on the 
Clustal module of BioPerl was using an unsafe variable:

Insecure $ENV{PATH} while running with -T switch at 
/usr/local/share/perl/5.8.4/Bio/Tools/Run/Alignment/Clustalw.pm line 
556, <GEN0> line 2.

I wouldn't mind hardcoding the path of Clustal, but I couldn't figure 
out a way to do it, or to untaint the variable. Can anyone help?

Thanks,
--Paulo

Elena Grassi wrote:

>Hi,
>
>I've got a bunch of scripts (ok, it should be a complete program, but
>that's not the point now...) written in perl (written with other people
>not that much object-oriented) and now I need to make them work through
>a website.
>My first idea is to use a little bit of dirty php, my second one is to
>translate perl in php (I have to admit that I'd prefer not to use this
>idea...), the third one involves bioperl: if I decide to try to re-write
>the scripts with bioperl is there any suitable and fast tool to put them
>into an html based structure?
>
>Sorry for the nearly OT question,
>E.
>  
>

From jeremy_just at netcourrier.com  Tue Jul 19 11:25:32 2005
From: jeremy_just at netcourrier.com (=?ISO-8859-15?Q?J=E9r=E9my?= JUST)
Date: Tue Jul 19 11:16:47 2005
Subject: [Bioperl-l] Bio-perl and webpages?
In-Reply-To: <42DD0EA9.6000306@igc.gulbenkian.pt>
References: <1121781520.4064.38.camel@localhost.localdomain>
	<42DD0EA9.6000306@igc.gulbenkian.pt>
Message-ID: <20050719172532.00007ca9@pearson.infobiogen.fr>

On Tue, 19 Jul 2005 15:31:05 +0100
Paulo Almeida <palmeida@igc.gulbenkian.pt> wrote:

> Insecure $ENV{PATH} while running with -T switch at 
> /usr/local/share/perl/5.8.4/Bio/Tools/Run/Alignment/Clustalw.pm line 
> 556, <GEN0> line 2.
>
> I wouldn't mind hardcoding the path of Clustal, but I couldn't figure 
> out a way to do it, or to untaint the variable. Can anyone help?

  The content of %ENV is considered unsafe, since it comes from
outside your program.
  One secure way of untainting the PATH is to set it at the beginning of
your code:

$ENV{PATH} = '/bin:/usr/bin:/usr/local/bin' ;


  I think you are bound to hardcode the PATH into your program for it to
be really safe.
  I've seen another solution in the SpamAssassin code: it checks each
element of the PATH to verify that there are no world-writable or
group-writable directories in it.


  See also perldoc perlsec for more details.
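
  A rough sketch of the directory check mentioned above, in case it is
useful. This is not the SpamAssassin code, only the same idea under my
own assumptions: the sub name safe_path_dirs is made up, and the set of
characters allowed in a directory name is deliberately narrow.

```perl
use strict;
use warnings;

# Return the entries of a colon-separated PATH that are absolute,
# existing directories and are neither group- nor world-writable.
sub safe_path_dirs {
    my ($path) = @_;
    my @safe;
    for my $dir (split /:/, $path) {
        # The regex capture also untaints the value under -T.
        next unless $dir =~ m{^(/[\w./+-]*)$};
        my $clean = $1;
        my @st = stat($clean) or next;      # skip missing entries
        next unless -d _;                   # reuse the stat buffer
        next if $st[2] & 022;               # group- or world-writable bit set
        push @safe, $clean;
    }
    return @safe;
}

# Drop the variables perlsec warns about, then install the filtered PATH.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
$ENV{PATH} = join ':', safe_path_dirs('/bin:/usr/bin:/usr/local/bin');
```

  Starting from a hardcoded list, as here, is still the safer route;
filtering the inherited PATH only makes sense if you really need it.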

-- 
Jérémy JUST  <jeremy_just@netcourrier.com>
From lstein at cshl.edu  Tue Jul 19 12:43:36 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Jul 19 12:34:29 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] Adding human chromosomes as
	reference sequences
In-Reply-To: <ef371bf7766d7e620d68393126809000@helsinki.fi>
References: <ef371bf7766d7e620d68393126809000@helsinki.fi>
Message-ID: <200507191243.37491.lstein@cshl.edu>

Hi,

The bug involving _maxbin() was fixed in the CVS version of bioperl some time 
ago. You also get the fix when you install the latest CVS version of GBrowse. 
I'm sorry that the ucsc_genes2gff.pl script isn't loading the chromosome 
extents; we just need a companion script called ucsc_chromosomes2gff.pl or 
something similar. Ilari, since you've already essentially done this, perhaps 
you'd be willing to contribute the script? I'll add it to bioperl.
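
Something along these lines might be a starting point. It is only a
sketch, not the script that will go into bioperl: it assumes
chromInfo.txt is tab-separated with the chromosome name and length in the
first two columns, and the "assembly"/"chromosome" source and method
values are my assumption about what the GBrowse configuration expects.

```perl
use strict;
use warnings;

# Turn one chromInfo.txt line (name <TAB> length <TAB> file name) into a
# GFF2 reference line. The "assembly"/"chromosome" source and method
# values are assumptions; use whatever your GBrowse configuration expects.
sub chrom_to_gff {
    my ($line) = @_;
    chomp $line;
    my ($name, $length) = split /\t/, $line;
    return unless defined $length && $length =~ /^\d+$/;
    return join("\t", $name, 'assembly', 'chromosome',
                1, $length, '.', '+', '.', "Sequence $name");
}

# Usage: perl chrominfo2gff.pl chromInfo.txt > chromosomes.gff
if (@ARGV) {
    while (my $line = <>) {
        my $gff = chrom_to_gff($line);
        print "$gff\n" if defined $gff;
    }
}
```

Loading the resulting file still needs a bioperl with the _maxbin() fix,
since whole chromosomes are exactly the long features that triggered it.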

Thanks for the information about load_ucsc.pl. Although I can't use it, due to 
not having the enum.pm module installed, I did see immediately where the 
problem has arisen and have fixed it in bioperl CVS (hope I didn't break it 
in so doing!)

As of about a week ago the xyplot.pm glyph has been enhanced to accept 
negative scores. You can also colorize the bars and points according to the 
score or other criteria.

lincoln

On Tuesday 19 July 2005 05:57 am, Ilari Scheinin wrote:
> Hello.
>
> I recently installed gbrowse for visualizing the human genome. By
> browsing this list, I found out that the easiest way to import the
> genome data is is to get it from UCSC.
>
> So I downloaded these files from
> ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17/database/:
> chromInfo.txt, kgXref.txt, knownGeneMrna.txt, knownGenePep.txt,
> knownGene.txt, knownToLocusLink.txt, knownToPfam.txt,
> knownToU133Plus2.txt, knownToU133.txt, knownToU95.txt, refLink.txt,
> refSeqSummary.txt
>
> and these from ftp://ftp.ncbi.nlm.nih.gov/refseq/LocusLink/ARCHIVE/:
> log2UG, loc2acc, loc2go
> and also /gene/DATA/gene2accession (renamed to genebank2accessions.txt)
>
> and then ran ucsc_genes2gff.pl (from gmod-0.003) and bp_load_gff.pl with
> % ./ucsc_genes2gff.pl -annotations hg17 | bp_load_gff.pl -c -d
> "dbi:mysql:database=gbrowse;host=<host>" --user <user> -p <pass> -f
> sequencedata/ -
>
> It works fine and loads the data to the database, but it doesn't add
> the reference entries for the chromosomes, so when I try to search for
> chr1 (or just 1) in gbrowse, I get "The landmark named chr1 is not
> recognized.". I tried adding an entry for chr1 directly in mysql and
> gbrowse worked fine with that.
>
> So next I took the file chromInfo.txt which contains the lenghts of the
> chromosomes and edited that into a GFF file. I tried to load it with
> % bp_load_gff.pl -d "dbi:mysql:database=gbrowse;host=<host>" --user
> <user> -p <pass> chromosomes.gff
>
> I get:
> chromosomes.gff: loading...
> Can't locate object method "_maxbin" via package
> "Bio::DB::GFF::Adaptor::dbi::mysqlopt" at
> /usr/lib/perl5/site_perl/5.8.1/Bio/DB/GFF/Adaptor/dbi/mysql.pm line
> 687, <> line 2.
> DBI::db=HASH(0x11f8080)->disconnect invalidates 2 active statement
> handles (either destroy statement handles or call finish on them before
> disconnecting) at
> /usr/lib/perl5/site_perl/5.8.1/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm
> line 228, <> line 2.
>
> I noticed that this is a problem with long features. Chr1 is
> 245,522,847 bp. If I drop the 7 from the end, it works. The default for
> maxfeature is 100,000,000, but adding --maxfeature 1000000000 for
> bp_load_gff.pl doesn't have any effect. As you can see, this is with
> perl 5.8.1, and same thing happens on another machine with 5.8.3.
> Bioperl is 1.5.0. Is the script broken or am I doing something wrong?
>
> I then made a little script that goes through chromInfo.txt and adds
> the chromosomes directly to mysql. I ignored the column fbin, because I
> didn't know what it was for. This seems to work fine, gbrowse is able
> to find the chromosomes. But is there an "official" or better way to
> import the human genome data to gbrowse?
>
> I also tried load_ucsc.pl from bioperl-1.5.0, but it didn't add the
> chromosome entries either. By the way, the script produces an empty GFF
> file for each input file, but everything is written to stdout, so all
> the files remain empty.
>
>
> Also one other thing. Can the score values in GFF be negative? I'm
> using gbrowse to visualize CGH data, but the xyplot doesn't seem to
> work with negative log ratios.
>
>
> Regards,
> Ilari
>
>
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From Andrew.Mather at dpi.vic.gov.au  Tue Jul 19 00:11:48 2005
From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather@dpi.vic.gov.au)
Date: Tue Jul 19 13:19:40 2005
Subject: [Bioperl-l] Bioperl-ext, Staden and x86_64
Message-ID: <OF4EFD0CA2.B0ED7AFA-ONCA257043.0015B69B-CA257043.00170D76@nre.vic.gov.au>

Hi Bioperlers

I'm having some problems getting the bioperl-ext to install under RHEL3
Update 5 for Opteron.

I guess it's not strictly a Bioperl problem, but I figure someone here will
have tried this before me and can offer some advice.

The problem seems to be related to the Staden io_lib.  Version 1.8.11
wouldn't compile: its configure fails because it doesn't appear to
understand Opterons.  I looked around and found Version 1.9.0 on
Sourceforge and this appears to compile cleanly, however it doesn't look
like it's left any .so files in /usr/local/lib  (or anywhere else for that
matter).

Judging from the staden::read makefile, this (and I'm guessing it's this)
causes the make process to fail and I can't build ext.  It leaves the .a
files, but no .so files.  I've copied the Read, os and configure header
files into /usr/local/include, which seems to be a common problem, but this
makes no difference.

Has anyone on the list compiled the staden io_lib on Opteron ?  If so,
pointers to appropriate info/versions etc gratefully received.

Thanks,
Andrew


Animal Genetics and Genomics, PIRVic Attwood
475 Mickleham Road, Attwood, 3049
ph +61 3 92174342
mob  0413 009 761


----------------
There are 10 kinds of people...those who understand binary and those who
don't.


From palmeida at igc.gulbenkian.pt  Tue Jul 19 14:45:31 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Jul 19 14:36:27 2005
Subject: [Bioperl-l] Bio-perl and webpages?
In-Reply-To: <20050719172532.00007ca9@pearson.infobiogen.fr>
References: <1121781520.4064.38.camel@localhost.localdomain>
	<42DD0EA9.6000306@igc.gulbenkian.pt>
	<20050719172532.00007ca9@pearson.infobiogen.fr>
Message-ID: <200507191945.31605.palmeida@igc.gulbenkian.pt>

Hey,

I did what you said  and it seems to be working. Thank you very much. I 
changed things in Clustalw.pm back and forth and never thought of trying to 
solve the problem within my script.

-- Paulo

On Tuesday 19 July 2005 16:25, Jérémy JUST wrote:
> On Tue, 19 Jul 2005 15:31:05 +0100
>
> Paulo Almeida <palmeida@igc.gulbenkian.pt> wrote:
> > Insecure $ENV{PATH} while running with -T switch at
> > /usr/local/share/perl/5.8.4/Bio/Tools/Run/Alignment/Clustalw.pm line
> > 556, <GEN0> line 2.
> >
> > I wouldn't mind hardcoding the path of Clustal, but I couldn't figure
> > out a way to do it, or to untaint the variable. Can anyone help?
>
>   The content of %ENV is considered as unsafe, since it comes from
> outside your program.
>   One secure way of untainting the PATH is to set it at the beginning of
> your code:
>
> $ENV{PATH} = '/bin:/usr/bin:/usr/local/bin' ;
>
>
>   I think you are bound to hardcode the PATH into your program for it to
> be really safe.
>   I've seen another solution in the SpamAssassin code: it checks each
> element of the PATH to verify that there is no world-writable or
> group-writable directories in it.
>
>
>   See also perldoc perlsec for more details.

From astew at wam.umd.edu  Tue Jul 19 18:03:19 2005
From: astew at wam.umd.edu (Andrew Stewart)
Date: Tue Jul 19 21:02:52 2005
Subject: [Bioperl-l] error installing bioperl-db
Message-ID: <42DD78A7.5060507@wam.umd.edu>

I'm having a problem while trying to install the bioperl-db modules.  
While trying to run a make test, I get the following error:

t/01dbadaptor.....ok 
1/13                                                   
------------- EXCEPTION  -------------
*MSG: Failed to load module Bio::DB::DBI::postgresql*. Can't locate 
Bio/DB/DBI/postgresql.pm in @INC (@INC contains: t 
/usr/local/bioperl-db/blib/lib /usr/local/bioperl-db/blib/arch 
/sw/lib/perl5/5.8.1/darwin-thread-multi-2level /sw/lib/perl5/5.8.1 
/sw/lib/perl5 /sw/lib/perl5/darwin /Users/astew/usr/lib 
/System/Library/Perl/5.8.1/darwin-thread-multi-2level 
/System/Library/Perl/5.8.1 
/Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 
/Library/Perl /Network/Library/Perl/5.8.1/darwin-thread-multi-2level 
/Network/Library/Perl/5.8.1 /Network/Library/Perl .) at 
/sw/lib/perl5/5.8.1/Bio/Root/Root.pm line 396.

STACK Bio::Root::Root::_load_module /sw/lib/perl5/5.8.1/Bio/Root/Root.pm:398
STACK Bio::DB::SimpleDBContext::dbi 
/usr/local/bioperl-db/blib/lib/Bio/DB/SimpleDBContext.pm:296
STACK Bio::DB::BioSQL::DBAdaptor::new 
/usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/DBAdaptor.pm:85
STACK Bio::DB::BioDB::new /usr/local/bioperl-db/blib/lib/Bio/DB/BioDB.pm:203
STACK DBTestHarness::get_DBAdaptor t/DBTestHarness.pm:257
STACK DBTestHarness::get_DBContext t/DBTestHarness.pm:272
STACK toplevel t/01dbadaptor.t:23

--------------------------------------
t/01dbadaptor.....dubious                                                    

        Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 2-13
        Failed 12/13 tests, 7.69% okay


Where am I supposed to find Bio::DB::DBI::postgresql ?


-Andrew Stewart
From hlapp at gmx.net  Wed Jul 20 04:50:32 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jul 20 04:48:20 2005
Subject: [Bioperl-l] error installing bioperl-db
In-Reply-To: <42DD78A7.5060507@wam.umd.edu>
References: <42DD78A7.5060507@wam.umd.edu>
Message-ID: <00afefecc06d2d2bb22f5be09fb4410a@gmx.net>

Did you configure the name of your driver to be 'postgresql'? Is this a 
new DBD driver for PostgreSQL? The DBD driver used to be Pg (DBD::Pg), 
so bioperl-db just uses the same convention or name.

I.e., change postgresql to Pg in your configuration 
(t/DBHarness.biosql.conf), unless the DBD driver you are using is 
indeed DBD::postgresql.

If the DBD driver you are using is indeed DBD::postgresql and not 
DBD::Pg then copy Bio/DB/DBI/Pg.pm to Bio/DB/DBI/postgresql.pm and 
rename (or copy) the directory Bio/DB/BioSQL/Pg to 
Bio/DB/BioSQL/postgresql.
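
If you are not sure which of the two drivers your perl can actually
load, a quick check along these lines should tell you. This is a sketch;
the sub name module_available is made up for illustration.

```perl
use strict;
use warnings;

# Report whether a module can be loaded by this perl.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $driver (qw(DBD::Pg DBD::postgresql)) {
    printf "%-16s %s\n", $driver,
        module_available($driver) ? 'installed' : 'not found';
}
```

Whichever name is reported as installed is the one to put in the test
configuration.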

Hth,

	-hilmar

On Jul 19, 2005, at 3:03 PM, Andrew Stewart wrote:

> I'm having a problem while trying to install the bioperl-db modules.  
> While trying to run a make test, I get the following error:
>
> t/01dbadaptor.....ok 1/13                                              
>      ------------- EXCEPTION  -------------
> *MSG: Failed to load module Bio::DB::DBI::postgresql*. Can't locate 
> Bio/DB/DBI/postgresql.pm in @INC (@INC contains: t 
> /usr/local/bioperl-db/blib/lib /usr/local/bioperl-db/blib/arch 
> /sw/lib/perl5/5.8.1/darwin-thread-multi-2level /sw/lib/perl5/5.8.1 
> /sw/lib/perl5 /sw/lib/perl5/darwin /Users/astew/usr/lib 
> /System/Library/Perl/5.8.1/darwin-thread-multi-2level 
> /System/Library/Perl/5.8.1 
> /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 
> /Library/Perl /Network/Library/Perl/5.8.1/darwin-thread-multi-2level 
> /Network/Library/Perl/5.8.1 /Network/Library/Perl .) at 
> /sw/lib/perl5/5.8.1/Bio/Root/Root.pm line 396.
>
> STACK Bio::Root::Root::_load_module 
> /sw/lib/perl5/5.8.1/Bio/Root/Root.pm:398
> STACK Bio::DB::SimpleDBContext::dbi 
> /usr/local/bioperl-db/blib/lib/Bio/DB/SimpleDBContext.pm:296
> STACK Bio::DB::BioSQL::DBAdaptor::new 
> /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/DBAdaptor.pm:85
> STACK Bio::DB::BioDB::new 
> /usr/local/bioperl-db/blib/lib/Bio/DB/BioDB.pm:203
> STACK DBTestHarness::get_DBAdaptor t/DBTestHarness.pm:257
> STACK DBTestHarness::get_DBContext t/DBTestHarness.pm:272
> STACK toplevel t/01dbadaptor.t:23
>
> --------------------------------------
> t/01dbadaptor.....dubious
>        Test returned status 2 (wstat 512, 0x200)
> DIED. FAILED tests 2-13
>        Failed 12/13 tests, 7.69% okay
>
>
> Where am I supposed to find Bio::DB::DBI::postgresql ?
>
>
> -Andrew Stewart
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Wed Jul 20 12:33:50 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jul 20 12:27:29 2005
Subject: [Bioperl-l] error installing bioperl-db
In-Reply-To: <42DE7912.4020300@wam.umd.edu>
References: <42DD78A7.5060507@wam.umd.edu>
	<00afefecc06d2d2bb22f5be09fb4410a@gmx.net>
	<42DE7912.4020300@wam.umd.edu>
Message-ID: <9e25eb983a2480c4493d3f0422ad0365@gmx.net>

First off, the bioperl-microarray mailing list has little to do with  
this topic. The appropriate list is bioperl-l to which you posted  
first. You can find the page to subscribe at www.bioperl.org.

As for your error report, seeing an error in the tests is usually not a  
good sign. You can force an install but there's likely a problem that  
needs to be fixed. Here, in Postgresql the first failed statement  
invalidates the entire transaction and no other SQL command can succeed  
until the transaction is rolled back. To deal with this the Postgresql  
version of the schema defines 'rules' that do lookups to prevent unique  
key clashes.
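
To illustrate the transaction behaviour in general terms (this is not
how bioperl-db handles it, which is through the rules just mentioned),
here is a minimal sketch of client code that rolls back after a failed
statement. It assumes a DBI handle opened with AutoCommit off and
RaiseError on; the sub name in_transaction is made up.

```perl
use strict;
use warnings;

# Run $work inside a transaction on $dbh (a DBI handle with AutoCommit off
# and RaiseError on). On any failure the transaction is rolled back so that
# later statements are not rejected with "current transaction is aborted".
sub in_transaction {
    my ($dbh, $work) = @_;
    my $ok = eval { $work->($dbh); $dbh->commit; 1 };
    unless ($ok) {
        my $err = $@ || 'unknown error';
        eval { $dbh->rollback };    # a failing rollback must not mask $err
        return (0, $err);
    }
    return (1, undef);
}
```

Without the rollback, every later statement on the same handle fails
with the "commands ignored until end of transaction block" error.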

One of the statements either failed unexpectedly, or it failed when it  
should have been caught by one of the rules.

Which version of Postgresql are you using? Did you download the schema  
from CVS, and were there any errors when you instantiated it?

I'll need to replicate the error before I can judge further what's  
going on.

	-hilmar

On Jul 20, 2005, at 9:17 AM, Andrew Stewart wrote:

> Doh, that was a sloppy overlook on my part.  Thanks for pointing it  
> out.
>
> make test now reports 97% ok with the following error:
> t/03simpleseq.....NOK 33Use of uninitialized value in join or string  
> at /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm line  
> 1845.
>
> -------------------- WARNING ---------------------
> MSG: update in Bio::DB::BioSQL::PrimarySeqAdaptor (driver) failed,  
> values were ("NM_003319","","NM_003319","Homo sapiens titin (TTN),  
> transcript variant N2-B, mRNA","3") FKs (2)
> ERROR:  current transaction is aborted, commands ignored until end of  
> transaction block
>
> ---------------------------------------------------
> Use of uninitialized value in join or string at  
> /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm line 1845.
> Use of uninitialized value in join or string at  
> /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm line 1845.
> Use of uninitialized value in join or string at  
> /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm line 1845.
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::BiosequenceAdaptor (driver) failed,  
> values were ("","82027","dna","","") FKs (2)
> ERROR:  current transaction is aborted, commands ignored until end of  
> transaction block
>
> ---------------------------------------------------
> t/03simpleseq.....ok 34/59                                              
>      ------------- EXCEPTION  -------------
> MSG: error while executing statement in  
> Bio::DB::BioSQL::PrimarySeqAdaptor::find_by_unique_key: ERROR:   
> current transaction is aborted, commands ignored until end of  
> transaction block
>
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key  
> /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:951
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key  
> /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:855
> STACK (eval) t/03simpleseq.t:112
> STACK toplevel t/03simpleseq.t:66
>
> --------------------------------------
> t/03simpleseq.....FAILED tests 32-33, 35, 38-59                         
>             Failed 25/59 tests, 57.63% okay
>
>
> Is this an error I should be worried about or should I go ahead and  
> force make install?
>
>
> Thanks for the help.  Could I be added to this listserv by the way?
>
> -Andrew Stewart
> US Navy BDRD
>
>
>
>
> Hilmar Lapp wrote:
>
>> Did you configure the name of your driver to be 'postgresql'? Is this  
>> a new DBD driver for PostgreSQL? The DBD driver used to be Pg  
>> (DBD::Pg), so bioperl-db just uses the same convention or name.
>>
>> I.e., change postgresql to Pg in your configuration  
>> (t/DBHarness.biosql.conf), unless the DBD driver you are using is  
>> indeed DBD::postgresql.
>>
>> If the DBD driver you are using is indeed DBD::postgresql and not  
>> DBD::Pg then copy Bio/DB/DBI/Pg.pm to Bio/DB/DBI/postgresql.pm and  
>> rename (or copy) the directory Bio/DB/BioSQL/Pg to  
>> Bio/DB/BioSQL/postgresql.
>>
>> Hth,
>>
>>     -hilmar
>>
>> On Jul 19, 2005, at 3:03 PM, Andrew Stewart wrote:
>>
>>> I'm having a problem while trying to install the bioperl-db modules.  
>>>  While trying to run a make test, I get the following error:
>>>
>>> t/01dbadaptor.....ok 1/13                                             
>>>        ------------- EXCEPTION  -------------
>>> *MSG: Failed to load module Bio::DB::DBI::postgresql*. Can't locate  
>>> Bio/DB/DBI/postgresql.pm in @INC (@INC contains: t  
>>> /usr/local/bioperl-db/blib/lib /usr/local/bioperl-db/blib/arch  
>>> /sw/lib/perl5/5.8.1/darwin-thread-multi-2level /sw/lib/perl5/5.8.1  
>>> /sw/lib/perl5 /sw/lib/perl5/darwin /Users/astew/usr/lib  
>>> /System/Library/Perl/5.8.1/darwin-thread-multi-2level  
>>> /System/Library/Perl/5.8.1  
>>> /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1  
>>> /Library/Perl /Network/Library/Perl/5.8.1/darwin-thread-multi-2level  
>>> /Network/Library/Perl/5.8.1 /Network/Library/Perl .) at  
>>> /sw/lib/perl5/5.8.1/Bio/Root/Root.pm line 396.
>>>
>>> STACK Bio::Root::Root::_load_module  
>>> /sw/lib/perl5/5.8.1/Bio/Root/Root.pm:398
>>> STACK Bio::DB::SimpleDBContext::dbi  
>>> /usr/local/bioperl-db/blib/lib/Bio/DB/SimpleDBContext.pm:296
>>> STACK Bio::DB::BioSQL::DBAdaptor::new  
>>> /usr/local/bioperl-db/blib/lib/Bio/DB/BioSQL/DBAdaptor.pm:85
>>> STACK Bio::DB::BioDB::new  
>>> /usr/local/bioperl-db/blib/lib/Bio/DB/BioDB.pm:203
>>> STACK DBTestHarness::get_DBAdaptor t/DBTestHarness.pm:257
>>> STACK DBTestHarness::get_DBContext t/DBTestHarness.pm:272
>>> STACK toplevel t/01dbadaptor.t:23
>>>
>>> --------------------------------------
>>> t/01dbadaptor.....dubious
>>>        Test returned status 2 (wstat 512, 0x200)
>>> DIED. FAILED tests 2-13
>>>        Failed 12/13 tests, 7.69% okay
>>>
>>>
>>> Where am I supposed to find Bio::DB::DBI::postgresql ?
>>>
>>>
>>> -Andrew Stewart
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From lstein at cshl.edu  Wed Jul 20 12:39:23 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Jul 20 12:30:56 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] Adding human chromosomes as
	reference sequences
In-Reply-To: <42DE0A24.6030705@molecular-sciences.ox.ac.uk>
References: <ef371bf7766d7e620d68393126809000@helsinki.fi>
	<200507191243.37491.lstein@cshl.edu>
	<42DE0A24.6030705@molecular-sciences.ox.ac.uk>
Message-ID: <200507201239.24760.lstein@cshl.edu>

These changes were all in bioperl itself, so you don't have to update gbrowse.

Lincoln

On Wednesday 20 July 2005 04:24 am, Steve Taylor wrote:
> Hi,
>
> > As of about a week ago the xyplot.pm glyph has been enhanced to accept
> > negative scores. You can also colorize the bars and points according to
> > the score or other criteria.
>
> That's great! Is it best to do a full CVS update of bioperl and gbrowse
> (1_62-bugfixes branch) or will just updating bioperl suffice to get these
> features?
>
> Thanks and Regards,
>
> Steve
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse@cshl.edu
From Kary at ioc.fiocruz.br  Wed Jul 20 12:27:03 2005
From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana)
Date: Wed Jul 20 13:09:15 2005
Subject: [Bioperl-l] Add a new parameters for Hmmpfam
Message-ID: <29AC1A3F62AAF54BA71E367C6D62CEB096C2A0@alpha.ioc.fiocruz.br>

Dear Marc Logghe:

I have a Perl script that runs hmmpfam (included below). I would like to add another parameter to my @params, because when I run hmmpfam from the shell with the expert option --forward (which uses the Forward algorithm instead of Viterbi), it returns many more hits in the result:

1.- SHELL COMMAND
my $factory = system("hmmpfam -E 0.1 --forward modelos_hmmer_alignm.hmm $seq >results/hmmer_alignm.out")

2.- PERL SCRIPT
#!/usr/bin/perl -w

$ENV{HMMPFAMDIR} = '/usr/local/bin/';
use lib "/usr/local/bioperl14";
use lib "/usr/local/bioperl-run-1.4";

use strict;
use Bio::Tools::Run::Hmmpfam;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;
use Bio::SearchIO::Writer::TextResultWriter;
use Bio::SearchIO::Writer::HSPTableWriter;
use Bio::SearchIO::Writer::ResultTableWriter;
use Bio::SeqIO;

my @params = ('DB' => 'modelos_hmmer_tcoffee.hmm', 'E' => 0.1);
my $factory = Bio::Tools::Run::Hmmpfam->new(@params);
my $seq = $ARGV[0];
   
#any old protein fasta file
my $search = $factory->run($seq);

my $writer = Bio::SearchIO::Writer::HSPTableWriter -> new(
							-columns => [qw(
								 hit_name
								 query_name
								 score
								 expect
								 start_hit
								 end_hit
								 start_query
								 end_query
								  
								 )]
							 );

my $out = Bio::SearchIO->new( -writer => $writer,
			      -file   => ">results/searchio_tcoffee.out" );
 
while (my $result = $search->next_result()) {
	$out->write_result($result);
}


Thank you very much for your help.

Yours faithfully

Kary Soriano


From smarkel at scitegic.com  Wed Jul 20 17:19:28 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Wed Jul 20 17:13:28 2005
Subject: [Bioperl-l] HTTP response size check in Bio::Tools::Run::RemoteBlast
Message-ID: <42DEBFE0.1080209@scitegic.com>

Sometime last week NCBI made a change to the HTTP response
for remote BLAST requests.  Based on when my regressions
started to fail, I think it was on the 14th.

The if( $size > 1000 ) check in retrieve_blast() now passes
when it shouldn't, meaning that intermediate pages are assumed
to be final results.  I'm now seeing response sizes of just
under 2000 for the intermediate pages.  A customer of mine is
getting about the same.

If this check is changed to 2000, then we're back in business.
We can't make the number too big or we'll start missing small
result sets.  A request for a single BLASTp hit gives me a
result size of about 3400.

Has anyone else seen this problem?  Is this a reasonable fix
to propose?  I'm a little concerned that whatever the number
is, it's very susceptible to changes at NCBI.

Scott

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


From jason.stajich at duke.edu  Wed Jul 20 21:23:35 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jul 20 21:15:00 2005
Subject: [Bioperl-l] HTTP response size check in
	Bio::Tools::Run::RemoteBlast
In-Reply-To: <42DEBFE0.1080209@scitegic.com>
References: <42DEBFE0.1080209@scitegic.com>
Message-ID: <013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu>

I got an email from Guido to the same effect.  Guido - best to post to
the mailing list in the future so I am not the bottleneck.

I just haven't had time to actually make the changes.

Really need someone else to maintain this module to be honest.

Anyway, any way of making the module more robust to NCBI changes
would be appreciated - it really started as a simple hack - and I don't
know whether it needs to mirror more closely the example code that NCBI
provides for submitting remote blasts.

-jason
On Jul 20, 2005, at 5:19 PM, Scott Markel wrote:

> Sometime last week NCBI made a change to the HTTP response
> for remote BLAST requests.  Based on when my regressions
> started to fail, I think it was on the 14th.
>
> The if( $size > 1000 ) check in retrieve_blast() now passes
> when it shouldn't, meaning that intermediate pages are assumed
> to be final results.  I'm now seeing response sizes of just
> under 2000 for the intermediate pages.  A customer of mine is
> getting about the same.
>
> If this check is changed to 2000, then we're back in business.
> We can't make the number too big or we'll start missing small
> result sets.  A request for a single BLASTp hit gives me a
> result size of about 3400.
>
> Has anyone else seen this problem?  Is this a reasonable fix
> to propose?  I'm a little concerned that whatever the number
> is, it's very susceptible to changes at NCBI.
>
> Scott
>
> -- 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel@scitegic.com
> SciTegic Inc.                       mobile: +1 858 205 3653
> 9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
> San Diego, CA 92123                 fax:    +1 858 279 8804
> USA                                 web:    http://www.scitegic.com
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From ferdinand.marletaz at gmail.com  Thu Jul 21 04:49:31 2005
From: ferdinand.marletaz at gmail.com (=?ISO-8859-1?Q?Ferdinand_Marl=E9taz?=)
Date: Thu Jul 21 04:41:08 2005
Subject: [Bioperl-l] Blast : Bus Error
Message-ID: <7c7aa474050721014961ce6a6f@mail.gmail.com>

Hi,

I know my current problem is only loosely related to bioperl, but maybe
somebody has already encountered it, so it's worth a try...

I am trying to run blast (tblastx) on a G5 powermac computer (OS X 10.4
Tiger, but the same happened with 10.3 Panther). It starts perfectly
normally, but after some time it stops and displays either 'bus error'
or 'segmentation fault'. I'm quite surprised, because I've never had
this problem on a second, identical G5 in my lab. I tried changing the
blast version from 2.10 to 2.11, but that didn't solve the problem.
I also verified that it's not related to my databases by reformatting
them from fasta.

So I don't see where the problem could come from. Has anybody
encountered such problems or errors, and does anyone have a solution
or an idea? I'd like to avoid reinstalling the system on this machine
because of the time it would cost.

Thanks a lot

Cheers

Ferdi

______________________
Ferdinand Marlétaz
Evolution and Phylogeny of Metazoans 
UMR 6540 DIMAR CNRS
Station Marine d'Endoume
Rue Batterie-des-Lions
13007 MARSEILLE
Tel. 33(0)4 91 04 16 54
Port. 33(0)6 30 35 58 49
e-mail. Ferdinand.Marletaz@ens-lyon.fr

From l.douchy at gmail.com  Thu Jul 21 05:12:26 2005
From: l.douchy at gmail.com (Laurent DOUCHY)
Date: Thu Jul 21 05:04:43 2005
Subject: [Bioperl-l] Blast : Bus Error
In-Reply-To: <7c7aa474050721014961ce6a6f@mail.gmail.com>
References: <7c7aa474050721014961ce6a6f@mail.gmail.com>
Message-ID: <2fb209dd0507210212672ea750@mail.gmail.com>

Hello,
This problem can happen for several reasons:
your RAM is not sufficient and/or you are searching against a db like
nt that is too big for the PPC/blast/db combination. First check your
RAM (500 MB is not enough); second, try to work on a subset of nt when
you can; also try the blast build optimised by the Bioteam...
Cordially

LN 

2005/7/21, Ferdinand Marlétaz <ferdinand.marletaz@gmail.com>:
> Hi,
> 
> I know my current problem is only farly related with bioperl but maybe
> omebody would have already encountered it so, it can be tryed...
> 
> I try to run blast (tblastx) on a G5 powermac computer (OS : OS 10.4
> Tiger but the same was happening with 10.3 Panther), it starts perfect
> normal but after sometimes, it stops and displays either 'bus error'
> or 'segmentation fault'... I'm quite surprised because I've never got
> this problem on a second identical G5 in my lab ? I've try to change
> blast version from 2.10 to 2.11... but it don't solved the problem.
> I verify that it's not related to my databases in reformating them
> from fasta...
> 
> So, I don't see where the problem can come from ? Does anybody have
> encountered such problems or erros and have a solution or an idea
> because I'd like to avoid reinstalling the system on this machine
> cause loss of time...
> 
> Thanks a lot
> 
> Cheers
> 
> Ferdi
> 
> ______________________
> Ferdinand Marlétaz
> Evolution and Phylogeny of Metazoans
> UMR 6540 DIMAR CNRS
> Station Marine d'Endoume
> Rue Batterie-des-Lions
> 13007 MARSEILLE
> Tel. 33(0)4 91 04 16 54
> Port. 33(0)6 30 35 58 49
> e-mail. Ferdinand.Marletaz@ens-lyon.fr
> 

From ferdinand.marletaz at gmail.com  Thu Jul 21 05:58:27 2005
From: ferdinand.marletaz at gmail.com (=?ISO-8859-1?Q?Ferdinand_Marl=E9taz?=)
Date: Thu Jul 21 05:49:43 2005
Subject: [Bioperl-l] Blast : Bus Error
In-Reply-To: <20050721094219.GA14638@ebi.ac.uk>
References: <7c7aa474050721014961ce6a6f@mail.gmail.com>
	<2fb209dd0507210212672ea750@mail.gmail.com>
	<20050721094219.GA14638@ebi.ac.uk>
Message-ID: <7c7aa474050721025839062a98@mail.gmail.com>

Well, I can exclude memory problems (2 GB RAM on these machines) and
database size problems (the error happens with both large and small
databases, e.g. 50 MB). On top of that, I've already performed
identical blast searches on the two computers, and the other computer
runs fine.
I don't think it is a hardware problem either, because the faulty
computer has run similar searches in the past without problems. So
something may have changed in the configuration that makes the blast
process fail. I just know that somebody tried to install Linux
on this computer and didn't manage to finish the installation. Could
that be the source of my current problems?

What do you all think?

Thanks 

Ferdi


2005/7/21, Andreas Kahari <ak@ebi.ac.uk>:
> [not to the list]
> 
> Hi guys,
> 
> There could also be a problem with a faulty memory module...  If
> the error is not consistently reproducible, then this is one
> possible cause.
> 
> Running out of memory should not produce a Bus Error.  It might
> produce a Segmentation Fault if the program doesn't care that
> the memory allocation failed, but not a Bus Error (as far as I
> know, but I don't run OS X here).
> 
> A way to diagnose this is to run exactly the same set-up on two
> identical machines until one of them causes the error more than
> once.  If the other machine seems to run ok then it is very
> possible that there is a hardware fault on the first machine (or
> some important system configuration setting is different without
> you knowing it).
> 
> Regards,
> Andreas
> 
> On Thu, Jul 21, 2005 at 11:12:26AM +0200, Laurent DOUCHY wrote:
> > Hello,
> > This problem can happen for several reasons :
> > your ram is not sufficiant and /or  you are working against a db like
> > nt too big for the combination PPC/blast/db; First verify your ram
> > (500Mo are not enougth) , secondly try to work when you can on a part
> > of nt ; try to  check the blast optimised by the Bioteam...
> > Cordially
> >
> > LN
> >
> > 2005/7/21, Ferdinand Marlétaz <ferdinand.marletaz@gmail.com>:
> > > Hi,
> > >
> > > I know my current problem is only farly related with bioperl but maybe
> > > omebody would have already encountered it so, it can be tryed...
> > >
> > > I try to run blast (tblastx) on a G5 powermac computer (OS : OS 10.4
> > > Tiger but the same was happening with 10.3 Panther), it starts perfect
> > > normal but after sometimes, it stops and displays either 'bus error'
> > > or 'segmentation fault'... I'm quite surprised because I've never got
> > > this problem on a second identical G5 in my lab ? I've try to change
> > > blast version from 2.10 to 2.11... but it don't solved the problem.
> > > I verify that it's not related to my databases in reformating them
> > > from fasta...
> > >
> > > So, I don't see where the problem can come from ? Does anybody have
> > > encountered such problems or erros and have a solution or an idea
> > > because I'd like to avoid reinstalling the system on this machine
> > > cause loss of time...
> [cut]
> 
> --
> Andreas Kähäri
> 
> EMBL-EBI/ensembl
> www.ensembl.org
> 
> 1024D/C2E163CB F4C4 A41A 665B 448A 3FA9  6AEA 12E3 39DA C2E1 63CB
>

From johan.viklund at gmail.com  Thu Jul 21 08:10:20 2005
From: johan.viklund at gmail.com (Johan Viklund)
Date: Thu Jul 21 08:02:25 2005
Subject: [Bioperl-l] bioperl-db: exporting data
In-Reply-To: <eedb6cb2613fe06259b294a066e2d81d@gmx.net>
References: <5e924f0a05070508012bbb63d3@mail.gmail.com>
	<eedb6cb2613fe06259b294a066e2d81d@gmx.net>
Message-ID: <5e924f0a05072105101b55e307@mail.gmail.com>

Thanks for the help, it works now (it was a small programming error).


(sending this so someone else in a similar predicament can find help)
On 7/6/05, Hilmar Lapp <hlapp@gmx.net> wrote:
> The way you're describing doesn't sound too far off. The rank is an
> ordering index as well as a component of the unique key constraint,
> i.e.,  you can't have two seqfeature qualifier values for the same
> feature and tag name unless the rank is different.
> 
> Have you convinced yourself that you con log in to the database and
> retrieve those additions by hand (using SQL)?
> 
> Can you reduce this to a test case where you load a single sequence
> record, then issue SQL to add your custom annotation, and then retrieve
> the record again. Email me the entry you loaded, the SQL statements you
> issued, and the entry you got out.
> 
>          -hilmar
> 
> On Jul 5, 2005, at 8:01 AM, Johan Viklund wrote:
> 
> > Hi
> >
> > I'm trying to add COG annotations from Entrez Gene to sequences (from
> > refseq in genbank format) I have in a biosql database (on mysql). The
> > problem is I can't get them out again with the bioentry2flat.pl script
> > (the bioentries appears without what i've added).
> >
> > I don't use bioperl for this (i've got ~40000 COG annotations (linked
> > to GeneIDs)). Instead I add it to the seqfeature_qualifer_value table
> > similar to the way GeneID:s are represented (as far as i've figured),
> > with term_id corresponding to db_xref, the same seqfeature_id as the
> > GeneID had and rank i've tried a few different variations but none
> > seem to work (the first free that's larger than GeneID and 1).
> >
> > How should I add this annotation to the database so it gets exported
> > when I use bioperl?
> >
> > I've also got another question: What is rank for?
> >
> > --
> > Johan Viklund
> > E-mail: <johan.viklund.0705@student.uu.se>
> >         <johan.viklund@gmail.com>
> >
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 


-- 
Johan Viklund
E-post: <johan.viklund@gmail.com>

From johan.viklund at gmail.com  Thu Jul 21 08:18:35 2005
From: johan.viklund at gmail.com (Johan Viklund)
Date: Thu Jul 21 08:23:50 2005
Subject: [Bioperl-l] bioperl-db: Searcing
Message-ID: <5e924f0a05072105187840349f@mail.gmail.com>

Hello again,

I've got a new bioperl-db problem.

This is my context:

I've got a number of sequences in the database (complete genomes from
RefSeq). I want to be able to find all the db_xrefs for a feature when
I've got the GeneID or GI for that feature (preferably returned as a
Bio::SeqFeatureI-compliant object).

If this isn't [currently] possible, how do I get a Bio::SeqFeatureI
object from the database?

For the record, I can do this with SQL queries and DBI; I want to know
if there's a bioperl way.
-- 
Johan Viklund
E-post: <johan.viklund@gmail.com>
-----------------
perl -we '$,=" ";$_=bless sub{shift;print
split(/::/,ref)},Just::Another::Perl::Hacker;&$_'

From amackey at pcbi.upenn.edu  Thu Jul 21 12:09:37 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Jul 21 12:00:30 2005
Subject: [Bioperl-l] HTTP response size check in
	Bio::Tools::Run::RemoteBlast
In-Reply-To: <013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu>
References: <42DEBFE0.1080209@scitegic.com>
	<013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu>
Message-ID: <3E91A072-C504-4595-AE8F-39F686F21EC5@pcbi.upenn.edu>

This is what I do to distinguish intermediate pages from final
results, and it seems to be stable (at least so far):

   if ($html =~ m/Status=WAITING/iso)
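
A slightly fuller sketch of the same idea, in case it helps: classify the
response by its status marker instead of its size. Only Status=WAITING is
confirmed in this thread; the READY and FAILED values and the helper name
blast_response_state are assumptions for illustration, not the module's
actual code.

```perl
use strict;
use warnings;

# Classify a remote BLAST response body by its status marker, not its size.
# Returns 'waiting', 'ready', 'failed' or 'unknown'.
# Assumption: the page carries a "Status=<WORD>" marker. Only WAITING is
# confirmed in this thread; READY and FAILED are illustrative guesses.
sub blast_response_state {
    my ($html) = @_;
    return 'unknown' unless defined $html;
    if ($html =~ m/Status=(\w+)/i) {
        my $status = uc $1;
        return 'waiting' if $status eq 'WAITING';
        return 'ready'   if $status eq 'READY';
        return 'failed'  if $status eq 'FAILED';
    }
    return 'unknown';
}

# A toy intermediate page of roughly 1900 bytes: a "size > 1000" test would
# treat it as a final result, the marker test does not.
my $intermediate = "Status=WAITING\n" . ('x' x 1900);
print length($intermediate) > 1000 ? "size check: final\n" : "size check: wait\n";
print "marker check: ", blast_response_state($intermediate), "\n";
```

The point of the design is that the decision no longer depends on a byte
count that shifts whenever the page layout changes.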

-Aaron

On Jul 20, 2005, at 9:23 PM, Jason Stajich wrote:

> I got a email from Guido to the same effect.  Guido - best to post  
> to the mailing list in the future so I am not the bottleneck.
>
> I just haven't had time to actually make the changes.
>
> Really need someone else to maintain this module to be honest.
>
> Anyways, any ways to make the module more robust to NCBI changes  
> would be appreciated - it really started as a simple hack - I don't  
> know if it needs to mirror more closely the example code that NCBI  
> provides for submitting remote blasts.
>
> -jason
> On Jul 20, 2005, at 5:19 PM, Scott Markel wrote:
>
>
>> Sometime last week NCBI made a change to the HTTP response
>> for remote BLAST requests.  Based on when my regressions
>> started to fail, I think it was on the 14th.
>>
>> The if( $size > 1000 ) check in retrieve_blast() now passes
>> when it shouldn't, meaning that intermediate pages are assumed
>> to be final results.  I'm now seeing response sizes of just
>> under 2000 for the intermediate pages.  A customer of mine is
>> getting about the same.
>>
>> If this check is changed to 2000, then we're back in business.
>> We can't make the number too big or we'll start missing small
>> result sets.  A request for a single BLASTp hit gives me a
>> result size of about 3400.
>>
>> Has anyone else seen this problem?  Is this a reasonable fix
>> to propose?  I'm a little concerned that whatever the number
>> is, it's very susceptible to changes at NCBI.
>>
>> Scott
>>
>> -- 
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect  email:  smarkel@scitegic.com
>> SciTegic Inc.                       mobile: +1 858 205 3653
>> 9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
>> San Diego, CA 92123                 fax:    +1 858 279 8804
>> USA                                 web:    http://www.scitegic.com
>>
>>
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12/
>
>

From sdshlxh at gmail.com  Thu Jul 21 12:49:37 2005
From: sdshlxh at gmail.com (Ping Yao)
Date: Thu Jul 21 12:41:52 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 27, Issue 8
In-Reply-To: <200507210909.j6L98gTw018162@portal.open-bio.org>
References: <200507210909.j6L98gTw018162@portal.open-bio.org>
Message-ID: <e99f98a705072109499019b72@mail.gmail.com>

Hi group,
 I want to download genes from GenBank and put them into my local MySQL
 database.
 So far, what I can do is download them into separate files.
 Can anyone help me load them into MySQL?
 Or does anyone have code for this that I could try?
 Ping

From palmeida at igc.gulbenkian.pt  Thu Jul 21 13:16:38 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Thu Jul 21 13:07:32 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 27, Issue 8
In-Reply-To: <e99f98a705072109499019b72@mail.gmail.com>
References: <200507210909.j6L98gTw018162@portal.open-bio.org>
	<e99f98a705072109499019b72@mail.gmail.com>
Message-ID: <200507211816.39202.palmeida@igc.gulbenkian.pt>

Hi Ping,

Are you familiar with DBD-mysql ? If not, check it out on 
http://search.cpan.org/dist/DBD-mysql/

On Thursday 21 July 2005 17:49, Ping Yao wrote:
> Hi group :
>  I want to download genes from genbank and put them in my local database
> MySQL.
>  Now what I can do is to download into different files .
>  So who can help me put them into MySQL ?
>  Or does anyone have the code for it and let me try ?
>  Ping
>

-- 
Paulo Almeida
Tel: +351 21 4464635, Fax: +351 21 4407970
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
P-2780-156 Oeiras
Portugal
http://www.igc.gulbenkian.pt

From chiromatzo at gmail.com  Thu Jul 21 14:15:55 2005
From: chiromatzo at gmail.com (Alynne Chiromatzo)
Date: Thu Jul 21 14:06:35 2005
Subject: [Bioperl-l] $hsp->seq_inds and axt file
Message-ID: <5865004505072111156b10d5bd@mail.gmail.com>

Hi!

I'm having trouble getting $hsp->seq_inds to work on an axt file (whole
genome alignment from the UCSC Genome Browser). The code and a sample
of the input file are below. It doesn't show the indices that it is
supposed to return. Can anyone help me?

Thanks very much!

Alynne Oya.

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SearchIO;

my $parser = Bio::SearchIO->new(-format => 'axt',
                                -file   => '/work/project/align/testeaxt');
while( my $result = $parser->next_result ) {
  while( my $hit = $result->next_hit ) {
    while( my $hsp = $hit->next_hsp ) {
        print "Rank: ".$hsp->rank." Strand : ".$hsp->strand('hit')."\n";
        print "Query Name: ".$result->query_name." Hit Name: ".$hit->name."\n";
        # range() finds the start-end values, but they are offset by 1
        my ($query_beg, $query_end) = $hsp->range('query');
        my ($hit_beg, $hit_end)     = $hsp->range('hit');
        print "Range: ".($query_beg-1)."-".($query_end-1)." ".($hit_beg-1)."-".($hit_end-1)."\n";
        print $hsp->query_string."\n".$hsp->hit_string."\n";
        my @h_ind = $hsp->seq_inds('query', 'identical', 1);

        # The identical-position indices do not appear here as expected
        foreach (@h_ind){
           print "==> ".$_." ";
        }
        print "\n";
    }
  }
}

This is a sample of the input file:

1 SCAFFOLD1 1535 1688 chrX 44389546 44389697 + 6498
TACAATAGGTCAAGGGTCTGCAAACTATAGGTTTAAAAATTAAAAAGAA-GAAAAATATATGGTGGAGACTGGTTGGGATCATAAAGCCCAATATATTTATTGTATGGTCtgtgt-tagccaggagtcttcagagaaacagaaccaataagataCA
TACAATAAATCAGAGGTCAGCAAGCTATAGGTTTT----TTAAACAGGACAAAAAATATACAACAGAGAAAATGTAGGACCAGAAAACCCAACATATTTATTATATGGGCTTTTTGTGgtcagggttctcctgtgaaacaggaccaataggatgta

3 SCAFFOLD1 3665 3845 chrX 44391563 44391740 + 7187
CCCTAAAAAGTCA-GTTTTTCA------AGAAGCATAAGCATAGTGTAAATGTAGGAGTTCATAGATCCATAGCAGGGAGAGCTGTTTAGCCTACTTATAGCTTATTTCCAGCTTATATCATCTGTTTGGGGCACGGTCATCCCTAGAGGCAGAGGAA-GAGATTTGGAATGAGGTTTTAGCATGATAT
TCCTGAAAATTTATATTTTTCACCAAGAAGAAACATAAACATCTTGCACA---AGGA---CATAAATCTATAGCTGGGGGTGCTGTT-AGTCTAGTTCTAGCATATTTCTAGCCTACATCATCTGTTTGGGGCATAATCATGTCTGGAAGAAAAGGAATGAGGTTTG----GGGATTTTAGCATGGTAT

17 SCAFFOLD2 22789 22919 chrX 44409117 44409239 - 5180
AGAATACACATCATAGTTATCATAGGGGAAT-GTTTAGGTGGCAGGATAAGGCATATTT--TTTTCTTTTCTCTGGTCTGTAAATTCTCTAACATAACTATATTGCTTTTAAATTTTAAATTGATTTTCAATTA
agaaaacacacc-cacttataatagtggatttgtccaggtggcaggactatacatctttgttttctttttttcttgtTTATAAATGTTCTAATATAACTATATTGCCtttaaa----------atttttaatta

From cjfields at uiuc.edu  Thu Jul 21 14:28:49 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu Jul 21 14:19:55 2005
Subject: [Bioperl-l] PPM for bioperl-1.5?
Message-ID: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu>

I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in 
http://bioperl.org/DIST.  I saw that Nathan created one a while back; did 
anyone transfer it over to the above directory?

__________________________________

Chris Fields - Postdoctoral Researcher
Lab of Dr. Robert Switzer

Address:

University of Illinois at Urbana-Champaign
Dept. of Biochemistry - 323 RAL
600 S. Mathews Ave.
Urbana, IL 61801

Phone : (217) 333-7098
Fax : (217) 244-5858 

From cjfields at uiuc.edu  Thu Jul 21 15:11:27 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu Jul 21 15:02:10 2005
Subject: [Bioperl-l] PPM for bioperl-1.5?
In-Reply-To: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu>
References: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu>
Message-ID: <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu>

Also, not to complicate things, but are Lincoln's bioperl-1.5 PPM (at 
http://www.gmod.org/ggb/ppm/) and Nathan's version (at 
http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl.ppd) 
essentially the same?  I noticed that Nathan's has a bunch of dependencies 
but Lincoln's doesn't.
Chris

At 01:28 PM 7/21/2005, Chris Fields wrote:
>I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in 
>http://bioperl.org/DIST.  I saw that Nathan created one a while back; did 
>anyone transfer it over to the above directory?
>
>__________________________________
>
>Chris Fields - Postdoctoral Researcher
>Lab of Dr. Robert Switzer
>
>Address:
>
>University of Illinois at Urbana-Champaign
>Dept. of Biochemistry - 323 RAL
>600 S. Mathews Ave.
>Urbana, IL 61801
>
>Phone : (217) 333-7098
>Fax : (217) 244-5858

__________________________________

Chris Fields - Postdoctoral Researcher
Lab of Dr. Robert Switzer

Address:

University of Illinois at Urbana-Champaign
Dept. of Biochemistry - 323 RAL
600 S. Mathews Ave.
Urbana, IL 61801

Phone : (217) 333-7098
Fax : (217) 244-5858 

From jason.stajich at duke.edu  Thu Jul 21 15:12:39 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jul 21 15:03:38 2005
Subject: [Bioperl-l] $hsp->seq_inds and axt file
In-Reply-To: <5865004505072111156b10d5bd@mail.gmail.com>
References: <5865004505072111156b10d5bd@mail.gmail.com>
Message-ID: <1121973159.42dff3a7c358d@webmail.duke.edu>

There's no midline/homology line in the axt format so there is no way to know
which columns are identical so I don't see how it can work.
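
As a possible workaround (a rough sketch only, not something SearchIO does
for you here), the identical positions could be computed directly from the
two aligned strings that $hsp->query_string and $hsp->hit_string return.
The helper name identical_query_positions is made up for illustration, and
treating lower-case (soft-masked) bases as equal to upper-case ones is an
assumption.

```perl
use strict;
use warnings;

# Given two aligned strings of equal length and the query start coordinate,
# return the query coordinates at which both strings carry the same residue.
# Columns with a gap ('-') on either side are never counted as identical.
# Assumption: case differences (soft-masking) are ignored when comparing.
sub identical_query_positions {
    my ($query, $hit, $query_start) = @_;
    die "aligned strings differ in length\n"
        unless length($query) == length($hit);
    my @positions;
    my $pos = $query_start - 1;          # coordinate of the last query base seen
    for my $i (0 .. length($query) - 1) {
        my $q = uc substr($query, $i, 1);
        my $h = uc substr($hit,   $i, 1);
        $pos++ if $q ne '-';             # only real query bases advance the coordinate
        next if $q eq '-' || $h eq '-';
        push @positions, $pos if $q eq $h;
    }
    return @positions;
}

# Toy alignment with the query starting at coordinate 10.
my @ids = identical_query_positions('ACG-TTa', 'ACCAT-A', 10);
print "@ids\n";    # prints "10 11 13 15"
```

Whether the coordinates should be 0- or 1-based depends on how you want to
combine them with the values from $hsp->range('query').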

-jason
-- 
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


Quoting Alynne Chiromatzo <chiromatzo@gmail.com>:

> Hi!
> 
> I'm having trouble in finding the hsp->seq_inds in the axt file(whole
> genome alignment from UCSC Genome Browser). The code is below and a
> sample of the input file. It doens't show the sequence that it suppose
> to contain. Anyone can help me?
> 
> Thanks very much!
> 
> Alynne Oya.
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> use Bio::SearchIO;
> 
> my $parser = Bio::SearchIO->new(-format => 'axt',
>                                 -file   => '/work/project/align/testeaxt');
> while ( my $result = $parser->next_result ) {
>   while ( my $hit = $result->next_hit ) {
>     while ( my $hsp = $hit->next_hsp ) {
>       print "Rank: ".$hsp->rank." Strand: ".$hsp->strand('hit')."\n";
>       print "Query Name: ".$result->query_name." Hit Name: ".$hit->name."\n";
>       # finds the start/end values, but they come back incremented by 1
>       my ($query_beg, $query_end) = $hsp->range('query');
>       my ($hit_beg,   $hit_end)   = $hsp->range('hit');
>       print "Range: ".($query_beg-1)."-".($query_end-1)." "
>             .($hit_beg-1)."-".($hit_end-1)."\n";
>       print $hsp->query_string."\n".$hsp->hit_string."\n";
>       my @h_ind = $hsp->seq_inds('query', 'identical', 1);
> 
>       # The index list doesn't appear here like it is supposed to
>       foreach (@h_ind) {
>         print "==> ".$_." ";
>       }
>       print "\n";
>     }
>   }
> }
> 
> This is a sample of the input file:
> 
> 1 SCAFFOLD1 1535 1688 chrX 44389546 44389697 + 6498
> TACAATAGGTCAAGGGTCTGCAAACTATAGGTTTAAAAATTAAAAAGAA-GAAAAATATATGGTGGAGACTGGTTGGGATCATAAAGCCCAATATATTTATTGTATGGTCtgtgt-tagccaggagtcttcagagaaacagaaccaataagataCA
> TACAATAAATCAGAGGTCAGCAAGCTATAGGTTTT----TTAAACAGGACAAAAAATATACAACAGAGAAAATGTAGGACCAGAAAACCCAACATATTTATTATATGGGCTTTTTGTGgtcagggttctcctgtgaaacaggaccaataggatgta
> 
> 3 SCAFFOLD1 3665 3845 chrX 44391563 44391740 + 7187
> CCCTAAAAAGTCA-GTTTTTCA------AGAAGCATAAGCATAGTGTAAATGTAGGAGTTCATAGATCCATAGCAGGGAGAGCTGTTTAGCCTACTTATAGCTTATTTCCAGCTTATATCATCTGTTTGGGGCACGGTCATCCCTAGAGGCAGAGGAA-GAGATTTGGAATGAGGTTTTAGCATGATAT
> TCCTGAAAATTTATATTTTTCACCAAGAAGAAACATAAACATCTTGCACA---AGGA---CATAAATCTATAGCTGGGGGTGCTGTT-AGTCTAGTTCTAGCATATTTCTAGCCTACATCATCTGTTTGGGGCATAATCATGTCTGGAAGAAAAGGAATGAGGTTTG----GGGATTTTAGCATGGTAT
> 
> 17 SCAFFOLD2 22789 22919 chrX 44409117 44409239 - 5180
> AGAATACACATCATAGTTATCATAGGGGAAT-GTTTAGGTGGCAGGATAAGGCATATTT--TTTTCTTTTCTCTGGTCTGTAAATTCTCTAACATAACTATATTGCTTTTAAATTTTAAATTGATTTTCAATTA
> agaaaacacacc-cacttataatagtggatttgtccaggtggcaggactatacatctttgttttctttttttcttgtTTATAAATGTTCTAATATAACTATATTGCCtttaaa----------atttttaatta

From cain at cshl.edu  Thu Jul 21 15:21:31 2005
From: cain at cshl.edu (Scott Cain)
Date: Thu Jul 21 15:13:02 2005
Subject: [Bioperl-l] PPM for bioperl-1.5?
In-Reply-To: <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu>
References: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu>
	<6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu>
Message-ID: <1121973691.3494.37.camel@localhost.localdomain>

The ppm on the gmod website is really intended to be just enough to get
GBrowse working and nothing more (though I'm sure you could do more with
it, just not the stuff that there are missing dependencies for).

Scott

On Thu, 2005-07-21 at 14:11 -0500, Chris Fields wrote:
> Also, not to complicate things, but are Lincoln's bioperl-1.5 PPM (at 
> http://www.gmod.org/ggb/ppm/) and Nathan's version (at 
> http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl.ppd) essentially
> the same?  I noticed that Nathan's has a bunch of dependencies but
> Lincoln's doesn't.
> Chris
> 
> At 01:28 PM 7/21/2005, Chris Fields wrote:
> >I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in 
> >http://bioperl.org/DIST.  I saw that Nathan created one a while back; did 
> >anyone transfer it over to the above directory?
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hartzell at kestrel.alerce.com  Thu Jul 21 15:34:19 2005
From: hartzell at kestrel.alerce.com (George Hartzell)
Date: Thu Jul 21 15:26:03 2005
Subject: [Bioperl-l] "Be forgiving in what you accept" and
	Bio::Tools::GuessSeqFormat
Message-ID: <200507211934.j6LJYJO3007600@satchel.alerce.com>


There's a great "old" Internet maxim, "Be forgiving in what you accept
and strict in what you send".

The Bio::SeqIO modules seem to be able to cope with "fasta" formatted
files that have a space separating the ">" from the rest of the line
(e.g.  "> ape") if a) you explicitly specify the format or b) you
have the sequence in a file whose name ends in "fa" (or generally matches
the list of patterns that correspond to fasta file names).

But, if you happen to have the sequence in a file with a funny name
(e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails.  It
can't guess based on the filename and the file content test is strict
and wants to see the header line without the whitespace (">ape").

Is there any reason not to extend the regexp a bit and relax that
constraint (since everything else seems to cope with it)?

Something like this:

*** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig	Thu Jul 21 12:30:55 2005
--- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm	Thu Jul 21 12:31:45 2005
***************
*** 591,595 ****
      my ($line, $lineno) = (shift, shift);
      return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
!             $line =~ /^>\w/);
  }
  
--- 591,595 ----
      my ($line, $lineno) = (shift, shift);
      return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
!             $line =~ /^>\s*\w/);
  }
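
To see what the relaxed pattern changes, here is a small stand-alone check of
the two header tests from the patch above. It only exercises the regexes, not
GuessSeqFormat itself, and the sample lines are made up for illustration:

```perl
use strict;
use warnings;

# The two header tests from the patch, pulled out for comparison.
my $strict  = qr/^>\w/;      # current test: word character right after '>'
my $relaxed = qr/^>\s*\w/;   # proposed test: optional whitespace after '>'

# Made-up sample lines: two valid headers, one padded, one empty, one non-header.
for my $line ('>ape', '> ape', '>   ape', '>', 'ape') {
    printf "%-10s strict=%d relaxed=%d\n",
        "'$line'",
        ($line =~ $strict  ? 1 : 0),
        ($line =~ $relaxed ? 1 : 0);
}
```

The only lines whose result differs between the two tests are the ones with
whitespace between ">" and the identifier.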
  
g.
From brian_osborne at cognia.com  Thu Jul 21 16:04:29 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Jul 21 15:55:26 2005
Subject: [Bioperl-l] "Be forgiving in what you accept" and
	Bio::Tools::GuessSeqFormat
In-Reply-To: <200507211934.j6LJYJO3007600@satchel.alerce.com>
Message-ID: <BF05780D.2D45%brian_osborne@cognia.com>

George,

This does sound like a reasonable change. I will make it unless someone has
an objection. Let's wait a moment...

Brian O.


On 7/21/05 3:34 PM, "George Hartzell" <hartzell@kestrel.alerce.com> wrote:

> 
> There's a great "old" Internet maxim, "Be forgiving in what you accept
> and strict in what you send".
> 
> The Bio::SeqIO modules seem to be able to cope with "fasta" formatted
> files that have a space separating the ">" from the rest of the line
> (e.g.  "> ape") if a) you explicitly specify the format or b) if you
> have the sequence in a file that ends in "fa" (or generally matches
> the list of patterns that correspond to fasta file names).
> 
> But, if you happen to have the sequence in a file with a funny name
> (e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails.  It
> can't guess based on the filename and the file content test is strict
> and wants to see the header line without the whitespace (">ape").
> 
> Is there any reason not to extend the regexp a bit and relax that
> constraint (since everything else seems to cope with it)?
> 
> Something like this:
> 
> *** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig Thu Jul 21 12:30:55 2005
> --- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm Thu Jul 21 12:31:45 2005
> ***************
> *** 591,595 ****
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\w/);
>   }
>   
> --- 591,595 ----
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\s*\w/);
>   }
>   
> g.


From akarger at CGR.Harvard.edu  Thu Jul 21 15:58:19 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu Jul 21 16:02:11 2005
Subject: [Bioperl-l] The Scriptome now has mailing lists
Message-ID: <339D68B133EAD311971E009027DC47970321AF08@montecarlo.cgr.harvard.edu>

A few months ago, I introduced the Scriptome, a new cookbook/toolbox of Perl
one-liners that allows non-programmer biologists to manipulate their data.  

I've just created some mailing lists, so I don't have to clutter this list
anymore. The scriptome-announce list will be very low traffic (maybe 1
email per month), and scriptome-users will (hopefully) be busier. Subscribe
to either or both at http://bioinformatics.org/mail/?group_id=505

Most of you already know how to program, but I'm hoping you'll let some
non-geeks know about this resource - or maybe help build it. Now that we've
got 40 or 50 tools on the website (http://cgr.harvard.edu/cbg/scriptome) we
would love to get feedback from Real Biologists and the people who support
them.

Cheers,

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
From Marc.Logghe at devgen.com  Thu Jul 21 17:06:42 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Thu Jul 21 16:57:12 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 27, Issue 8
Message-ID: <0C528E3670D8CE4B8E013F6749231AA607D885@ANTARESIA.be.devgen.com>

Hi Ping,
I have a strong feeling you are looking for bioperl-db/biosql.
As soon as you have set up the system you can load your genbank records with the command:
load_seqdatabase.pl --host localhost --dbname biosql \
                       --namespace my_genbank --format genbank \
                       my/genbank/file.gb

You can perform queries using the API or fetch records by accession number with the bioentry2flat.pl script.
A good starting point is the BOSC2003 presentation of Hilmar:
http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf
HTH,
Marc


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of Ping Yao
Sent: Thu 7/21/2005 6:49 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 27, Issue 8
 
Hi group,
I want to download genes from GenBank and put them in my local MySQL
database. Right now all I can do is download them into separate files.
Can anyone help me load them into MySQL? Or does anyone have code for
this that I could try?
Ping



From stephen.taylor at molecular-sciences.ox.ac.uk  Wed Jul 20 04:24:04 2005
From: stephen.taylor at molecular-sciences.ox.ac.uk (Steve Taylor)
Date: Thu Jul 21 17:13:29 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] Adding human chromosomes as
	reference sequences
In-Reply-To: <200507191243.37491.lstein@cshl.edu>
References: <ef371bf7766d7e620d68393126809000@helsinki.fi>
	<200507191243.37491.lstein@cshl.edu>
Message-ID: <42DE0A24.6030705@molecular-sciences.ox.ac.uk>

Hi,

> As of about a week ago the xyplot.pm glyph has been enhanced to accept 
> negative scores. You can also colorize the bars and points according to the 
> score or other criteria.

That's great! Is it best to do a full CVS update of bioperl and gbrowse (1_62-bugfixes
branch) or will just updating bioperl suffice to get these features?

Thanks and Regards,

Steve
From ilari.scheinin at helsinki.fi  Wed Jul 20 05:37:36 2005
From: ilari.scheinin at helsinki.fi (Ilari Scheinin)
Date: Thu Jul 21 17:13:40 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] Adding human chromosomes as
	reference sequences
In-Reply-To: <200507191243.37491.lstein@cshl.edu>
References: <ef371bf7766d7e620d68393126809000@helsinki.fi>
	<200507191243.37491.lstein@cshl.edu>
Message-ID: <b0a422a3c89ce29ee72a7ce63faa1e2b@helsinki.fi>

On 19.7.2005, at 19:43, Lincoln Stein wrote:
> I'm sorry that the ucsc_genes2gff.pl script isn't loading the 
> chromosome
> extents; We just need a similar script called ucsc_chromosomes2gff.pl 
> or
> something similar. Ilari, since you've already essentially done this, 
> perhaps
> you'd be willing to contribute the script? I'll add it to bioperl.

Actually I wrote my script in PHP, because I don't really know much 
about Perl. I only recently wanted to use gbrowse and for that reason 
installed Bioperl. I have started teaching myself some Perl, but I 
think I'm more at the "Hello world" stage than the chromosomes2gff stage.

Anyway, the chromInfo.txt file from UCSC is just a tab delimited file 
where the first field is the name of the chromosome, and the second 
field contains the number of bases. So it is really simple to do a 
chromosomes2gff script.
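
For the GFF route, a conversion along these lines might be enough. This is a
rough sketch, not the ucsc_chromosomes2gff.pl script Lincoln asked about: it
assumes the chromosome name is in the first tab-delimited field and the length
in the second, and it reuses the 'assembly' source, 'chromosome' method and
'chromosome' group class that the PHP script below writes to the database.
chrominfo_to_gff is a name made up for this example, and the sample lines are
invented:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one chromInfo.txt line into a GFF2 line
# (ref, source, method, start, stop, score, strand, phase, group).
sub chrominfo_to_gff {
    my ($line) = @_;
    chomp $line;
    my ($name, $length) = split /\t/, $line;
    # Skip lines that do not have a numeric length in the second field.
    return unless defined $length && $length =~ /^\d+$/;
    return join("\t", $name, 'assembly', 'chromosome', 1, $length,
                '.', '.', '.', "chromosome $name");
}

# Invented sample input; real data would be read from chromInfo.txt.
my @sample = ("chrA\t1000\n", "chrB\t2500\n", "not a valid line\n");
for my $line (@sample) {
    my $gff = chrominfo_to_gff($line);
    print "$gff\n" if defined $gff;
}
```

Whether the group column should read "chromosome <name>" or use some other
class depends on how the GBrowse configuration refers to the reference
sequences, so treat that part as a guess.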

If someone is interested, here is the PHP script I used. It doesn't 
convert the chromosome info to a GFF file, but loads the data directly 
into a mysql database. It is a very naive script: it doesn't check 
whether it can actually read the provided file or access 
the database, or whether some of the data already exists. It doesn't 
touch the fbin column of the table fdata, because I have no idea what 
it is for. It is not mentioned in perldoc 
Bio::DB::GFF::Adaptor::dbi::mysql.

#!/usr/bin/php -f
<?php
         // Connection settings: fill these in before running.
         $host = "";
         $db = "";
         $user = "";
         $pass = "";

         $file = $argv[1];
         if (!$file) {
                 echo "Usage: $argv[0] <path to chromInfo.txt>\n";
                 exit();
         }
         $con = mysql_connect($host, $user, $pass);
         mysql_select_db($db, $con);
         // One feature type shared by all chromosomes.
         mysql_query("insert into ftype (fmethod, fsource) values ('chromosome', 'assembly')", $con);
         $ftypeid = mysql_insert_id($con);
         $fp = fopen($file, "r");
         $count = 0;
         while ($line = fgets($fp)) {
                 // Field 0 is the chromosome name, field 1 its length in bases.
                 $fields = explode("\t", rtrim($line));
                 mysql_query("insert into fgroup (gclass, gname) values ('chromosome', '$fields[0]')", $con);
                 $gid = mysql_insert_id($con);
                 // Each chromosome spans 1..length on itself.
                 mysql_query("insert into fdata (fref, fstart, fstop, ftypeid, gid) values ('$fields[0]', 1, $fields[1], $ftypeid, $gid)", $con);
                 $count++;
         }
         fclose($fp);
         mysql_close($con);
         echo "Added $count entries.\n";
?>

Ilari

From Guido.Dieterich at gbf.de  Thu Jul 21 09:57:42 2005
From: Guido.Dieterich at gbf.de (Guido Dieterich)
Date: Thu Jul 21 17:13:44 2005
Subject: [Bioperl-l] HTTP response size check in
	Bio::Tools::Run::RemoteBlast
In-Reply-To: <013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu>
References: <42DEBFE0.1080209@scitegic.com>
	<013A3AB6-52F7-4813-A220-6F2A0F57B92F@duke.edu>
Message-ID: <1121954262.16542.51.camel@sb289.gbf-braunschweig.de>

Skipped content of type multipart/alternative
-------------- next part --------------
An embedded message was scrubbed...
From: Jason Stajich <jason.stajich@duke.edu>
Subject: Re: [Bioperl-l] HTTP response size check in
	Bio::Tools::Run::RemoteBlast
Date: Wed, 20 Jul 2005 21:23:35 -0400
Size: 4381
Url: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050721/25ac06a2/attachment.eml
From n.haigh at sheffield.ac.uk  Fri Jul 22 04:14:34 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri Jul 22 04:09:39 2005
Subject: [Bioperl-l] PPM for bioperl-1.5?
In-Reply-To: <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAqCdBs8Wt606dV2t5KcDxoAEAAAAA@sheffield.ac.uk>

I created the ppd from the 1.5 release and then manually added as many
dependencies from bioperl as I could find to make things as simple and
complete as possible for those who wish to install via PPM. As a result it
should be a pretty self contained download for bioperl 1.5 but if you are
missing many of the dependencies it could take a while to download them all!

I have used that particular ppd to install bioperl 1.5 on a clean system and,
as far as I remember, it installs most things if you have added to PPM the
repositories that are mentioned in the INSTALL.WIN file:
http://bioperl.org/Core/Latest/INSTALL.WIN

A point of interest when using PPM:
Try not to use PPM to do something like "upgrade <package>". Inconsistencies
in PPM and in people's naming of ppd files can result in an old version of a
package being installed. Therefore always use "search <package>" followed by
"install <number>" in order to obtain the correct version of a package.

Let me know if you have any problems.
Nathan


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields
Sent: 21 July 2005 20:11
To: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] PPM for bioperl-1.5?

Also, not to complicate things, but are Lincoln's bioperl-1.5 PPM (at 
http://www.gmod.org/ggb/ppm/) and Nathan's version (at 
http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl.ppd) essentially
the same?  I noticed that Nathan's has a bunch of dependencies but
Lincoln's doesn't.
Chris

At 01:28 PM 7/21/2005, Chris Fields wrote:
>I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in 
>http://bioperl.org/DIST.  I saw that Nathan created one a while back; did 
>anyone transfer it over to the above directory?


From n.haigh at sheffield.ac.uk  Fri Jul 22 04:40:38 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri Jul 22 04:35:25 2005
Subject: [Bioperl-l] PPM for bioperl-1.5?
In-Reply-To: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAViZj7oecjkaHqGHtQnWciQEAAAAA@sheffield.ac.uk>

No one transferred the ppm file over to the bioperl server. If someone is
able to do this, please do, but note the following:

The latest version of the ppd file should be named bioperl.ppd; other
versions should be named something like bioperl-version_no.ppd.

Does anyone know which server the website is served from? If it is the
pub.open-bio.org server, I could make the changes myself if I get the
permissions.

Nathan


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields
Sent: 21 July 2005 19:29
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] PPM for bioperl-1.5?

I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in 
http://bioperl.org/DIST.  I saw that Nathan created one a while back; did 
anyone transfer it over to the above directory?


From jrm at compbio.dundee.ac.uk  Fri Jul 22 04:39:40 2005
From: jrm at compbio.dundee.ac.uk (Jon Manning)
Date: Fri Jul 22 04:35:50 2005
Subject: [Bioperl-l] Bio::Structure::IO and single chain output
Message-ID: <42E0B0CC.7010609@compbio.dundee.ac.uk>

Hi all,

I've been using Bio::Structure::IO to read PDB files. I'm currently 
trying to calculate solvent accessibilities on a per-chain basis, so 
want to spit out Bio::Structure::Chain objects to PDB-format files so I 
can feed them alone to NAccess. I've tried passing them directly to the 
IO object, this looked like it might work:

...
$out = Bio::Structure::IO->new(-file => ">test.pdb",
                                  '-format' => 'pdb');

$out->write_structure($chain);
...

(where $chain is a Bio::Structure::Chain)

But I get this sort of error:

------------- EXCEPTION  -------------
MSG:  Bio::Structure::Chain=HASH(0x8dc7368) is not a StructureI 
compliant module.
STACK Bio::Structure::IO::pdb::write_structure 
/usr/lib/perl5/site_perl/5.8.5/Bio/Structure/IO/pdb.pm:531
STACK toplevel ./structureIOtest.pl:26


So I then thought to create a Bio::Structure::Entry object, add the 
chain, and feed that to IO, like:

...
my $entry = Bio::Structure::Entry->new(-id  => 'structure_id');
$entry->chain($chainobject);
$out = Bio::Structure::IO->new(-file => ">test.pdb",
                                  '-format' => 'pdb');

$out->write_structure($entry);
...

But I've clearly misunderstood the entry initialisation somewhere, 
because I get this sort of error:

Use of uninitialized value in concatenation (.) or string at 
/usr/lib/perl5/site_perl/5.8.5/Bio/Structure/Entry.pm line 331, <GEN0> 
line 12836.

------------- EXCEPTION  -------------
MSG: add_chain: first argument needs to be a Model object ()

STACK Bio::Structure::Entry::add_chain 
/usr/lib/perl5/site_perl/5.8.5/Bio/Structure/Entry.pm:330
STACK Bio::Structure::Entry::get_chains 
/usr/lib/perl5/site_perl/5.8.5/Bio/Structure/Entry.pm:386
STACK Bio::Structure::Entry::chain 
/usr/lib/perl5/site_perl/5.8.5/Bio/Structure/Entry.pm:300
STACK toplevel ./structureIOtest.pl:25

--------------------------------------


I'd really appreciate some pointers on how to go about doing this.

Thanks,

Jon

From n.haigh at sheffield.ac.uk  Fri Jul 22 05:05:51 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri Jul 22 04:57:48 2005
Subject: [Bioperl-l] "Be forgiving in what you accept" and
	Bio::Tools::GuessSeqFormat
In-Reply-To: <BF05780D.2D45%brian_osborne@cognia.com>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAGJKCWI4aHUC5AbYthOOhiQEAAAAA@sheffield.ac.uk>

May I ask what software is producing this FASTA format file which has a
space immediately after the '>' in the description line?

Although I am not aware of a formal description of FASTA format, I have
never seen any files with a space immediately after '>'. I don't object to
relaxing this a little in bioperl, but you may find that these files
are not compatible with other software.

Nathan

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Brian Osborne
Sent: 21 July 2005 21:04
To: hartzell@alerce.com; bioperl-l
Subject: Re: [Bioperl-l] "Be forgiving in what you accept" and
Bio::Tools::GuessSeqFormat

George,

This does sound like a reasonable change, I will make it unless someone has
an objection. Let's wait a moment...

Brian O.



From n.haigh at sheffield.ac.uk  Fri Jul 22 04:48:06 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri Jul 22 05:24:15 2005
Subject: [Bioperl-l] PPM for bioperl-1.5?
In-Reply-To: <6.2.1.2.2.20050721140910.03cfac40@express.cites.uiuc.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAheg7T6etFkawUeHG2nmN7gEAAAAA@sheffield.ac.uk>

Oops, I just realized that although the ppd file is available at:

The actual file containing the bioperl stuff isn't available at:
http://bioperl.org/DIST/bioperl-1.5-ppm.tar.gz

If someone could volunteer to put these files in the http://bioperl.org/DIST
directory or grant me access I can do this myself; just let me know and I'll
pass on the relevant files!

Nathan


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields
Sent: 21 July 2005 20:11
To: bioperl-l@bioperl.org
Subject: Re: [Bioperl-l] PPM for bioperl-1.5?

Also, not to complicate things, but are Lincoln's bioperl-1.5 PPM (at 
http://www.gmod.org/ggb/ppm/) and Nathan's version (at 
http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl.ppd) essentially
the same?  I noticed that Nathan's has a bunch of dependencies but
Lincoln's doesn't.
Chris

At 01:28 PM 7/21/2005, Chris Fields wrote:
>I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in 
>http://bioperl.org/DIST.  I saw that Nathan created one a while back; did 
>anyone transfer it over to the above directory?


From khoueiry at ibdm.univ-mrs.fr  Fri Jul 22 10:38:51 2005
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Fri Jul 22 10:27:33 2005
Subject: [Bioperl-l] getting patterns consensus
Message-ID: <1122043131.16107.6.camel@DavidLinux>

Hi all,

Let's say I have the following pattern:

$PAT = A[AT]GAT[CT]A

Is there a bioperl method or a neat/fast Perl way to get all the
sequences that match that pattern, i.e.:

AAGATCA
AAGATTA
ATGATCA
ATGATTA

Thanks

pierre


From skirov at utk.edu  Fri Jul 22 10:59:40 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Jul 22 10:50:13 2005
Subject: [Bioperl-l] getting patterns consensus
In-Reply-To: <1122043131.16107.6.camel@DavidLinux>
References: <1122043131.16107.6.camel@DavidLinux>
Message-ID: <42E109DC.9030906@utk.edu>

Yes. Look at Bio::Tools::IUPAC. Create the Bio::Seq object, using IUPAC 
coding for ambiguous nucleotides (see the documentation; in IUPAC codes 
your pattern would be AWGATYA), and then create the IUPAC object based 
on the seq one. Then use the next_seq method; it will give you exactly 
what you need.
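
If you would rather keep the bracket notation and skip the IUPAC step, the
expansion is also short in plain Perl. A minimal sketch, assuming the pattern
contains only literal letters and simple [..] character classes (no ranges,
negation or quantifiers); expand_pattern is a made-up name, not a BioPerl
method:

```perl
use strict;
use warnings;

# Hypothetical helper: expand a pattern such as A[AT]GAT[CT]A into every
# sequence it matches. Only literal letters and plain [..] classes are handled.
sub expand_pattern {
    my ($pattern) = @_;
    my @results = ('');
    # Walk the pattern token by token: a bracketed class or a single letter.
    while ($pattern =~ /\G(?:\[([A-Za-z]+)\]|([A-Za-z]))/gc) {
        my ($class, $letter) = ($1, $2);
        my @choices = defined $class ? split(//, $class) : ($letter);
        # Append every choice to every partial sequence built so far.
        @results = map { my $prefix = $_; map { $prefix . $_ } @choices } @results;
    }
    # Anything left unparsed means the pattern used unsupported syntax.
    my $end = pos($pattern);
    $end = 0 unless defined $end;
    die "unsupported pattern syntax\n" unless $end == length $pattern;
    return @results;
}

print "$_\n" for expand_pattern('A[AT]GAT[CT]A');
```

For the pattern in the question this prints AAGATCA, AAGATTA, ATGATCA and
ATGATTA, one per line.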
Stefan

khoueiry wrote:

>Hi all,
>
>Let's admit that I have the following pattern : 
>
>$PAT = A[AT]GAT[CT]A
>
>Is there a bioperl method or a fine/fast perl way to get all the
>consensus relative to that pattern:
> (i.e)
>
>AAGATCA
>AAGATTA
>ATGATCA
>ATGATTA
>
>Thanks
>
>pierre
>
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From brian_osborne at cognia.com  Fri Jul 22 11:04:39 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Jul 22 10:55:09 2005
Subject: [Bioperl-l] getting patterns consensus
In-Reply-To: <1122043131.16107.6.camel@DavidLinux>
Message-ID: <BF068347.2DB8%brian_osborne@cognia.com>

Pierre,

I haven't taken a very close look but I believe you can do this with
Bio::Tools::SeqPattern. There's an accompanying script,
examples/tools/seq_pattern.pl.

Brian O.


On 7/22/05 10:38 AM, "khoueiry" <khoueiry@ibdm.univ-mrs.fr> wrote:

> Hi all,
> 
> Let's admit that I have the following pattern :
> 
> $PAT = A[AT]GAT[CT]A
> 
> Is there a bioperl method or a fine/fast perl way to get all the
> consensus relative to that pattern:
>  (i.e)
> 
> AAGATCA
> AAGATTA
> ATGATCA
> ATGATTA
> 
> Thanks
> 
> pierre
> 
> 


From cjfields at uiuc.edu  Fri Jul 22 11:11:25 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri Jul 22 11:01:56 2005
Subject: [Bioperl-l] PPM for bioperl-1.5?
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEe
	ZS9eEGziB38KAAAAQAAAAPF/cClbC0E+UuImm364+iwEAAAAA@sheffield.ac.uk>
References: <6.2.1.2.2.20050721131602.03c8f370@express.cites.uiuc.edu>
	<!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAPF/cClbC0E+UuImm364+iwEAAAAA@sheffield.ac.uk>
Message-ID: <6.2.1.2.2.20050722093124.01e1fe88@express.cites.uiuc.edu>

I found the links for both the PPD and PPM in your older response from the 
bioperl emails (I think from Jan 2005), downloaded the files to a local 
directory, and installed everything through a local repository (although I 
had a few slight hitches, see below).  I also installed GBrowse directly 
from the GMOD site and it works as well.  I also found GD-SVG and installed 
it, and I get SVG images with GBrowse just fine!

A few notes:  I had to change the location of the tar archive in the PPD 
files for Bioperl and GD-SVG.  For some reason PPM kept looking for them on 
the Bioperl website (where they are still MIA).  I found the links in the PPD 
files for both Bioperl and GD-SVG and replaced the full URL with the bare 
file name, reflecting the archive's location in the local repository (this 
has to be done for every link in each file).  So for GD-SVG I changed:

         <ARCHITECTURE NAME="MSWin32-x86-multi-thread-5.8" />
         <CODEBASE HREF="http://bioperl.org/DIST/GD-SVG-0.25-ppm.tar.gz" />
     </IMPLEMENTATION>

to

         <ARCHITECTURE NAME="MSWin32-x86-multi-thread-5.8" />
         <CODEBASE HREF="GD-SVG-0.25-ppm.tar.gz" />
     </IMPLEMENTATION>

Voila!
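
The same edit could be scripted. This is only a sketch: the helper name
localize_codebase is hypothetical, and it assumes each CODEBASE element
looks like the ones shown above.

```perl
use strict;
use warnings;

# Replace the URL in every CODEBASE HREF with the bare file name, so that
# PPM looks for the archive next to the PPD file (hypothetical helper).
sub localize_codebase {
    my ($ppd_text) = @_;
    # $1 keeps the attribute prefix, $2 the file name plus closing quote;
    # everything up to the last '/' inside the quotes is dropped.
    $ppd_text =~ s{(<CODEBASE\s+HREF=")[^"]*/([^"/]+")}{$1$2}g;
    return $ppd_text;
}

my $before = '<CODEBASE HREF="http://bioperl.org/DIST/GD-SVG-0.25-ppm.tar.gz" />';
print localize_codebase($before), "\n";
```

An HREF that is already a bare file name contains no slash and is left
unchanged.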

Also, the repository for Kobes (http://theoryx5.uwinnipeg.ca/ppms) didn't 
work and stopped the installation of Bioperl.  This is very likely due to 
a problem with the latest XML-SAX module (and not Bioperl), which leaves 
out or doesn't initialize the file ParserDetails.ini; that may be causing 
problems when parsing repositories (although I can't see why).  I kept 
getting messages that ParserDetails.ini couldn't be found.  I got around it 
by using the PPM repository for Randy Kobes that ActiveState lists 
(http://theoryx5.uwinnipeg.ca/cgi-bin/ppmserver?urn:/PPMServer58 for Perl 
5.8, http://theoryx5.uwinnipeg.ca/cgi-bin/ppmserver?urn:/PPMServer for 
5.6).  Installation worked fine after that.  I also reinstalled XML-SAX 
from the Kobes repository, and everything now loads from the original 
repository listed in the INSTALL.WIN file for Bioperl.  This may be 
something we want to remember if someone comes up with a similar issue: 
since your PPM package lists XML-SAX as a dependency, it may install 
the ActiveState version (the bad one) rather than the Kobes version (the 
good one).

Cheers

Chris

At 03:51 AM 7/22/2005, you wrote:
>Would you be confident enough to install these locally from your own
>computer if I give you the relevant files? It would involve saving the files
>to a dir on your computer adding that directory to your PPM repository list
>and then installing as usual through PPM. Let me know and I can send you the
>files (just over 2Mb), or make them available from a web server.
>
>Nathan
>
>
>-----Original Message-----
>From: bioperl-l-bounces@portal.open-bio.org
>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields
>Sent: 21 July 2005 19:29
>To: bioperl-l@bioperl.org
>Subject: [Bioperl-l] PPM for bioperl-1.5?
>
>I noticed the PPM for the latest developer bioperl (v 1.5) isn't found in
>http://bioperl.org/DIST.  I saw that Nathan created one a while back; did
>anyone transfer it over to the above directory?
>
>__________________________________
>
>Chris Fields - Postdoctoral Researcher
>Lab of Dr. Robert Switzer
>
>Address:
>
>University of Illinois at Urbana-Champaign
>Dept. of Biochemistry - 323 RAL
>600 S. Mathews Ave.
>Urbana, IL 61801
>
>Phone : (217) 333-7098
>Fax : (217) 244-5858
>

__________________________________

Chris Fields - Postdoctoral Researcher
Lab of Dr. Robert Switzer

Address:

University of Illinois at Urbana-Champaign
Dept. of Biochemistry - 323 RAL
600 S. Mathews Ave.
Urbana, IL 61801

Phone : (217) 333-7098
Fax : (217) 244-5858 

From hartzell at kestrel.alerce.com  Fri Jul 22 11:35:47 2005
From: hartzell at kestrel.alerce.com (George Hartzell)
Date: Fri Jul 22 11:27:45 2005
Subject: [Bioperl-l] "Be forgiving in what you accept"
	andBio::Tools::GuessSeqFormat
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAGJKCWI4aHUC5AbYthOOhiQEAAAAA@sheffield.ac.uk>
References: <BF05780D.2D45%brian_osborne@cognia.com>
	<!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAGJKCWI4aHUC5AbYthOOhiQEAAAAA@sheffield.ac.uk>
Message-ID: <17121.4691.792887.4566@satchel.alerce.com>


Nathan Haigh writes:
 > May I ask what software is producing this FASTA format file which has a
 > space immediately after the '>' in the description line?

I don't know what created it.  Wouldn't surprise me to find out it was
created in Microsoft Word....  It was given to me as an example input
file/test case.

 > Although I am not aware of a formal description of FASTA format, I have
 > never seen any files with a space immediately after '>'. Although I don't
 > object to relaxing this a little in bioperl, you may find that these files
 > are not compatible with other software.

Yeah, there is that.  On the other hand, then we should make the
equivalent change and have the Bio::SeqIO object fail on them even if
it's told that they're Fasta (e.g. by -format or by guessing based on
filename).

I was just frustrated when stuff worked up until the moment that I
uploaded the file into a tool via the web (at which point it ended up
in an oddly named file and the guessing heuristic broke).

I'd vote for relaxing the constraint, but, hey....

g.
From n.haigh at sheffield.ac.uk  Fri Jul 22 12:15:34 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Fri Jul 22 12:07:29 2005
Subject: [Bioperl-l] "Be forgiving in what you
	accept"andBio::Tools::GuessSeqFormat
In-Reply-To: <17121.4691.792887.4566@satchel.alerce.com>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAKMFViaJt6k2o03lKKA8IwgEAAAAA@sheffield.ac.uk>

If you specified that the file was FASTA, I'm not sure how the parser would
work for pulling out primary_id, display_id, etc. for the sequence. Have
you checked that the parser is flexible enough to pull these out of a
sequence description that has a space after the '>'?

It may be better to strip out these spaces prior to using them in bioperl?
But to be honest I wouldn't be bothered either way! :o)

Nathan



From khoueiry at ibdm.univ-mrs.fr  Fri Jul 22 12:34:36 2005
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Fri Jul 22 12:23:45 2005
Subject: [Bioperl-l] getting patterns consensus
In-Reply-To: <42E109DC.9030906@utk.edu>
References: <1122043131.16107.6.camel@DavidLinux> <42E109DC.9030906@utk.edu>
Message-ID: <1122050076.5577.2.camel@DavidLinux>

Thanks,

Kirov, your method did exactly what I need. Brian, no, I don't think
that Bio::Tools::SeqPattern solves the problem, unless I'm missing
something.

Thanks again

Pierre

On Friday 22 July 2005 at 10:59 -0400, Stefan Kirov wrote:

> Yes. Look at Bio::Tools::IUPAC. Create the Bio::Seq object, using IUPAC 
> coding for ambiguous nucleotides (see the documentation) and then create 
> the IUPAC object based on the seq one. Then use next_seq method- it will 
> give you exactly what you need.
> Stefan
> 
> khoueiry wrote:
> 
> >Hi all,
> >
> >Let's admit that I have the following pattern : 
> >
> >$PAT = A[AT]GAT[CT]A
> >
> >Is there a bioperl method or a fine/fast perl way to get all the
> >consensus relative to that pattern:
> > (i.e)
> >
> >AAGATCA
> >AAGATTA
> >ATGATCA
> >ATGATTA
> >
> >Thanks
> >
> >pierre
> >
> >
> 
From victor.ruotti at gmail.com  Fri Jul 22 13:16:22 2005
From: victor.ruotti at gmail.com (Victor)
Date: Fri Jul 22 13:07:31 2005
Subject: [Bioperl-l] The Scriptome now has mailing lists
In-Reply-To: <339D68B133EAD311971E009027DC47970321AF08@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC47970321AF08@montecarlo.cgr.harvard.edu>
Message-ID: <36d7e55505072210166a3f666e@mail.gmail.com>

Hi Lincoln:

We are testing your bp_fast_load_gff.pl program on Solaris 10. 
The script was downloaded from the bioperl-live CVS repository. 
It won't get past creating the pipe files:
./bp_fast_load_gff.pl -d human_test test.gff
loading normalized group, type and attribute information...ok
creating load file /export/home/victor/fastload/fdata.16445...ok
opening load file for writing...

Here is the line of code that I think it is having trouble executing:
$FH{$_} = IO::File->new($file,'>') or die $_,": $!";

Is this because of the pipes used to speed things up?

Thanks,
Victor
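
One possible explanation, not confirmed anywhere in this thread: on Unix,
opening a named pipe (FIFO) for writing blocks until some other process
opens it for reading, so the script would sit at "opening load file for
writing..." if no reader ever attaches. The sketch below only demonstrates
that general behaviour; fifo_roundtrip and the file name are made up and
have nothing to do with the internals of bp_fast_load_gff.pl.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);
use IO::File;

# Send a message through a FIFO: the child writes, the parent reads.
# Each open() waits until the other side has opened the pipe as well.
sub fifo_roundtrip {
    my ($message) = @_;
    my $dir  = tempdir(CLEANUP => 1);
    my $fifo = "$dir/demo.fifo";
    mkfifo($fifo, 0600) or die "mkfifo failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Writer: this open blocks until the parent opens the read end.
        my $out = IO::File->new($fifo, '>') or die "$fifo: $!";
        print {$out} $message;
        $out->close;
        POSIX::_exit(0);    # leave without running the parent's cleanup
    }

    # Reader: without this open, the writer above would wait forever.
    my $in = IO::File->new($fifo, '<') or die "$fifo: $!";
    my $got = do { local $/; <$in> };
    $in->close;
    waitpid($pid, 0);
    return $got;
}

print fifo_roundtrip("loaded\n");
```

If the reading side never starts (or dies early), the writer's open never
returns, which would look exactly like a hang at the open line.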

From skirov at utk.edu  Fri Jul 22 13:39:07 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Jul 22 13:34:24 2005
Subject: [Bioperl-l] getting patterns consensus
In-Reply-To: <1122050076.5577.2.camel@DavidLinux>
References: <1122043131.16107.6.camel@DavidLinux> <42E109DC.9030906@utk.edu>
	<1122050076.5577.2.camel@DavidLinux>
Message-ID: <42E12F3B.6020806@utk.edu>

Pierre,
Thanks, but actually the module was written by Aaron Mackey, so I guess 
your gratitude goes to him :-) .
Stefan

khoueiry wrote:

> Thanks,
>
> Kirov, your method did exactly what I need. Brian, no, I don't think 
> that Bio::Tools::SeqPattern solves the problem, unless I'm missing 
> something.
>
> Thanks again
>
> Pierre
>
> On Friday 22 July 2005 at 10:59 -0400, Stefan Kirov wrote:
>
>>Yes. Look at Bio::Tools::IUPAC. Create the Bio::Seq object, using IUPAC 
>>coding for ambiguous nucleotides (see the documentation) and then create 
>>the IUPAC object based on the seq one. Then use next_seq method- it will 
>>give you exactly what you need.
>>Stefan
>>
>>khoueiry wrote:
>>
>>>Hi all,
>>>
>>>Let's admit that I have the following pattern : 
>>>
>>>$PAT = A[AT]GAT[CT]A
>>>
>>>Is there a bioperl method or a fine/fast perl way to get all the
>>>consensus relative to that pattern:
>>> (i.e)
>>>
>>>AAGATCA
>>>AAGATTA
>>>ATGATCA
>>>ATGATTA
>>>
>>>Thanks
>>>
>>>pierre
>>>
>>>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From Steve_Chervitz at affymetrix.com  Fri Jul 22 14:22:26 2005
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Fri Jul 22 14:14:20 2005
Subject: [Bioperl-l] "Be forgiving in what you accept" and
	Bio::Tools::GuessSeqFormat
In-Reply-To: <200507211934.j6LJYJO3007600@satchel.alerce.com>
Message-ID: <BF068772.1135F%Steve_Chervitz@affymetrix.com>

George Hartzell <hartzell@kestrel.alerce.com>:

> Is there any reason not to extend the regexp a bit and relax that
> constraint (since everything else seems to cope with it)?

Seems reasonable. I've seen fasta files where there was no id at all, just a
'>' by itself on a line followed by a line of sequence. Perhaps the sequence
format guesser should accept as fasta any input with a line beginning with
'>'? But maybe this is too radical...

> But, if you happen to have the sequence in a file with a funny name
> (e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails.  It
> can't guess based on the filename and the file content test is strict
> and wants to see the header line without the whitespace (">ape").

Would be good to add this example to the SeqIO test suite.

>
> There's a great "old" Internet maxim, "Be forgiving in what you accept
> and strict in what you send".

Here's an interesting discussion on this philosophy:
http://www.artima.com/forums/flat.jsp?forum=106&thread=4204

The FCC had this notion long before the internet. It's part of the specs on
many off-the-shelf electronic devices: CFR part 15 "Devices must not
interfere with licensed services and must accept interference from licensed
services."

I found a recent presentation on the FCC site showing results of a survey
about whether part 15 stifles innovation (10/14 respondents said no, and 9/5
said more stringent regulations might even permit *more* innovation):

http://www.fcc.gov/oet/tac/Part_15_Survey_12_4_02.ppt

Flexibility in input acceptance may be an issue in Bioperl to the extent
that it leads to complicated code that is difficult to maintain or for
others to grok. But in this particular SeqIO case, flexibility seems
warranted. I think it should be up to a specific application to wield
authority over what it accepts and produces for fasta files. Since bioperl
is a library used by multiple apps, high flexibility in acceptance seems
like a bonus.

Steve


From brian_osborne at cognia.com  Fri Jul 22 16:17:44 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Jul 22 16:15:04 2005
Subject: [Bioperl-l] 1.5 for Windows
In-Reply-To: <200507211934.j6LJYJO3007600@satchel.alerce.com>
Message-ID: <BF06CCA8.2DFD%brian_osborne@cognia.com>

bioperl-l,

The following files, provided by Nathan Haigh, have been uploaded to
bioperl.org/DIST:

GD-SVG-0.25-ppm.tar.gz
GD-SVG.ppd

bioperl-1.5-ppm.tar.gz
Bioperl-1.5.ppd

The MD5's have also been added to SIGNATURES.md5.

Brian O.


From rvosa at sfu.ca  Fri Jul 22 19:54:09 2005
From: rvosa at sfu.ca (Rutger Vos)
Date: Fri Jul 22 19:44:54 2005
Subject: [Bioperl-l] who, if anyone, "owns" Bio::
Message-ID: <42E18721.10000@sfu.ca>

Dear bioperlers,

for a while now, I've been working on a set of perl modules for 
phylogenetic analysis. Obviously, I would like other people to use these 
too (and perhaps contribute to development as well), and so I wish to 
upload them to the CPAN. The working title for the root name space has 
been "Phylo::" but I'd like to change this because - rightly - the perl 
community is hesitant towards a proliferation of top level name spaces. 
Also, I might wish to incorporate my work into the bioperl release (that 
is, if the core developers agree) because, well, fragmentation helps 
no-one. I am however on the fence on this second issue.

To help me understand these issues I direct the following questions to 
the folks in the know:

* is the Bio:: namespace reserved for BioPerl proper or can other people 
- if appropriate - use it as their top level name space, in the same way 
as, say, WWW:: or CGI::?

* Do modules that use the Bio:: namespace have to be part of the bioperl 
release? I for one think bioperl is wonderful, but it is mostly aimed at 
molecular biologists, and so a phylogeneticist or evolutionary biologist 
might not want to install the whole thing just to use peripheral 
functionality. After all, BioPerl is (with all due respect) starting to 
grow into a fairly monolithic install.

Basically, I'm curious whether it would be okay on your part if I 
submitted my work under the Bio:: namespace, but as separate installs, 
to the comprehensive perl archive network. Like I said, I am at this 
point not entirely of one mind w.r.t merging/contributing to bioperl 
proper - to be honest, I fear that might be a hairy proposition what 
with other modules possibly making assumptions about underlying data 
structures in the packages rather than following the advertised API and 
what not, but I'm certainly interested in discussing that also.

Looking forward to your replies!

Best wishes,

Rutger


-- 
++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
++++++++++++++++++++++++++++++++++++++++++++


From senger at ebi.ac.uk  Sat Jul 23 12:07:16 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Sat Jul 23 11:57:42 2005
Subject: [Bioperl-l] Bio::Biblio - small but important changes
Message-ID: <Pine.LNX.4.44.0507231700170.21622-100000@bagheera.ebi.ac.uk>

Hi,
   I have slightly changed the Bio::Biblio modules that use SOAP to get to
the MEDLINE repository (at EBI). The changes are two - and neither of them
should have any impact on your code (because the interface of Bio::Biblio
has not been changed). Therefore, it should be enough just to update your 
local copy of these modules.

   Anyway, the changes are:

   1) The default location of the MEDLINE/EBI Web Service changed. The new 
one is http://www.ebi.ac.uk/openbqs/services/MedlineSRS.

   2) The Web Service API (that is hidden to most of you by the 
Bio::Biblio's API) has changed in order to comply with the WSDL 
specification (method overloading has been removed - that's why some 
method names changed there).

   Of course, I will be happy to hear about any problems you may (hopefully
not) meet after these changes.

   Regards,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From hartzell at kestrel.alerce.com  Sat Jul 23 16:45:26 2005
From: hartzell at kestrel.alerce.com (George Hartzell)
Date: Sat Jul 23 16:36:17 2005
Subject: [Bioperl-l] Stop him before he codes again!
	[almost-multiple-alignment tool?]
Message-ID: <17122.44134.422603.381087@satchel.alerce.com>


Yep, there's nothing scarier than some people at the keyboard....

I have:

  1     putative sequence (DNA, not too big 1-5kb).
  many  (1-10) empirically determined sequences (DNA) that should be
        fairly similar to the putative sequence.

It's trivial to:

  produce a pairwise alignment of each empirical sequence against the
  putative sequence.

I'd ultimately like to produce (actually, I don't have much choice...):

  an almost multiple-alignment-like figure, with the putative sequence
  (e.g.) along the bottom and the empirical sequences piled up above
  it, gapped in emp. and put. sequences where necessary.

It's pretty much similar to piling up a bunch of EST's/cDNA vs. the
corresponding genomic, with a simpler gapping model (no splicing,
etc...).

Ideally I'd like to just wave my hands and have it all work :).  I'll
write code if I have to....

Since I'm not really looking for a multiple alignment, I'd like to
avoid the cost of actually computing/approximating one.

I'd like to just do the pairwise alignments then shoehorn the results
into an existing bioperl multiple-alignment representation and play
with it from there.

I'd love comments (and will happily pay in beer/coffee next time
you're in Berkeley) about:

    - existing tools that do just this (or even close)

    - what's the cleanest bioperl object to shoehorn it into?
      SimpleAlign?  Align?

    - given one of those objects, where should I start digging for
      pretty output routines?

    - if there's nothing particularly useful, any suggestions on how
      to structure things so that I can deposit them into BioPerl?

    - should I just punt and run something (e.g. clustalw, pileup)
      against the putative and all the empirical and be done with it?
      Tool recommendations?

Thanks,

g.
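
A rough sketch of the "shoehorn" idea above: merging pairwise alignments
that share one reference into a single pileup without computing a real
multiple alignment. It is plain Perl with no BioPerl objects, merge_pairwise
is a made-up name, and it assumes every pairwise alignment spans the whole
reference and uses '-' for gaps. Insertions from different sequences are
simply left-justified in the shared gap columns, not aligned to each other.

```perl
use strict;
use warnings;

# Merge pairwise alignments against a common reference into one pileup.
# Each argument is [ gapped_reference, gapped_other ], equal-length strings.
# Returns the gapped reference row followed by one row per input pair.
sub merge_pairwise {
    my @pairs = @_;
    return () unless @pairs;
    my ($ref, @parsed, @max_ins);

    for my $pair (@pairs) {
        my ($r, $s) = @$pair;
        die "alignment rows differ in length\n" unless length($r) == length($s);

        (my $ungapped = $r) =~ tr/-//d;
        $ref = $ungapped unless defined $ref;
        die "pairs do not share one reference\n" unless $ungapped eq $ref;

        # $ins[$i]: residues inserted before reference position $i
        # $col[$i]: residue (or '-') aligned to reference position $i
        my @ins = ('');
        my @col;
        for my $k (0 .. length($r) - 1) {
            my ($rc, $sc) = (substr($r, $k, 1), substr($s, $k, 1));
            if ($rc eq '-') { $ins[-1] .= $sc }
            else            { push @col, $sc; push @ins, '' }
        }
        for my $i (0 .. $#ins) {
            my $len = length($ins[$i]);
            $max_ins[$i] = $len if !defined $max_ins[$i] || $len > $max_ins[$i];
        }
        push @parsed, [ \@ins, \@col ];
    }

    # Pad each row so that insertion columns line up across all pairs.
    my $build = sub {
        my ($ins, $col) = @_;
        my $row = '';
        for my $i (0 .. $#max_ins) {
            $row .= $ins->[$i] . ('-' x ($max_ins[$i] - length($ins->[$i])));
            $row .= $col->[$i] if $i < @$col;
        }
        return $row;
    };

    my @ref_ins = ('') x @max_ins;
    my @ref_col = split //, $ref;
    return ($build->(\@ref_ins, \@ref_col), map { $build->(@$_) } @parsed);
}

# Toy data: one read with an insertion, one with a deletion.
print "$_\n" for merge_pairwise([ 'AC-GT', 'ACTGT' ], [ 'ACGT', 'A-GT' ]);
```

The returned rows all have the same length, so they could be loaded into
whatever alignment object is chosen as equal-length gapped strings.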
From wackattack at gmail.com  Sat Jul 23 23:53:18 2005
From: wackattack at gmail.com (Wacki)
Date: Sun Jul 24 14:52:37 2005
Subject: [Bioperl-l] Accessing database settings
Message-ID: <2b8a4eeb05072320531f2349b8@mail.gmail.com>

I installed bioperl but didn't use the right password/username combo. How do 
I change it? Thanks for the help.

From amackey at pcbi.upenn.edu  Mon Jul 25 08:25:50 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Jul 25 08:19:01 2005
Subject: [Bioperl-l] Bioperl-ext, Staden and x86_64
In-Reply-To: <OF4EFD0CA2.B0ED7AFA-ONCA257043.0015B69B-CA257043.00170D76@nre.vic.gov.au>
References: <OF4EFD0CA2.B0ED7AFA-ONCA257043.0015B69B-CA257043.00170D76@nre.vic.gov.au>
Message-ID: <408F0127-3C94-4DD7-9B62-B7A6EE369B39@pcbi.upenn.edu>


I haven't looked, but if it's using an autoconf-like build system
(i.e. you first type "configure" then "make"), you may need to add
"--enable-shared" to the "configure" invocation.

-Aaron

On Jul 19, 2005, at 12:11 AM, Andrew.Mather@dpi.vic.gov.au wrote:

> I looked around and found Version 1.9.0 on
> Sourceforge and this appears to compile cleanly, however it doesn't  
> look
> like it's left any .so files in /usr/local/lib  (or anywhere else  
> for that
> matter).
>

From chiromatzo at gmail.com  Mon Jul 25 09:20:28 2005
From: chiromatzo at gmail.com (Alynne Chiromatzo)
Date: Mon Jul 25 09:13:10 2005
Subject: [Bioperl-l] How can I acess the alignmet score of the axt file?
Message-ID: <5865004505072506204715b019@mail.gmail.com>

Hi! I'm working with axt files. I need to know how I can access the
alignment score from the axt file. I've tried to use
$hsp->raw_score but it didn't work. Can anyone help me?

Thanks.
Alynne Oya.

From jbedell at oriongenomics.com  Mon Jul 25 10:04:26 2005
From: jbedell at oriongenomics.com (Joseph Bedell)
Date: Mon Jul 25 09:55:08 2005
Subject: [Bioperl-l] Stop him before he codes
	again![almost-multiple-alignment tool?]
Message-ID: <434AF352F9D03C4C896782B8CC78BC7687FEE7@VADER.oriongenomics.com>

Hey George,

Have you looked at the display option of -m 1 in NCBI BLAST? That gives
a multiple sequence alignment-like output.

Joey

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Joseph A Bedell, Ph.D.         office: 314-615-6979 
Director, Bioinformatics         fax:    314-615-6975 
Orion Genomics                   cell:   314-518-1343
4041 Forest Park Ave
St. Louis, MO 63108
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From brian_osborne at cognia.com  Mon Jul 25 10:11:16 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Jul 25 10:01:44 2005
Subject: [Bioperl-l] "Be forgiving in what you accept" and
	Bio::Tools::GuessSeqFormat
In-Reply-To: <200507211934.j6LJYJO3007600@satchel.alerce.com>
Message-ID: <BF0A6B44.2EB2%brian_osborne@cognia.com>

George,

Done.

Brian O.


On 7/21/05 3:34 PM, "George Hartzell" <hartzell@kestrel.alerce.com> wrote:

> 
> There's a great "old" Internet maxim, "Be forgiving in what you accept
> and strict in what you send".
> 
> The Bio::Seqio modules seem to be able to cope with "fasta" formatted
> files that have a space separating the ">" from the rest of the line
> (e.g.  "> ape") if a) you explicitly specify the format or b) if you
> have the sequence in a file that ends in "fa" (or generally matches
> the list of patterns that correspond to fasta file names).
> 
> But, if you happen to have the sequence in a file with a funny name
> (e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails.  It
> can't guess based on the filename and the file content test is strict
> and wants to see the header line without the whitespace (">ape").
> 
> Is there any reason not to extend the regexp a bit and relax that
> constraint (since everything else seems to cope with it)?
> 
> Something like this:
> 
> *** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig Thu
> Jul 21 12:30:55 2005
> --- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm Thu Jul
> 21 12:31:45 2005
> ***************
> *** 591,595 ****
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\w/);
>   }
>   
> --- 591,595 ----
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\s*\w/);
>   }
>   
> g.
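
The effect of the one-character change in the patch above can be checked
in isolation. The sketch below copies the two regular expressions from the
diff into stand-alone subs; the sub names are made up for illustration, and
the third variant is only the "any line beginning with '>'" idea floated
elsewhere in this thread, not something that was committed.

```perl
use strict;
use warnings;

# Before the patch: '>' must be followed directly by a word character.
sub fasta_header_strict {
    my ($line, $lineno) = @_;
    return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i)
            || $line =~ /^>\w/) ? 1 : 0;
}

# After the patch: optional whitespace is allowed after '>'.
sub fasta_header_relaxed {
    my ($line, $lineno) = @_;
    return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i)
            || $line =~ /^>\s*\w/) ? 1 : 0;
}

# Hypothetical, more radical variant: accept any line starting with '>'.
sub fasta_header_any {
    my ($line, $lineno) = @_;
    return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i)
            || $line =~ /^>/) ? 1 : 0;
}

for my $header ('>ape', '> ape', '>') {
    printf "%-6s strict=%d relaxed=%d any=%d\n", "'$header'",
        fasta_header_strict($header, 1),
        fasta_header_relaxed($header, 1),
        fasta_header_any($header, 1);
}
```

A bare '>' with no identifier is still rejected by the relaxed test, so
only the third variant would cover files of that kind.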


From cain at cshl.edu  Mon Jul 25 10:34:09 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Jul 25 10:25:18 2005
Subject: [Bioperl-l] Accessing database settings
In-Reply-To: <2b8a4eeb05072320531f2349b8@mail.gmail.com>
References: <2b8a4eeb05072320531f2349b8@mail.gmail.com>
Message-ID: <1122302049.3293.5.camel@localhost.localdomain>

Hello,

I am guessing you are writing about accessing a GFF database, since you
are also posting questions to the gbrowse mailing list.  Also, from your
posts there, I'm guessing that you have figured it out.  If I am not
guessing correctly, please re-ask the question and make it a little more
clear what you mean.

Scott


On Sat, 2005-07-23 at 22:53 -0500, Wacki wrote:
> I installed bioperl but didn't use the right password/username combo. How do 
> I change it? Thanks for the help.
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From jason.stajich at duke.edu  Mon Jul 25 10:44:54 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jul 25 10:35:17 2005
Subject: [Bioperl-l] Accessing database settings
In-Reply-To: <1122302049.3293.5.camel@localhost.localdomain>
References: <2b8a4eeb05072320531f2349b8@mail.gmail.com>
	<1122302049.3293.5.camel@localhost.localdomain>
Message-ID: <3a974032dc21830d0456985486249ef8@duke.edu>

I assume you just mean for setting up the tests?  See t/BioDBGFF.t to  
see where it reads the conf from (t/data/dbfa).

-jason
On Jul 25, 2005, at 7:34 AM, Scott Cain wrote:

> Hello,
>
> I am guessing you are writing about accessing a GFF database, since you
> are also posting questions to the gbrowse mailing list.  Also, from your
> posts there, I'm guessing that you have figured it out.  If I am not
> guessing correctly, please re-ask the question and make it a little more
> clear what you mean.
>
> Scott
>
>
> On Sat, 2005-07-23 at 22:53 -0500, Wacki wrote:
>> I installed bioperl but didn't use the right password/username combo.
>> How do I change it? Thanks for the help.
>>
>>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain@cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
>
>
--
Jason Stajich
http://www.duke.edu/~jes12
jason.stajich -at- duke.edu

From avilella at gmail.com  Mon Jul 25 11:06:53 2005
From: avilella at gmail.com (Albert Vilella)
Date: Mon Jul 25 10:57:35 2005
Subject: [Bioperl-l] How can I acess the alignmet score of the axt file?
In-Reply-To: <5865004505072506204715b019@mail.gmail.com>
References: <5865004505072506204715b019@mail.gmail.com>
Message-ID: <1122304013.8201.5.camel@localhost.localdomain>

On Mon, 25 Jul 2005 at 10:20 -0300, Alynne Chiromatzo wrote:
> Hi! I'm working with axt files. I need to know how I can access the
> alignment score from the axt file. I've tried to use
> $hsp->raw_score but it didn't work. Can anyone help me?

Looking at bioperl-live/t/SearchIO.t, it seems that raw_score is a method
of $hit, whereas the corresponding method for $hsp would be "score":

while( my $hit = $r->next_hit ) {
    my $d = shift @dcompare;
    ok($hit->name, shift @$d);
    ok($hit->length, shift @$d);
    ok($hit->raw_score, shift @$d);
    ok($hit->significance, shift @$d);
    
    my $hsp = $hit->next_hsp;

so maybe $hsp->score?
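
If the object methods do not give the value, the score can also be read
straight from the file. This is a sketch that bypasses Bio::SearchIO
entirely; it assumes the usual UCSC axt layout, where each alignment block
starts with a summary line of nine whitespace-separated fields and the last
field is the score. The sub name axt_scores and the sample block are made
up for illustration.

```perl
use strict;
use warnings;

# Collect the score (last field) from every axt summary line in a text.
# Assumed summary layout: number, chrom, start, end, chrom, start, end,
# strand, score. Sequence lines have a single field and are skipped.
sub axt_scores {
    my ($text) = @_;
    my @scores;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;    # comments and blank lines
        my @f = split ' ', $line;
        next unless @f == 9 && $f[0] =~ /^\d+$/ && $f[7] =~ /^[+-]$/;
        push @scores, $f[8];
    }
    return @scores;
}

# Made-up example block, not real data.
my $example = <<'AXT';
0 chr1 101 108 chr2 201 208 + 3500
ACGTACGT
ACGTACCT

1 chr1 301 304 chr2 401 404 - 1200
ACGT
ACGA
AXT

print join(',', axt_scores($example)), "\n";
```

Whether $hsp->score returns the same number for axt input is worth
checking against a file where the value is known.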

Hope it helps,

    Albert.

> 
> Thanks.
> Alynne Oya.
> 

From taerwin at tpg.com.au  Mon Jul 25 19:30:05 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Mon Jul 25 19:27:04 2005
Subject: [Bioperl-l] Blast : Bus Error
In-Reply-To: <7c7aa474050721025839062a98@mail.gmail.com>
References: <7c7aa474050721014961ce6a6f@mail.gmail.com>
	<2fb209dd0507210212672ea750@mail.gmail.com>
	<20050721094219.GA14638@ebi.ac.uk>
	<7c7aa474050721025839062a98@mail.gmail.com>
Message-ID: <1122334205.1779.101.camel@bacp4>

I don't think it would be the Linux installation, as that would be set up
on different partitions. It could be faulty memory, which you can test with:

http://www.memtestosx.org/

The only other thing I can think of: are you using the right
binary? (Did you download a Linux binary?)
Regards,

Tim

On Thu, 2005-07-21 at 11:58 +0200, Ferdinand Marlétaz wrote:
> Well, I exclude memory problems (2 GB RAM on these machines) and
> database size problems (the error happens with both large and small
> databases, e.g. 50 MB). On top of that, I've already performed identical
> blast searches on the two computers, and the other computer runs very
> well...
> I don't think it's a hardware problem either, because the faulty
> computer has run similar searches in the past without problems... So
> something may have happened to the configuration that makes the blast
> process fail! I just know that somebody tried to install Linux
> on this computer and didn't manage to finish the installation. Maybe
> that is the source of my current problems?
> 
> What do you all think about that?
> 
> Thanks 
> 
> Ferdi
> 
> 
> 2005/7/21, Andreas Kahari <ak@ebi.ac.uk>:
> > [not to the list]
> > 
> > Hi guys,
> > 
> > There could also be a problem with a faulty memory module...  If
> > the error is not consistently reproducible, then this is one
> > possible cause.
> > 
> > Running out of memory should not produce a Bus Error.  It might
> > produce a Segmentation Fault if the program doesn't care that
> > the memory allocation failed, but not a Bus Error (as far as I
> > know, but I don't run OS X here).
> > 
> > A way to diagnose this is to run exactly the same set-up on two
> > identical machines until one of them causes the error more than
> > once.  If the other machine seems to run ok then it is very
> > possible that there is a hardware fault on the first machine (or
> > some important system configuration setting is different without
> > you knowing it).
> > 
> > Regards,
> > Andreas
> > 
> > On Thu, Jul 21, 2005 at 11:12:26AM +0200, Laurent DOUCHY wrote:
> > > Hello,
> > > This problem can happen for several reasons:
> > > your RAM is not sufficient and/or you are working against a db like
> > > nt that is too big for the PPC/blast/db combination. First check your
> > > RAM (500 MB is not enough); secondly, try to work on a subset of nt
> > > when you can; also try the blast build optimised by the Bioteam...
> > > Cordially
> > >
> > > LN
> > >
> > > 2005/7/21, Ferdinand Marlétaz <ferdinand.marletaz@gmail.com>:
> > > > Hi,
> > > >
> > > > I know my current problem is only loosely related to bioperl, but maybe
> > > > somebody has already encountered it, so it's worth a try...
> > > >
> > > > I'm trying to run blast (tblastx) on a G5 Power Mac (OS: OS 10.4
> > > > Tiger, but the same was happening with 10.3 Panther). It starts
> > > > perfectly normally, but after some time it stops and displays either
> > > > 'bus error' or 'segmentation fault'... I'm quite surprised because
> > > > I've never had this problem on a second, identical G5 in my lab. I've
> > > > tried changing the blast version from 2.10 to 2.11... but that didn't
> > > > solve the problem. I verified that it's not related to my databases
> > > > by reformatting them from fasta...
> > > >
> > > > So I don't see where the problem can come from. Has anybody
> > > > encountered such problems or errors and found a solution, or have an
> > > > idea? I'd like to avoid reinstalling the system on this machine
> > > > because of the time it would cost...
> > [cut]
> > 
> > --
> > Andreas Kähäri
> > 
> > EMBL-EBI/ensembl
> > www.ensembl.org
> > 
> > 1024D/C2E163CB F4C4 A41A 665B 448A 3FA9  6AEA 12E3 39DA C2E1 63CB
> >
> 

From Andrew.Mather at dpi.vic.gov.au  Mon Jul 25 21:24:51 2005
From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather@dpi.vic.gov.au)
Date: Mon Jul 25 21:16:58 2005
Subject: [Bioperl-l] Bioperl-ext, Staden and x86_64
Message-ID: <OFEFB68C2D.3C0082DB-ONCA25704A.0005B9B0-CA25704A.0007C4C3@nre.vic.gov.au>


Hi Aaron,

Yes, it does use the autoconf build system, but unfortunately,
--enable-shared (and --enable-shared=yes) made no observable difference.

This one's got me stumped.  The older version compiles and installs without
problems on the x86 machines, but the Opterons don't seem to want to
cooperate.

Andrew


On 25/07/2005 10:25 PM, amackey@pcbi.upenn.edu wrote:

  To:       Andrew.Mather@dpi.vic.gov.au
  cc:       bioperl-l@portal.open-bio.org
  Subject:  Re: [Bioperl-l] Bioperl-ext, Staden and x86_64


I haven't looked, but if it's using an autoconf-like build system
(i.e. you first type "configure" then "make"), you may need to add
"--enable-shared" to the "configure" invocation.

-Aaron

On Jul 19, 2005, at 12:11 AM, Andrew.Mather@dpi.vic.gov.au wrote:

> I looked around and found Version 1.9.0 on
> Sourceforge and this appears to compile cleanly, however it doesn't
> look like it's left any .so files in /usr/local/lib (or anywhere else
> for that matter).
>


From n.haigh at sheffield.ac.uk  Tue Jul 26 09:02:49 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Tue Jul 26 08:53:03 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAA9xt5Xc6HlkeY96g628hhJwEAAAAA@sheffield.ac.uk>

I want to be able to supply a list of GIs, retrieve the GenBank files and
parse out the PubMed IDs.

I know I can do the first step of retrieving the GenBank files directly,
but how do I get the PubMed IDs? I've been playing around with things and
haven't yet found out whether this can be done.

 
Cheers,

Nathan

 
----------------------------------
Nathan Haigh
Bioinformatics PostDoctoral Research Associate

Room B2 211
Department of Animal and Plant Sciences
University of Sheffield
Western Bank
Sheffield
S10 2TN

Tel: +44 (0)114 22 20112
Mob: +44 (0)7742 533 569
Fax: +44 (0)114 22 20002

 
From jason.stajich at duke.edu  Tue Jul 26 10:28:15 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jul 26 10:19:08 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
Message-ID: <ab2c1075c846d68ea0f4efe4623050f3@duke.edu>


Here is part of the synopsis in Bio::Seq:

     foreach my $ref ( $ann->get_Annotations('reference') ) {
         print "Reference ",$ref->title,"\n";
     }

  so do $ref->pubmed instead of $ref->title.


-jason
--
Jason Stajich
http://www.duke.edu/~jes12
jason.stajich -at- duke.edu

From wackattack at gmail.com  Mon Jul 25 17:28:01 2005
From: wackattack at gmail.com (Wacki)
Date: Tue Jul 26 10:23:25 2005
Subject: [Bioperl-l] Accessing database settings
In-Reply-To: <1122302049.3293.5.camel@localhost.localdomain>
References: <2b8a4eeb05072320531f2349b8@mail.gmail.com>
	<1122302049.3293.5.camel@localhost.localdomain>
Message-ID: <2b8a4eeb050725142819660772@mail.gmail.com>

When I run: bp_load_gff.pl -c -d volvox volvox_all.fa volvox_all.gff


I get:
------------------------------------------------------------------------------------------------------------------------------------------------

DBI connect('volvox','',...) failed: Access denied for user ''@'localhost' 
to database 'volvox' at 
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm line 
139

------------- EXCEPTION -------------
MSG: Can't connect to database: Access denied for user ''@'localhost' to 
database 'volvox'
STACK Bio::DB::GFF::Adaptor::dbi::caching_handle::new 
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm:89
STACK Bio::DB::GFF::Adaptor::dbi::new 
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi.pm:93
STACK Bio::DB::GFF::Adaptor::dbi::mysql::new 
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/mysql.pm:270
STACK Bio::DB::GFF::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF.pm:599
STACK toplevel /usr/bin/bp_load_gff.pl:103

--------------------------------------



On 7/25/05, Scott Cain <cain@cshl.edu> wrote:
> 
> Hello,
> 
> I am guessing you are writing about accessing a GFF database, since you
> are also posting questions to the gbrowse mailing list. Also, from your
> posts there, I'm guessing that you have figured it out. If I am not
> guessing correctly, please re-ask the question and make it a little more
> clear what you mean.
> 
> Scott
> 
> 
> On Sat, 2005-07-23 at 22:53 -0500, Wacki wrote:
> > I installed bioperl but didn't use the right password/username combo.
> > How do I change it? Thanks for the help.
> >
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. cain@cshl.edu
> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
> Cold Spring Harbor Laboratory
> 
>

From n.haigh at sheffield.ac.uk  Tue Jul 26 10:49:22 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Tue Jul 26 10:39:36 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
In-Reply-To: <ab2c1075c846d68ea0f4efe4623050f3@duke.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAEjSiWzrGeEy2NU5r+ssYWQEAAAAA@sheffield.ac.uk>

Yeah, I tried this after finding a previous post from someone wanting to do
the same thing, where you suggested the same approach.

However, it doesn't return anything!

My script is simply:

-- snip --
use Bio::DB::GenBank;
use Data::Dumper;

my $db = Bio::DB::GenBank->new;

while (<STDIN>) {
        chomp;
        my $seq = $db->get_Seq_by_gi($_);
        my $ac = $seq->annotation;
        
        for my $ref ($ac->get_Annotations('reference')) {
                print "Reference :", $ref->title,"\t";
                print "PubMed :", $ref->pubmed,"\n";
        }
}
-- snip --

If I pass 46367591 on STDIN I get the following output:

-- snip --
Reference :Functional divergence in tandemly duplicated Arabidopsis thaliana
trypsin inhibitor genes        PubMed :
Reference :Direct Submission        PubMed :
Reference :Direct Submission        PubMed :
-- snip --

If I do Data::Dumper on $ref I get:

-- snip --
$VAR1 = bless( {
       'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
       'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED   15082560',
       'title' => 'Functional divergence in tandemly duplicated Arabidopsis
thaliana trypsin inhibitor genes',
       'tagname' => 'reference'
     }, 'Bio::Annotation::Reference' ); 
-- snip --

The pubmed id doesn't seem to be getting parsed out! Any ideas?
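
As a stopgap, the ID can be recovered from the location string shown in
the dump above. This is only a minimal sketch, assuming the ID always
appears in the location as "PUBMED" followed by digits; the helper name
pubmed_from_location is made up for illustration and is not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the PubMed ID that
# got appended to a reference's location string, or undef if none is found.
sub pubmed_from_location {
    my $location = shift // '';
    return $location =~ /\bPUBMED\s+(\d+)/ ? $1 : undef;
}

# The location string from the Data::Dumper output above.
my $location = 'Genetics 166 (3), 1419-1436 (2004) PUBMED   15082560';
print pubmed_from_location($location), "\n";
```

Inside the loop this could be used as
$ref->pubmed || pubmed_from_location($ref->location), so the fallback only
kicks in when pubmed comes back empty.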

Nathan



From golharam at umdnj.edu  Tue Jul 26 12:05:51 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue Jul 26 11:53:06 2005
Subject: [Bioperl-l] Parsing EMBOSS::needle output
Message-ID: <00bd01c591fb$e53cc0f0$0301a8c0@GOLHARMOBILE1>

I'm trying to parse the output of EMBOSS::needle (EMBOSS 3.0.0) using

`needle -asequence /tmp/genbank.cds -bsequence ../Seq/$tuple/$organism -gapopen 10 -gapextend 0.5 -outfile /tmp/compare.needle 2>/dev/null`;

my $alnobj = Bio::AlignIO->new(-format => 'emboss',
                               -file   => '/tmp/compare.needle');
my $alignment = $alnobj->next_aln;
print "\tPercentage Identity: ", $alignment->percentage_identity, "\n";

However, $alignment never gets defined: $alnobj never returns an
alignment object.  I saw other posts relating to this but no
solutions...

Any ideas?
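
One thing that may be worth ruling out first is that needle itself failed
silently, since its stderr goes to /dev/null and the exit status is never
checked. This is a rough sketch only, assuming a Unix-like system;
run_checked is a made-up helper name, and the check says nothing about
whether the emboss parser can read the file afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external command in list form (no shell) and
# die with a readable message if it fails or leaves no output file behind.
sub run_checked {
    my ($outfile, @cmd) = @_;
    my $status = system(@cmd);
    die "could not start '$cmd[0]': $!\n" if $status == -1;
    die "'$cmd[0]' died from signal " . ($? & 127) . "\n" if $? & 127;
    die "'$cmd[0]' exited with status " . ($? >> 8) . "\n" if $? >> 8;
    die "'$cmd[0]' wrote nothing to $outfile\n" unless -s $outfile;
    return $outfile;
}

# Intended use with the command from the post ($bseq stands in for the
# -bsequence path):
# run_checked('/tmp/compare.needle',
#     'needle', '-asequence', '/tmp/genbank.cds', '-bsequence', $bseq,
#     '-gapopen', 10, '-gapextend', 0.5, '-outfile', '/tmp/compare.needle');
```

If the command turns out to succeed and the file is non-empty, the problem
is more likely on the parsing side, and testing defined($alignment) before
calling percentage_identity at least avoids a crash.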

Ryan

From cain at cshl.edu  Tue Jul 26 12:18:47 2005
From: cain at cshl.edu (Scott Cain)
Date: Tue Jul 26 12:09:34 2005
Subject: [Bioperl-l] Accessing database settings
In-Reply-To: <2b8a4eeb050725142819660772@mail.gmail.com>
References: <2b8a4eeb05072320531f2349b8@mail.gmail.com>
	<1122302049.3293.5.camel@localhost.localdomain>
	<2b8a4eeb050725142819660772@mail.gmail.com>
Message-ID: <1122394727.3293.41.camel@localhost.localdomain>

Generally, the load script will get the user name from the shell, but in
your case it seems not to be picking it up.  According to `perldoc
bp_load_gff.pl`, you can pass a --user argument to supply a MySQL
username.  This, of course, assumes that you have already granted your
MySQL user permission to operate on the database, as the directions in
the INSTALL doc and the tutorial indicate.

Scott


On Mon, 2005-07-25 at 17:28 -0400, Wacki wrote:
> When I run:  bp_load_gff.pl -c -d volvox volvox_all.fa volvox_all.gff
> 
> 
> I get:
> ------------------------------------------------------------------------------------------------------------------------------------------------
> 
> DBI connect('volvox','',...) failed: Access denied for user
> ''@'localhost' to database 'volvox'
> at /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm line 139
> 
> ------------- EXCEPTION  -------------
> MSG: Can't connect to database: Access denied for user ''@'localhost'
> to database 'volvox'
> STACK
> Bio::DB::GFF::Adaptor::dbi::caching_handle::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/caching_handle.pm:89
> STACK
> Bio::DB::GFF::Adaptor::dbi::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi.pm:93
> STACK
> Bio::DB::GFF::Adaptor::dbi::mysql::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF/Adaptor/dbi/mysql.pm:270
> STACK
> Bio::DB::GFF::new /usr/lib/perl5/site_perl/5.8.5/Bio/DB/GFF.pm:599
> STACK toplevel /usr/bin/bp_load_gff.pl:103
> 
> --------------------------------------
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gnf.org  Tue Jul 26 13:05:14 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jul 26 12:55:47 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAEjSiWzrGeEy2NU5r+ssYWQEAAAAA@sheffield.ac.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAEjSiWzrGeEy2NU5r+ssYWQEAAAAA@sheffield.ac.uk>
Message-ID: <c132abdd7e39f9343e1d00aaf89fbda0@gnf.org>


On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote:

> -- snip --
> $VAR1 = bless( {
>        'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
>        'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED   
> 15082560',
>        'title' => 'Functional divergence in tandemly duplicated 
> Arabidopsis
> thaliana trypsin inhibitor genes',
>        'tagname' => 'reference'
>      }, 'Bio::Annotation::Reference' );
> -- snip --

This is odd. The PUBMED line should not be concatenated with the 
JOURNAL line. I wonder where this happens and why. Can you download the 
record from NCBI (using the web interface, format 'GenBank', 'Send all 
to file') and then parse it with Bio::SeqIO? If it works then the 
problem must be in the code that deals with the HTTP-response.

	-hilmar


-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hartzell at kestrel.alerce.com  Tue Jul 26 14:54:23 2005
From: hartzell at kestrel.alerce.com (George Hartzell)
Date: Tue Jul 26 14:44:42 2005
Subject: [Bioperl-l] is the Bio::Ext::Align stuff supposed to work?
Message-ID: <17126.34527.231530.271197@satchel.alerce.com>


I've been playing with Bio::Tools::dpAlign, which involved installing
Bio::Ext.

Bio::Ext did a really poor job of installing itself (FreeBSD
6-{various}, perl 5.8.[67]).  I managed to mv and cp the various parts
around to where they were supposed to be.

I'm not sure if it's me, FreeBSD, or Bio::Ext.  Does it work for other
folks?  The tests all work fine, they get away with some judicious
-I../this-that-the-other, but if you copy e.g. the Align test file to
your home directory and just try to run it, it doesn't work.

In particular, the .so and .bs files didn't end up where they belong,
and I ended up with /.../Bio/Ext/Align/Align.pm instead of
/.../Bio/Ext/Align.pm.

I'm sure I can figure it out and pass some patches back, just wanted
to understand who else might be seeing the problem.

g. 
From hlapp at gnf.org  Tue Jul 26 16:08:33 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jul 26 15:58:56 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
In-Reply-To: <CFE1DF3BA20F424689DA0881A14055BE863ECF@m.hg.genetics.utah.edu>
References: <CFE1DF3BA20F424689DA0881A14055BE863ECF@m.hg.genetics.utah.edu>
Message-ID: <9af6be32bc65e26bc28dfe18d1d68d8e@gnf.org>

There are indeed JOURNAL entries spanning multiple lines; the parser 
was once unable to deal with this and was subsequently fixed ... as we 
see this introduced other problems ...

On Jul 26, 2005, at 1:07 PM, Barry Moore wrote:

> Nathan-
>
> That sounds like you are using bioperl 1.4?  The error is in
> Bio/SeqIO/genbank.pm  and was fixed by Jason in cvs version 1.102 of
> that file.  However the current code still looks a bit odd to me.
> Starting at line 1068 of the current cvs version (1.119) of genbank.pm
> we have:
>
> 1068  if (/^\s{2}JOURNAL\s+(.*)/o) {
> 1069     push(@loc, $1);
> 1070     while ( defined($_ = $self->_readline) ) {
> 1071           # we only match when there are at least 4 spaces
> 1072           # there is probably a better way to match this
> 1073           # as it assumes that the describing tag is short enough
> 1074           /^\s{4,}(.*)/o && do { push(@loc, $1);
> 1075           next;
> 1076     };
> 1077     last;
> 1078  }
> 1079  $ref->location(join(' ', @loc));
>
> This is all dealing with parsing the JOURNAL line, which is handled fine
> by lines 1068-69.  The while loop at 1070 looks at successive lines to
> find something to add to the JOURNAL line.  The regex at line 1074 used
> to read /^\s{3,}(.*)/o, which would not match if the next line after
> JOURNAL began with '  MEDLINE', but would match '   PUBMED' (Nathan's
> situation), causing that line to be added to the JOURNAL line.  Is there
> ever a JOURNAL entry with more than one line?  If so, shouldn't the
> following lines always be untagged and thus indented 12 spaces, making
> the regex /^\s{12}(.*)/o safer?  The current situation would add any
> line to the JOURNAL line if its tag is shorter than 6 characters, and I
> don't think that's what we want.
>
> Barry
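
The effect of the two patterns can be tried outside the parser. This is
just a sketch of the idea, not the genbank.pm code: journal_location is a
made-up function, and the two sample lines are typed in by hand with the
indentation described above (PUBMED tag indented 3 spaces).

```perl
use strict;
use warnings;

# Two lines laid out as in a GenBank REFERENCE block: the PUBMED tag is
# indented 3 spaces, whereas a wrapped JOURNAL line would be indented 12.
my @record = (
    '  JOURNAL   Genetics 166 (3), 1419-1436 (2004)',
    '   PUBMED   15082560',
);

# Build the location string, appending follow-on lines only while they
# match $continuation (a regex with exactly one capture group).
sub journal_location {
    my ($continuation, @lines) = @_;
    my @loc;
    my $first = shift @lines;
    push @loc, $1 if $first =~ /^\s{2}JOURNAL\s+(.*)/;
    for my $line (@lines) {
        if ($line =~ $continuation) { push @loc, $1 }
        else                        { last }
    }
    return join ' ', @loc;
}

print "loose:  ", journal_location(qr/^\s{3,}(.*)/, @record), "\n";
print "strict: ", journal_location(qr/^\s{12}(.*)/, @record), "\n";
```

With the loose pattern the PUBMED line is glued onto the location, as in
Nathan's dump; with the 12-space pattern only untagged continuation lines
would be appended.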
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Tue Jul 26 16:20:07 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Jul 26 16:10:40 2005
Subject: [Bioperl-l] error installing bioperl-db
In-Reply-To: <42E63FA9.9070001@wam.umd.edu>
References: <42DD78A7.5060507@wam.umd.edu>
	<00afefecc06d2d2bb22f5be09fb4410a@gmx.net>
	<42DE7912.4020300@wam.umd.edu>
	<9e25eb983a2480c4493d3f0422ad0365@gmx.net>
	<42DFD96C.1050003@wam.umd.edu>
	<e96d8ae201ae0f2ed2eee8337d4d512f@gmx.net>
	<42E0E8F0.6080904@wam.umd.edu>
	<e40c5ad1793c6c5b3e4a317347e7850b@gmx.net>
	<42E63FA9.9070001@wam.umd.edu>
Message-ID: <13450598ab5947a3aad589497b44b3e7@gmx.net>

I fixed this. It should propagate to anonymous cvs within the next few
hours; the new version of the module will be 1.3. I tested against a
PostgreSQL 8.0.3 server and all tests pass.

For the curious, the problem was that DBD::Pg binds all parameters as 
type VARCHAR by default, and does use 'real' prepared statements by 
default if the server is 8.x but not if it's 7.3.x. This is why the 
problem only surfaces when using an 8.x server. The server apparently 
doesn't like VARCHAR-type parameters bound to the SUBSTRING arguments, 
so what I did was explicitly specify the type as integer to the 
$sth->bind_param() call.

	-hilmar

On Jul 26, 2005, at 6:50 AM, Andrew Stewart wrote:

> I updated BiosequenceAdaptorDriver.pm to 1.2. Here's the first 
> erroneous bit of the make test. Looks like the same thing?
>
> -Andrew
>
>
> preparing SELECT statement: SELECT SUBSTRING(seq FROM ? FOR ?) FROM 
> biosequence WHERE bioentry_id = ?
> ok 30
> ok 31
> DBD::Pg::st execute failed: ERROR: invalid escape string
> HINT: Escape string must be empty or one character.
> CONTEXT: SQL function "substring" statement 1
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Tue Jul 26 17:42:08 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Jul 26 17:32:35 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
In-Reply-To: <CFE1DF3BA20F424689DA0881A14055BE863ED0@m.hg.genetics.utah.edu>
References: <CFE1DF3BA20F424689DA0881A14055BE863ED0@m.hg.genetics.utah.edu>
Message-ID: <0d27e80a4a44b81e1686149febdfb6f2@gmx.net>

Right - but don't tell only me :-)

On Jul 26, 2005, at 1:29 PM, Barry Moore wrote:

> Then would it be safe to assume that in the case of multi-line JOURNAL
> entries, all lines following the initial tagged JOURNAL line would be
> untagged?  If so, the regex could probably be made a bit safer.
>
> Barry
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gnf.org]
> Sent: Tuesday, July 26, 2005 2:09 PM
> To: Barry Moore
> Cc: bioperl-l; n.haigh@sheffield.ac.uk
> Subject: Re: [Bioperl-l] getting pubmed id from genbank files
>
> There are indeed JOURNAL entries spanning multiple lines; the parser
> was once unable to deal with this and was subsequently fixed ... as we
> see this introduced other problems ...
>
> On Jul 26, 2005, at 1:07 PM, Barry Moore wrote:
>
>> Nathan-
>>
>> That sounds like you are using bioperl 1.4?  The error is in
>> Bio/SeqIO/genbank.pm  and was fixed by Jason in cvs version 1.102 of
>> that file.  However the current code still looks a bit odd to me.
>> Starting at line 1068 of the current cvs version (1.119) of
> genebank.pm
>> we have:
>>
>> 1068  if (/^\s{2}JOURNAL\s+(.*)/o) {
>> 1069     push(@loc, $1);
>> 1070     while ( defined($_ = $self->_readline) ) {
>> 1071           # we only match when there are at least 4 spaces
>> 1072           # there is probably a better way to match this
>> 1073           # as it assumes that the describing tag is short enough
>> 1074           /^\s{4,}(.*)/o && do { push(@loc, $1);
>> 1075           next;
>> 1076     };
>> 1077     last;
>> 1078  }
>> 1079  $ref->location(join(' ', @loc));
>>
>> This is all dealing with parsing the Journal line which is handled
> fine
>> by lines 1068-69.  The while loop at 1070 looks at successive lines to
>> find something to add to the Journal line.  The regex at line 1074
> used
>> to read /^\s{3,}(.*)/o which would not match if the next line after
>> JOURNAL began with '  MEDLINE', but would match '   PUBMED' (Nathan's
>> situation) causing that line to be added to the JOURNAL line.  Is
> there
>> ever a JOURNAL entry with more than one line?  If so, shouldn't the
>> following lines always be untagged and thus indented 12 making the
>> regex
>> /^\s{12}(.*)/o safer.  The current situation would add any line to
>> JOURNAL line if it's tag is shorter than 6 characters, and I don't
>> think
>> that's what we want.
>>
>> Barry
>>
>> -----Original Message-----
>> From: bioperl-l-bounces@portal.open-bio.org
>> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar
> Lapp
>> Sent: Tuesday, July 26, 2005 11:05 AM
>> To: n.haigh@sheffield.ac.uk
>> Cc: 'bioperl-l'
>> Subject: Re: [Bioperl-l] getting pubmed id from genbank files
>>
>>
>> On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote:
>>
>>> -- snip --
>>> $VAR1 = bless( {
>>>        'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
>>>        'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED
>>> 15082560',
>>>        'title' => 'Functional divergence in tandemly duplicated
>>> Arabidopsis
>>> thaliana trypsin inhibitor genes',
>>>        'tagname' => 'reference'
>>>      }, 'Bio::Annotation::Reference' );
>>> -- snip --
>>
>> This is odd. The PUBMED line should not be concatenated with the
>> JOURNAL line. I wonder where this happens and why. Can you download
> the
>> record from NCBI (using the web interface, format 'GenBank', 'Send all
>> to file') and then parse it with Bio::SeqIO? If it works then the
>> problem must be in the code that deals with the HTTP-response.
>>
>> 	-hilmar
>>
>>
>>>
>>> -----Original Message-----
>>> From: Jason Stajich [mailto:jason.stajich@duke.edu]
>>> Sent: 26 July 2005 15:28
>>> To: Bioperl-l@portal.open-bio.org
>>> Cc: Nathan Haigh
>>> Subject: [Bioperl-l] getting pubmed id from genbank files
>>>
>>>
>>>
>>> Here is part of the synopsis in Bio::Seq:
>>>
>>>      foreach my $ref ( $ann->get_Annotations('reference') ) {
>>>          print "Reference ",$ref->title,"\n";
>>>      }
>>>
>>>   so do $ref->pubmed instead of $ref->title.
>>>
>>>
>>> -jason
>>>> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote:
>>>>
>>>>> I want to be able to supply a list of GI's, retrieve the genbank
>>>>> files and
>>>>> parse out the pubmed id's.
>>>>>
>>>>>
>>>>>
>>>>> I know I can do the first steps of retrieving the genbank files
>>>>> directly,
>>>>> but how do I get the pubmed id's? I've been playing around with
>>>>> things and
>>>>> haven't yet found out if this can be done.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Nathan
>>>>>
>>>>>
>>>>>
>>>>> ----------------------------------
>>>>>
>>>>> Nathan Haigh
>>>>>
>>>>> Bioinformatics PostDoctoral Research Associate
>>>>>
>>>>>
>>>>>
>>>>> Room B2 211
>>>>>
>>>>> Department of Animal and Plant Sciences
>>>>>
>>>>> University of Sheffield
>>>>>
>>>>> Western Bank
>>>>>
>>>>> Sheffield
>>>>>
>>>>> S10 2TN
>>>>>
>>>>>
>>>>>
>>>>> Tel: +44 (0)114 22 20112
>>>>>
>>>>> Mob: +44 (0)7742 533 569
>>>>>
>>>>> Fax: +44 (0)114 22 20002
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l@portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>> --
>>>> Jason Stajich
>>>> http://www.duke.edu/~jes12
>>>> jason.stajich -at- duke.edu
>>>>
>>>>
>>> --
>>> Jason Stajich
>>> http://www.duke.edu/~jes12
>>> jason.stajich -at- duke.edu
>>>
>>>
>>>
>>>
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason.stajich at duke.edu  Tue Jul 26 21:13:09 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jul 26 21:03:52 2005
Subject: [Bioperl-l] Parsing EMBOSS::needle output
In-Reply-To: <00bd01c591fb$e53cc0f0$0301a8c0@GOLHARMOBILE1>
References: <00bd01c591fb$e53cc0f0$0301a8c0@GOLHARMOBILE1>
Message-ID: <4816562302e855bdf87abb347c216c8d@duke.edu>

I think the "emboss" format changed in EMBOSS 3.0.0.
Possible solutions:
a) fix the AlignIO::emboss parser to handle both flavors (old and new)
b) have needle output MSF format and use AlignIO::msf.
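
Option (b) might look roughly like the sketch below. It assumes that
needle accepts the general EMBOSS alignment-format qualifier as
'-aformat msf' and that the file names are placeholders; the sub names
needle_msf_command and percent_identity_from_msf are made up for
illustration. Bio::AlignIO is loaded only when a file is actually read.

```perl
use strict;
use warnings;

# Build the needle command as a list so no shell quoting or mailer
# line-wrapping can split an option name. '-aformat msf' is assumed to be
# the EMBOSS qualifier that selects MSF output.
sub needle_msf_command {
    my ($aseq, $bseq, $outfile) = @_;
    return (
        'needle',
        '-asequence', $aseq,
        '-bsequence', $bseq,
        '-gapopen',   10,
        '-gapextend', 0.5,
        '-aformat',   'msf',
        '-outfile',   $outfile,
    );
}

# Read the first alignment from an MSF file and return its percent identity.
sub percent_identity_from_msf {
    my ($file) = @_;
    require Bio::AlignIO;    # loaded lazily; needs BioPerl installed
    my $in  = Bio::AlignIO->new(-format => 'msf', -file => $file);
    my $aln = $in->next_aln
        or die "no alignment could be read from $file\n";
    return $aln->percentage_identity;
}

# Show the command that would be run (placeholder file names).
print join(' ', needle_msf_command('a.fa', 'b.fa', 'compare.msf')), "\n";
```

After running that command with system(), percent_identity_from_msf
('compare.msf') would do the parsing through AlignIO::msf instead of
AlignIO::emboss.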

-jason
On Jul 26, 2005, at 9:05 AM, Ryan Golhar wrote:

> I'm trying to parse the output of EMBOSS::needle (EMBOSS 3.0.0) using
>
> `needle -asequence /tmp/genbank.cds -bsequence ../Seq/$tuple/$organism 
> -
> gapopen 10 -gapextend 0.5 -outfile /tmp/compare.needle 2>/dev/null`;
>
> my $alnobj = new Bio::AlignIO(-format => 'emboss',
>                               -file   => '/tmp/compare.needle');
> my $alignment = $alnobj->next_aln;
> print "\tPercentage Identity: ", $alignment->percentage_identity, "\n";
>
> However $alignment never gets defined.  $alnobj never returns an
> alignment object.   I saw other posts relating to this but not
> solutions...
>
> Any ideas?
>
> Ryan
>
>
--
Jason Stajich
http://www.duke.edu/~jes12
jason.stajich -at- duke.edu

From bmoore at genetics.utah.edu  Tue Jul 26 16:07:16 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jul 26 21:13:27 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
Message-ID: <CFE1DF3BA20F424689DA0881A14055BE863ECF@m.hg.genetics.utah.edu>

Nathan-

That sounds like you are using bioperl 1.4?  The error is in
Bio/SeqIO/genbank.pm  and was fixed by Jason in cvs version 1.102 of
that file.  However the current code still looks a bit odd to me.
Starting at line 1068 of the current cvs version (1.119) of genbank.pm
we have:

1068  if (/^\s{2}JOURNAL\s+(.*)/o) {
1069     push(@loc, $1);
1070     while ( defined($_ = $self->_readline) ) {
1071           # we only match when there are at least 4 spaces
1072           # there is probably a better way to match this
1073           # as it assumes that the describing tag is short enough
1074           /^\s{4,}(.*)/o && do { push(@loc, $1);
1075           next;
1076     };
1077     last;
1078  }
1079  $ref->location(join(' ', @loc));

This is all dealing with parsing the Journal line which is handled fine
by lines 1068-69.  The while loop at 1070 looks at successive lines to
find something to add to the Journal line.  The regex at line 1074 used
to read /^\s{3,}(.*)/o which would not match if the next line after
JOURNAL began with '  MEDLINE', but would match '   PUBMED' (Nathan's
situation), causing that line to be added to the JOURNAL line.  Is there
ever a JOURNAL entry with more than one line?  If so, shouldn't the
following lines always be untagged and thus indented 12 spaces, making the
regex /^\s{12}(.*)/o safer?  The current code would append any line to the
JOURNAL line if its tag is shorter than 6 characters, and I don't think
that's what we want.
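
To make the difference between the three patterns concrete, here is a
small stand-alone sketch (not code from genbank.pm). The sample lines
only imitate the GenBank column layout: the MEDLINE ID and the 8-space
'XX' tag are invented, and regex_for and continuation_text are
illustrative helper names.

```perl
use strict;
use warnings;

# The three candidate patterns for recognising a JOURNAL continuation line.
sub regex_for {
    my ($name) = @_;
    my %regex = (
        old      => qr/^\s{3,}(.*)/,    # before the fix
        current  => qr/^\s{4,}(.*)/,    # line 1074 as quoted above
        proposed => qr/^\s{12}(.*)/,    # untagged lines only
    );
    return $regex{$name};
}

# Return the captured text if the line counts as a continuation, else undef.
sub continuation_text {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : undef;
}

my @samples = (
    '  MEDLINE   99999999',            # 2-space indent, made-up ID
    '   PUBMED   15082560',            # 3-space indent
    '        XX  made-up short tag',   # hypothetical tag, 8-space indent
    '            1419-1436 (2004)',    # untagged, 12-space indent
);

for my $name (qw(old current proposed)) {
    for my $line (@samples) {
        my $text = continuation_text(regex_for($name), $line);
        printf "%-9s %-34s %s\n", $name, "'$line'",
            defined $text ? 'continuation' : 'not a continuation';
    }
}
```

On these samples only the 12-space pattern restricts continuations to
the untagged line; the 4-space pattern still accepts the hypothetical
short tag.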

Barry

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp
Sent: Tuesday, July 26, 2005 11:05 AM
To: n.haigh@sheffield.ac.uk
Cc: 'bioperl-l'
Subject: Re: [Bioperl-l] getting pubmed id from genbank files


On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote:

> -- snip --
> $VAR1 = bless( {
>        'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
>        'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED   
> 15082560',
>        'title' => 'Functional divergence in tandemly duplicated 
> Arabidopsis
> thaliana trypsin inhibitor genes',
>        'tagname' => 'reference'
>      }, 'Bio::Annotation::Reference' );
> -- snip --

This is odd. The PUBMED line should not be concatenated with the 
JOURNAL line. I wonder where this happens and why. Can you download the 
record from NCBI (using the web interface, format 'GenBank', 'Send all 
to file') and then parse it with Bio::SeqIO? If it works then the 
problem must be in the code that deals with the HTTP-response.
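
A minimal sketch of that check, assuming the record has been saved to a
local file whose name is passed on the command line. The sub names
describe_reference and references_in_file are made up for illustration;
the BioPerl calls are Bio::SeqIO with format 'genbank' and the location
and pubmed accessors of Bio::Annotation::Reference.

```perl
use strict;
use warnings;

# Format one reference for display; plain Perl, no BioPerl needed.
sub describe_reference {
    my ($location, $pubmed) = @_;
    return sprintf 'location: %s | pubmed: %s',
        defined $location ? $location : '(none)',
        defined $pubmed   ? $pubmed   : '(none)';
}

# Parse a locally saved GenBank file and describe each reference in it.
sub references_in_file {
    my ($file) = @_;
    require Bio::SeqIO;    # loaded lazily; needs BioPerl installed
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my @described;
    while (my $seq = $in->next_seq) {
        for my $ref ($seq->annotation->get_Annotations('reference')) {
            push @described, describe_reference($ref->location, $ref->pubmed);
        }
    }
    return @described;
}

# Only touch BioPerl when a file name was actually given.
if (@ARGV) {
    print "$_\n" for references_in_file($ARGV[0]);
}
```

If the location printed for the saved file ends in 'PUBMED 15082560',
the concatenation happens in the parser rather than in the retrieval
code.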

	-hilmar


>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 26 July 2005 15:28
> To: Bioperl-l@portal.open-bio.org
> Cc: Nathan Haigh
> Subject: [Bioperl-l] getting pubmed id from genbank files
>
>
>
> Here is part of the synopsis in Bio::Seq:
>
>      foreach my $ref ( $ann->get_Annotations('reference') ) {
>          print "Reference ",$ref->title,"\n";
>      }
>
>   so do $ref->pubmed instead of $ref->title.
>
>
> -jason
>> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote:
>>
>>> I want to be able to supply a list of GI's, retrieve the genbank
>>> files and
>>> parse out the pubmed id's.
>>>
>>>
>>>
>>> I know I can do the first steps of retrieving the genbank files
>>> directly,
>>> but how do I get the pubmed id's? I've been playing around with
>>> things and
>>> haven't yet found out if this can be done.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Nathan
>>>
>>>
>>>
>>> ----------------------------------
>>>
>>> Nathan Haigh
>>>
>>> Bioinformatics PostDoctoral Research Associate
>>>
>>>
>>>
>>> Room B2 211
>>>
>>> Department of Animal and Plant Sciences
>>>
>>> University of Sheffield
>>>
>>> Western Bank
>>>
>>> Sheffield
>>>
>>> S10 2TN
>>>
>>>
>>>
>>> Tel: +44 (0)114 22 20112
>>>
>>> Mob: +44 (0)7742 533 569
>>>
>>> Fax: +44 (0)114 22 20002
>>>
>>>
>>>
>>>
>> --
>> Jason Stajich
>> http://www.duke.edu/~jes12
>> jason.stajich -at- duke.edu
>>
>>
> --
> Jason Stajich
> http://www.duke.edu/~jes12
> jason.stajich -at- duke.edu
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



From bmoore at genetics.utah.edu  Tue Jul 26 16:31:03 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jul 26 21:13:30 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
Message-ID: <CFE1DF3BA20F424689DA0881A14055BE863ED1@m.hg.genetics.utah.edu>

Then would it be safe to assume that in the case of multi-line JOURNAL
entries, all lines following the initial tagged JOURNAL line would be
untagged?  If so, the regex could probably be made a bit safer.
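
As a rough illustration of what "a bit safer" could mean, the sketch
below collects a JOURNAL value from a list of lines and accepts only
lines that start with exactly the 12-space untagged indent as
continuations. It is a stand-alone approximation, not a patch to
genbank.pm: journal_location is a made-up name, and the second sample
record (the "Submitted" entry) is invented.

```perl
use strict;
use warnings;

# Collect the JOURNAL text from the lines of one REFERENCE block.
# Only untagged lines (12 leading whitespace characters) are treated as
# continuations, so a following PUBMED or MEDLINE line ends the entry.
sub journal_location {
    my @lines = @_;
    my @loc;
    while (defined(my $line = shift @lines)) {
        next unless $line =~ /^\s{2}JOURNAL\s+(.*)/;
        push @loc, $1;
        while (@lines && $lines[0] =~ /^\s{12}(.*)/) {
            push @loc, $1;
            shift @lines;
        }
        last;
    }
    return join ' ', @loc;
}

my @single = (
    '  JOURNAL   Genetics 166 (3), 1419-1436 (2004)',
    '   PUBMED   15082560',
);
my @multi = (
    '  JOURNAL   Submitted (01-JAN-2000) Some Department,',   # invented
    '            Some University',
    '   PUBMED   1',
);

print journal_location(@single), "\n";
print journal_location(@multi),  "\n";
```

With this rule the PUBMED line is never appended, while a multi-line
JOURNAL entry is still joined with single spaces.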

Barry

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp@gnf.org] 
Sent: Tuesday, July 26, 2005 2:09 PM
To: Barry Moore
Cc: bioperl-l; n.haigh@sheffield.ac.uk
Subject: Re: [Bioperl-l] getting pubmed id from genbank files

There are indeed JOURNAL entries spanning multiple lines; the parser 
was once unable to deal with this and was subsequently fixed ... as we 
see this introduced other problems ...

On Jul 26, 2005, at 1:07 PM, Barry Moore wrote:

> Nathan-
>
> That sounds like you are using bioperl 1.4?  The error is in
> Bio/SeqIO/genbank.pm  and was fixed by Jason in cvs version 1.102 of
> that file.  However the current code still looks a bit odd to me.
> Starting at line 1068 of the current cvs version (1.119) of
genebank.pm
> we have:
>
> 1068  if (/^\s{2}JOURNAL\s+(.*)/o) {
> 1069     push(@loc, $1);
> 1070     while ( defined($_ = $self->_readline) ) {
> 1071           # we only match when there are at least 4 spaces
> 1072           # there is probably a better way to match this
> 1073           # as it assumes that the describing tag is short enough
> 1074           /^\s{4,}(.*)/o && do { push(@loc, $1);
> 1075           next;
> 1076     };
> 1077     last;
> 1078  }
> 1079  $ref->location(join(' ', @loc));
>
> This is all dealing with parsing the Journal line which is handled
fine
> by lines 1068-69.  The while loop at 1070 looks at successive lines to
> find something to add to the Journal line.  The regex at line 1074
used
> to read /^\s{3,}(.*)/o which would not match if the next line after
> JOURNAL began with '  MEDLINE', but would match '   PUBMED' (Nathan's
> situation) causing that line to be added to the JOURNAL line.  Is
there
> ever a JOURNAL entry with more than one line?  If so, shouldn't the
> following lines always be untagged and thus indented 12 making the 
> regex
> /^\s{12}(.*)/o safer.  The current situation would add any line to
> JOURNAL line if it's tag is shorter than 6 characters, and I don't 
> think
> that's what we want.
>
> Barry
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar
Lapp
> Sent: Tuesday, July 26, 2005 11:05 AM
> To: n.haigh@sheffield.ac.uk
> Cc: 'bioperl-l'
> Subject: Re: [Bioperl-l] getting pubmed id from genbank files
>
>
> On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote:
>
>> -- snip --
>> $VAR1 = bless( {
>>        'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
>>        'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED
>> 15082560',
>>        'title' => 'Functional divergence in tandemly duplicated
>> Arabidopsis
>> thaliana trypsin inhibitor genes',
>>        'tagname' => 'reference'
>>      }, 'Bio::Annotation::Reference' );
>> -- snip --
>
> This is odd. The PUBMED line should not be concatenated with the
> JOURNAL line. I wonder where this happens and why. Can you download
the
> record from NCBI (using the web interface, format 'GenBank', 'Send all
> to file') and then parse it with Bio::SeqIO? If it works then the
> problem must be in the code that deals with the HTTP-response.
>
> 	-hilmar
>
>
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich@duke.edu]
>> Sent: 26 July 2005 15:28
>> To: Bioperl-l@portal.open-bio.org
>> Cc: Nathan Haigh
>> Subject: [Bioperl-l] getting pubmed id from genbank files
>>
>>
>>
>> Here is part of the synopsis in Bio::Seq:
>>
>>      foreach my $ref ( $ann->get_Annotations('reference') ) {
>>          print "Reference ",$ref->title,"\n";
>>      }
>>
>>   so do $ref->pubmed instead of $ref->title.
>>
>>
>> -jason
>>> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote:
>>>
>>>> I want to be able to supply a list of GI's, retrieve the genbank
>>>> files and
>>>> parse out the pubmed id's.
>>>>
>>>>
>>>>
>>>> I know I can do the first steps of retrieving the genbank files
>>>> directly,
>>>> but how do I get the pubmed id's? I've been playing around with
>>>> things and
>>>> haven't yet found out if this can be done.
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Nathan
>>>>
>>>>
>>>>
>>>> ----------------------------------
>>>>
>>>> Nathan Haigh
>>>>
>>>> Bioinformatics PostDoctoral Research Associate
>>>>
>>>>
>>>>
>>>> Room B2 211
>>>>
>>>> Department of Animal and Plant Sciences
>>>>
>>>> University of Sheffield
>>>>
>>>> Western Bank
>>>>
>>>> Sheffield
>>>>
>>>> S10 2TN
>>>>
>>>>
>>>>
>>>> Tel: +44 (0)114 22 20112
>>>>
>>>> Mob: +44 (0)7742 533 569
>>>>
>>>> Fax: +44 (0)114 22 20002
>>>>
>>>>
>>>>
>>>>
>>> --
>>> Jason Stajich
>>> http://www.duke.edu/~jes12
>>> jason.stajich -at- duke.edu
>>>
>>>
>> --
>> Jason Stajich
>> http://www.duke.edu/~jes12
>> jason.stajich -at- duke.edu
>>
>>
>>
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From bmoore at genetics.utah.edu  Tue Jul 26 16:56:11 2005
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jul 26 21:13:31 2005
Subject: [Bioperl-l] Parsing EMBOSS::needle output
Message-ID: <CFE1DF3BA20F424689DA0881A14055BE863ED2@m.hg.genetics.utah.edu>

Ryan-

This works for me with my own sequence files.  I don't know if your
mailer line-wrapped your script, but when I copied it I had to fix the
'-gapopen' parameter in your needle command line: you can't have any
whitespace between the '-' and 'gapopen'.  Did you check that
/tmp/compare.needle was actually written?  If you still have trouble,
send along the files that you're comparing, and I'll see if they run
for me.
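
Both checks can be sketched in a few lines. This assumes needle is on
the PATH; needle_args and run_needle are made-up helper names and
'other.fasta' is a placeholder. Passing the command to system() as a
list avoids the shell, so an option name cannot be split by wrapping,
and errors are no longer hidden by '2>/dev/null'.

```perl
use strict;
use warnings;

# Build the needle argument list; each option name is a single element.
sub needle_args {
    my (%opt) = @_;
    return (
        'needle',
        '-asequence', $opt{asequence},
        '-bsequence', $opt{bsequence},
        '-gapopen',   $opt{gapopen},
        '-gapextend', $opt{gapextend},
        '-outfile',   $opt{outfile},
    );
}

# Run needle and make sure a non-empty output file exists before parsing.
sub run_needle {
    my (%opt) = @_;
    my @cmd = needle_args(%opt);
    system(@cmd) == 0
        or die "needle failed (status $?): @cmd\n";
    -s $opt{outfile}
        or die "needle wrote no output to $opt{outfile}\n";
    return $opt{outfile};
}

# Print the command that would be run; call run_needle() with the same
# arguments to execute it.
my @cmd = needle_args(
    asequence => '/tmp/genbank.cds',
    bsequence => 'other.fasta',
    gapopen   => 10,
    gapextend => 0.5,
    outfile   => '/tmp/compare.needle',
);
print join(' ', @cmd), "\n";
```

Only once run_needle() has returned is it worth handing the file to
Bio::AlignIO and calling next_aln.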

Barry

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Ryan Golhar
Sent: Tuesday, July 26, 2005 10:06 AM
To: 'Bioperl List'
Subject: [Bioperl-l] Parsing EMBOSS::needle output

I'm trying to parse the output of EMBOSS::needle (EMBOSS 3.0.0) using

`needle -asequence /tmp/genbank.cds -bsequence ../Seq/$tuple/$organism -
gapopen 10 -gapextend 0.5 -outfile /tmp/compare.needle 2>/dev/null`;

my $alnobj = new Bio::AlignIO(-format => 'emboss',
                              -file   => '/tmp/compare.needle');
my $alignment = $alnobj->next_aln;
print "\tPercentage Identity: ", $alignment->percentage_identity, "\n";

However $alignment never gets defined.  $alnobj never returns an
alignment object.   I saw other posts relating to this but not
solutions...

Any ideas?

Ryan


From N.Haigh at sheffield.ac.uk  Wed Jul 27 04:09:59 2005
From: N.Haigh at sheffield.ac.uk (Nathan Haigh)
Date: Wed Jul 27 04:00:56 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
In-Reply-To: <CFE1DF3BA20F424689DA0881A14055BE863ECF@m.hg.genetics.utah.edu>
References: <CFE1DF3BA20F424689DA0881A14055BE863ECF@m.hg.genetics.utah.edu>
Message-ID: <1122451799.42e7415740f57@webmail.shef.ac.uk>

Yeah, I'm pretty sure I was using bioperl-live updated that morning. Your
explanation of the problem seems plausible from what I was looking at in the
perl debugger. I'll look into this a bit more later this morning.

Nathan

Quoting Barry Moore <bmoore@genetics.utah.edu>:

> Nathan-
> 
> That sounds like you are using bioperl 1.4?  The error is in
> Bio/SeqIO/genbank.pm  and was fixed by Jason in cvs version 1.102 of
> that file.  However the current code still looks a bit odd to me.
> Starting at line 1068 of the current cvs version (1.119) of genebank.pm
> we have:
> 
> 1068  if (/^\s{2}JOURNAL\s+(.*)/o) {
> 1069     push(@loc, $1);
> 1070     while ( defined($_ = $self->_readline) ) {
> 1071           # we only match when there are at least 4 spaces
> 1072           # there is probably a better way to match this
> 1073           # as it assumes that the describing tag is short enough
> 1074           /^\s{4,}(.*)/o && do { push(@loc, $1);
> 1075           next;
> 1076     };
> 1077     last;
> 1078  }
> 1079  $ref->location(join(' ', @loc));
> 
> This is all dealing with parsing the Journal line which is handled fine
> by lines 1068-69.  The while loop at 1070 looks at successive lines to
> find something to add to the Journal line.  The regex at line 1074 used
> to read /^\s{3,}(.*)/o which would not match if the next line after
> JOURNAL began with '  MEDLINE', but would match '   PUBMED' (Nathan's
> situation) causing that line to be added to the JOURNAL line.  Is there
> ever a JOURNAL entry with more than one line?  If so, shouldn't the
> following lines always be untagged and thus indented 12 making the regex
> /^\s{12}(.*)/o safer.  The current situation would add any line to
> JOURNAL line if it's tag is shorter than 6 characters, and I don't think
> that's what we want.
> 
> Barry
> 
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Tuesday, July 26, 2005 11:05 AM
> To: n.haigh@sheffield.ac.uk
> Cc: 'bioperl-l'
> Subject: Re: [Bioperl-l] getting pubmed id from genbank files
> 
> 
> On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote:
> 
> > -- snip --
> > $VAR1 = bless( {
> >        'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
> >        'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED   
> > 15082560',
> >        'title' => 'Functional divergence in tandemly duplicated 
> > Arabidopsis
> > thaliana trypsin inhibitor genes',
> >        'tagname' => 'reference'
> >      }, 'Bio::Annotation::Reference' );
> > -- snip --
> 
> This is odd. The PUBMED line should not be concatenated with the 
> JOURNAL line. I wonder where this happens and why. Can you download the 
> record from NCBI (using the web interface, format 'GenBank', 'Send all 
> to file') and then parse it with Bio::SeqIO? If it works then the 
> problem must be in the code that deals with the HTTP-response.
> 
> 	-hilmar
> 
> 
> >
> > -----Original Message-----
> > From: Jason Stajich [mailto:jason.stajich@duke.edu]
> > Sent: 26 July 2005 15:28
> > To: Bioperl-l@portal.open-bio.org
> > Cc: Nathan Haigh
> > Subject: [Bioperl-l] getting pubmed id from genbank files
> >
> >
> >
> > Here is part of the synopsis in Bio::Seq:
> >
> >      foreach my $ref ( $ann->get_Annotations('reference') ) {
> >          print "Reference ",$ref->title,"\n";
> >      }
> >
> >   so do $ref->pubmed instead of $ref->title.
> >
> >
> > -jason
> >> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote:
> >>
> >>> I want to be able to supply a list of GI's, retrieve the genbank
> >>> files and
> >>> parse out the pubmed id's.
> >>>
> >>>
> >>>
> >>> I know I can do the first steps of retrieving the genbank files
> >>> directly,
> >>> but how do I get the pubmed id's? I've been playing around with
> >>> things and
> >>> haven't yet found out if this can be done.
> >>>
> >>>
> >>>
> >>> Cheers,
> >>>
> >>> Nathan
> >>>
> >>>
> >>>
> >>> ----------------------------------
> >>>
> >>> Nathan Haigh
> >>>
> >>> Bioinformatics PostDoctoral Research Associate
> >>>
> >>>
> >>>
> >>> Room B2 211
> >>>
> >>> Department of Animal and Plant Sciences
> >>>
> >>> University of Sheffield
> >>>
> >>> Western Bank
> >>>
> >>> Sheffield
> >>>
> >>> S10 2TN
> >>>
> >>>
> >>>
> >>> Tel: +44 (0)114 22 20112
> >>>
> >>> Mob: +44 (0)7742 533 569
> >>>
> >>> Fax: +44 (0)114 22 20002
> >>>
> >>>
> >>>
> >>>
> >> --
> >> Jason Stajich
> >> http://www.duke.edu/~jes12
> >> jason.stajich -at- duke.edu
> >>
> >>
> > --
> > Jason Stajich
> > http://www.duke.edu/~jes12
> > jason.stajich -at- duke.edu
> >
> >
> >
> >
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 


From Andrew.Mather at dpi.vic.gov.au  Wed Jul 27 06:52:46 2005
From: Andrew.Mather at dpi.vic.gov.au (Andrew.Mather@dpi.vic.gov.au)
Date: Wed Jul 27 06:44:09 2005
Subject: [Bioperl-l] is the Bio::Ext::Align stuff supposed to work?
Message-ID: <OF48068F4A.E48938BA-ONCA25704B.003A8016-CA25704B.003BC32E@nre.vic.gov.au>

Hi George
>
> I've been playing with Bio::Tools::dpAlign, which involved installing
> Bio::Ext.
>
> Bio::Ext did a really poor job of installing itself (FreeBSD
> 6-{various}, perl 5.8.[67]).  I managed to mv and cp the various parts
> around to where they were supposed to be.
>
> I'm not sure if it's me, FreeBSD, or Bio::Ext.  Does it work for other
> folks?  The tests all work fine, they get away with some judicious
> -I../this-that-the-other, but if you copy e.g. the Align test file to
> your home directory and just try to run it, it doesn't work.
>
> In particular, the .so and .bs files didn't end up where they belong,
> and I ended up with /.../Bio/Ext/Align/Align.pm instead
> /.../Bio/Ext/Align.pm.
>
> I'm sure I can figure it out and pass some patches back, just wanted
> to understand who else might be seeing the problem.
>

I've been having a few battles with staden io_lib myself, which have 
caused problems with Bio::Ext.

I have a system with a mix of RHEL3 on IA32 and AMD64 machines.  Staden
io_lib compiled fine on the Intel machines, and once I'd copied the usual
.h files to where they were expected, Bio::Ext set up fine.

On the AMD64 machines, though, no such luck.  I had to find the 1.9 (or is
it 1.1.9? I'm not near the machines now) version before it would even
compile, and even then it doesn't create any .so files at all.

This isn't strictly a bioperl problem I suppose, but it is related.

There were a couple of suggestions raised here, but so far no good.

I've had to move on to other things for the moment, but I'll keep trying
to find a solution when I can get back to it.

Andrew


Animal Genetics and Genomics, PIRVic Attwood
475 Mickleham Road, Attwood, 3049
ph +61 3 92174342
mob  0413 009 761


----------------
There are 10 kinds of people...those who understand binary and those who 
don't.
From senger at ebi.ac.uk  Wed Jul 27 10:57:33 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Wed Jul 27 10:48:04 2005
Subject: [Bioperl-l] Bio::Tools::Run::Analysis - small but important changes
In-Reply-To: <Pine.LNX.4.44.0507231700170.21622-100000@bagheera.ebi.ac.uk>
Message-ID: <Pine.LNX.4.44.0507271551280.1788-100000@bagheera.ebi.ac.uk>

Hi,
   This message is similar to the recent one about Bio::Biblio: the
default location of the SOAP-based services running at EBI has
changed. Nothing has changed in the API of these services. If you are
using these services (details at
http://www.ebi.ac.uk/soaplab/Perl_Client.html, though the pages are not yet
updated), just update your bioperl modules, or override the default
location in your scripts with the new one:
   -location => 'http://www.ebi.ac.uk/soaplab/service'

   Regards,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From adil_iqbal75 at yahoo.com  Wed Jul 27 14:21:38 2005
From: adil_iqbal75 at yahoo.com (adil iqbal)
Date: Wed Jul 27 14:16:45 2005
Subject: [Bioperl-l] Why do you keep sending messages again and again? Is
	there something special? If so, please write in Urdu; I don't like English, ok
Message-ID: <20050727182138.59616.qmail@web32401.mail.mud.yahoo.com>

  
From n.haigh at sheffield.ac.uk  Thu Jul 28 07:36:56 2005
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Thu Jul 28 07:28:44 2005
Subject: [Bioperl-l] getting pubmed id from genbank files
In-Reply-To: <1122451799.42e7415740f57@webmail.shef.ac.uk>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAIBuBbth5gEeZS9eEGziB38KAAAAQAAAAVY8IU/RoHUmYkUbf+jYEtQEAAAAA@sheffield.ac.uk>

Big Oops!

I wasn't using bioperl-live! Things now seem to be OK, well, at least with
that one genbank file!

Thanks for the input anyway! :o)
Nathan


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Nathan Haigh
Sent: 27 July 2005 09:10
To: Barry Moore
Cc: Hilmar Lapp; bioperl-l
Subject: RE: [Bioperl-l] getting pubmed id from genbank files

Yeah, I'm pretty sure I was using bioperl-live updated that morning. Your
explanation of the problem seems plausible from what I was looking at in the
perl debugger. I'll look into this a bit more later this morning.

Nathan

Quoting Barry Moore <bmoore@genetics.utah.edu>:

> Nathan-
> 
> That sounds like you are using bioperl 1.4?  The error is in
> Bio/SeqIO/genbank.pm  and was fixed by Jason in cvs version 1.102 of
> that file.  However the current code still looks a bit odd to me.
> Starting at line 1068 of the current cvs version (1.119) of genebank.pm
> we have:
> 
> 1068  if (/^\s{2}JOURNAL\s+(.*)/o) {
> 1069     push(@loc, $1);
> 1070     while ( defined($_ = $self->_readline) ) {
> 1071           # we only match when there are at least 4 spaces
> 1072           # there is probably a better way to match this
> 1073           # as it assumes that the describing tag is short enough
> 1074           /^\s{4,}(.*)/o && do { push(@loc, $1);
> 1075           next;
> 1076     };
> 1077     last;
> 1078  }
> 1079  $ref->location(join(' ', @loc));
> 
> This is all dealing with parsing the Journal line which is handled fine
> by lines 1068-69.  The while loop at 1070 looks at successive lines to
> find something to add to the Journal line.  The regex at line 1074 used
> to read /^\s{3,}(.*)/o which would not match if the next line after
> JOURNAL began with '  MEDLINE', but would match '   PUBMED' (Nathan's
> situation) causing that line to be added to the JOURNAL line.  Is there
> ever a JOURNAL entry with more than one line?  If so, shouldn't the
> following lines always be untagged and thus indented 12, making the regex
> /^\s{12}(.*)/o safer?  The current situation would add any line to the
> JOURNAL line if its tag is shorter than 6 characters, and I don't think
> that's what we want.
> 
> Barry
> 
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Tuesday, July 26, 2005 11:05 AM
> To: n.haigh@sheffield.ac.uk
> Cc: 'bioperl-l'
> Subject: Re: [Bioperl-l] getting pubmed id from genbank files
> 
> 
> On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote:
> 
> > -- snip --
> > $VAR1 = bless( {
> >        'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
> >        'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED   
> > 15082560',
> >        'title' => 'Functional divergence in tandemly duplicated 
> > Arabidopsis
> > thaliana trypsin inhibitor genes',
> >        'tagname' => 'reference'
> >      }, 'Bio::Annotation::Reference' );
> > -- snip --
> 
> This is odd. The PUBMED line should not be concatenated with the 
> JOURNAL line. I wonder where this happens and why. Can you download the 
> record from NCBI (using the web interface, format 'GenBank', 'Send all 
> to file') and then parse it with Bio::SeqIO? If it works then the 
> problem must be in the code that deals with the HTTP-response.
> 
> 	-hilmar
> 
> 
> >
> > -----Original Message-----
> > From: Jason Stajich [mailto:jason.stajich@duke.edu]
> > Sent: 26 July 2005 15:28
> > To: Bioperl-l@portal.open-bio.org
> > Cc: Nathan Haigh
> > Subject: [Bioperl-l] getting pubmed id from genbank files
> >
> >
> >
> > Here is part of the synopsis in Bio::Seq:
> >
> >      foreach my $ref ( $ann->get_Annotations('reference') ) {
> >          print "Reference ",$ref->title,"\n";
> >      }
> >
> >   so do $ref->pubmed instead of $ref->title.
> >
> >
> > -jason
> >> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote:
> >>
> >>> I want to be able to supply a list of GI's, retrieve the genbank
> >>> files and
> >>> parse out the pubmed id's.
> >>>
> >>>
> >>>
> >>> I know I can do the first steps of retrieving the genbank files
> >>> directly,
> >>> but how do I get the pubmed id's? I've been playing around with
> >>> things and
> >>> haven't yet found out if this can be done.
> >>>
> >>>
> >>>
> >>> Cheers,
> >>>
> >>> Nathan
> >>>
> >>>
> >>>
> >>> ----------------------------------
> >>>
> >>> Nathan Haigh
> >>>
> >>> Bioinformatics PostDoctoral Research Associate
> >>>
> >>>
> >>>
> >>> Room B2 211
> >>>
> >>> Department of Animal and Plant Sciences
> >>>
> >>> University of Sheffield
> >>>
> >>> Western Bank
> >>>
> >>> Sheffield
> >>>
> >>> S10 2TN
> >>>
> >>>
> >>>
> >>> Tel: +44 (0)114 22 20112
> >>>
> >>> Mob: +44 (0)7742 533 569
> >>>
> >>> Fax: +44 (0)114 22 20002
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l@portal.open-bio.org
> >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >> --
> >> Jason Stajich
> >> http://www.duke.edu/~jes12
> >> jason.stajich -at- duke.edu
> >>
> >>
> > --
> > Jason Stajich
> > http://www.duke.edu/~jes12
> > jason.stajich -at- duke.edu
> >
> >
> >
> >
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 




From cjm at fruitfly.org  Thu Jul 28 15:42:48 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Thu Jul 28 15:33:35 2005
Subject: [Bioperl-l] Fixing bioperl [was Re: [GMOD-devel] Re: [Gmod-gbrowse]
 Analysis features (Re: Final alpha release of gmod (chado))]
In-Reply-To: <1122570166.3288.10.camel@localhost.localdomain>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu> 
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
Message-ID: <Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>


[sorry for the cross-posting, but I think it's really important to have a
gmod to bioperl chit chat on this. I've removed gmod-gbrowse from the cc
list]

On Thu, 28 Jul 2005, Scott Cain wrote:

> Hi Cyril,
>
> I think Bio::Tools::GFF is somewhat hacky and not a tool I would use to
> produce 'safe' GFF3.  On the other hand Bio::FeatureIO is still a little
> immature, but it is what I used for the chado GFF3 bulk loader, so it
> does handle (parse) Target features.  So my suggestion would be to use
> BFIO::gff, but be prepared for some problems; when you find them
> complain loudly on the bioperl mailing list or fix the problems and
> commit them (or both!).

I think the answer may be even more complicated than this.

Lurkers and contributors to the bioperl mailing list may have noticed that
there have been some major obstacles in progressing lately, particularly in
getting a stable release of the code out. bp1.4 is fairly old, 1.5 is a
developers release, though this is the one required by GMOD.

My understanding is that this bottleneck can be traced back to changes in
the SeqFeature and Annotation model. These changes appear to be required
by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff
(which in turn is used by the GMOD bulk loader, which is the main reason
GMOD requires 1.5, I believe?). Unfortunately, these changes also break
existing code and have a severe negative impact on memory usage.

Before advising Cyril and others to switch to BFIO::gff I think it's
important to make sure there is a clear path forward with bioperl. My
impression is that there is something of a stalemate here. The bioperl
developers would like to retract the aforementioned changes, but they
believe they cannot do this without breaking GMOD code.  They are also
extremely uncomfortable about leaving these changes in. Everyone gives up
and starts coding around bioperl.

Here is why the changes were introduced:

BioPerl has a 'scruffy' typing model, whereby feature types (primary_tag
in bioperl) and featureprop types (tags in bioperl) are labels or strings.
In contrast, Chado forces all types to be some class or relation in an
ontology.

Now obviously I'm rather partial to the Chado model, but that doesn't mean
I think it should be forced upon bioperl. I often use bioperl in scruffy
mode (on scruffy data); or in some combination whereby I map the scruffy
types to ontologies in some non-bioperl code. When using bioperl as a
middleware component over a nicely organised database, ontology-typed mode
is definitely best. However, the majority of bioperl users (including
myself) spend a large proportion of their time working with scruffy data,
in which case lightweight scruffy types are more appropriate.

It seems that there is a perfectly simple way of reconciling both
approaches. We revert bioperl back to the simpler scruffy model. The
majority of users and developers breathe a sigh of relief. We then extend
SeqFeatureI with something like SeqFeatureAnnotatedI. This forces types to
be stored as OntologyTerms (and I haven't even touched on some of the
problems here, but at least we are insulating the standard bioperl layer
that 99% of users use from these issues). All classes implementing SFAI
will necessarily implement SFI, and the primary_tag and tag_values methods
will be supported (not deprecated) as simple delegations to the
OntologyTerm objects.
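
As a rough sketch of that delegation (the Sketch::* package names and the
EXAMPLE:0001 identifier below are made up for illustration and are not
bioperl classes or real ontology accessions), an ontology-typed feature can
keep answering primary_tag by forwarding to its term:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ontology term; NOT the real Bio::Ontology::Term.
package Sketch::OntologyTerm;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, identifier => $args{identifier} }, $class;
}
sub name       { $_[0]{name} }
sub identifier { $_[0]{identifier} }

# "Scruffy" feature: the type is just a string (the SFI-style contract).
package Sketch::Feature;
sub new {
    my ($class, %args) = @_;
    return bless { primary_tag => $args{primary_tag} }, $class;
}
sub primary_tag { $_[0]{primary_tag} }

# Ontology-typed feature (the SFAI-style contract): it stores a term object,
# but still answers primary_tag by delegating to the term, so callers that
# only know the scruffy interface keep working.
package Sketch::AnnotatedFeature;
our @ISA = ('Sketch::Feature');
sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type} }, $class;
}
sub type        { $_[0]{type} }
sub primary_tag { $_[0]->type->name }

package main;

# Code written against the scruffy interface only.
sub describe {
    my ($feature) = @_;
    return 'feature of type ' . $feature->primary_tag;
}

my $plain = Sketch::Feature->new(primary_tag => 'exon');
my $typed = Sketch::AnnotatedFeature->new(
    # placeholder identifier, not a real accession
    type => Sketch::OntologyTerm->new(name => 'exon', identifier => 'EXAMPLE:0001'),
);

print describe($plain), "\n";
print describe($typed), "\n";
```

Both calls to describe() go through the same primary_tag method, which is
the point: existing scruffy code need not know which kind of feature it got.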

We can then modify BFIO::gff (which is an incredibly useful piece of code)
and get rid of all the dependencies on SO and Bio::Ontology* and instead
allow the user of this module to plug in their own resolver/validator - so
they can choose whether they just want fast scruffy lightweight SFI
features, or whether they want ontology-typed SFAI features. If the
latter, then they can choose their own resolver strategy - by a user
supplied hash, by a copy of SO auto-downloaded from sourceforge, by a
local chado db, by the genbank->SO mapping table, during parsing vs
post-parsing, whatever. In fact there is already
Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly concerned
with helping Bio::SeqFeature::Tools::Unflattener convert scruffy genbank
to something sensible.
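
To make the plug-in idea concrete, here is a minimal sketch in plain Perl.
It is not the BFIO::gff API: parse_gff_line and hash_resolver are
hypothetical names, the feature is a bare hash rather than a bioperl object,
and EXAMPLE:0002 is a placeholder identifier. It only shows the "user
supplied hash" strategy, with the resolver passed in as a code ref:

```perl
use strict;
use warnings;

# Build a resolver from a user-supplied hash of type label => term id.
# Returns a code ref; unknown labels resolve to undef.
sub hash_resolver {
    my (%map) = @_;
    return sub { my ($label) = @_; return $map{$label} };
}

# Parse one tab-delimited GFF line (only the first five columns are used).
# With no resolver the type stays a plain string ("scruffy" mode).
# With a resolver, the type must map to a term or the line is rejected.
sub parse_gff_line {
    my ($line, $resolver) = @_;
    chomp $line;
    my ($seqid, $source, $type, $start, $end) = split /\t/, $line;
    my %feature = (
        seq_id      => $seqid,
        source      => $source,
        primary_tag => $type,
        start       => $start,
        end         => $end,
    );
    if ($resolver) {
        my $term = $resolver->($type);
        die "cannot resolve type '$type'\n" unless defined $term;
        $feature{type_term} = $term;
    }
    return \%feature;
}

my $line = join("\t", 'chr1', 'sim4:DGC', 'match', 1, 100, '.', '+', '.', 'ID=m1');

# Scruffy mode: no resolver, no ontology dependency.
my $scruffy = parse_gff_line($line);

# Ontology-typed mode: the caller decides how labels are resolved.
my $typed = parse_gff_line($line, hash_resolver(match => 'EXAMPLE:0002'));

print "scruffy type: $scruffy->{primary_tag}\n";
print "typed term:   $typed->{type_term}\n";
```

Other strategies (a local chado db, a downloaded copy of SO) would just be
different code refs with the same one-argument calling convention, so the
parser itself would not need to load any ontology code.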

GMOD (and perhaps biosql) would use SFAI, everyone else would use the
simpler SFI. Someone can even get a stable 1.6 release out before all the
SFAI details such as how the resolver would work are finalised. I'd really
like to see 1.6 include a simpler BFIO::gff that can optionally produce
features that aren't SeqFeature::Annotateds, but that's negotiable.

There's vast swathes of both GMOD and BioPerl code I'm not familiar with,
so it's possible my analysis above is flawed in some way. If it is, then
it's up to someone from either camp to speak up! If not, then there's no
excuse for the relevant people not to start sorting out this mess by
commencing with the solution outlined above.

Cheers
Chris

>
> Scott
>
>
> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
> > Hello,
> > We are going to store analysis results in chado, and we are of course
> > very interested in these future evolutions of GFF3/chado.
> > So we would like to make sure that the parsers and conversion programs
> > we are writing now will be compatible with the future GFF3.
> >
> > We are using Bio::SeqFeature::Generic objects that we write with
> > Bio::Tools::GFF.
> >
> > Do you think that Bio::Tools::GFF will be able to handle the new 'type'
> > column or is it better to switch to Bio::FeatureIO::gff ?
> >
> > Thanks in advance for any advice.
> >
> > Cyril
> >
> > Don Gilbert wrote:
> >
> > >
> > > Scott,
> > >
> > > Your notes in gmod_bulk_load_gff3.pl suggest it is headed in
> > > same direction I suggest below. More about these todo points
> > >
> > >> - address flybase's use of analysisfeature combined with feature to
> > >> give source-type information (in GFF terms). This will need to
> > >> be addressed in the GBrowse adaptor.
> > >> - modify the bulk loader to allow "mixed" GFF3 files (that is,
> > >> containing
> > >> both analysis results and annotations). See perldoc
> > >> gmod_bulk_load_gff3.pl
> > >> for more info
> > >
> > >
> > > Use of chado's analysisfeature table is something others who know
> > > it better can comment on. But after working with it for a while
> > > it makes sense to me to use in this way:
> > >
> > > For a future GFF -> Chado loader, treat analysis features such as
> > > gene finding results, BLAST, sim4 as 'analysisfeature type' rather
> > > than feature CV term type (the ones that now end up with a generic
> > > 'match' cvterm). In these cases the Analysis table is populated with
> > > program:database_sourcename
> > > as the basis of this 'analysisfeature type', such as
> > > match:blastx:na_pe.dros
> > > match:sim4:DGC
> > > match:genie:dummy (or maybe exon:genie)
> > >
> > > The program:database fits neatly in GFF source field, as
> > > #ref source type start stop ...
> > > chr1 blastx:na_pe.dros match 1 100 ...
> > > chr1 sim4:DGC match 1 100 ...
> > >
> > > These can be treated in database adaptor analogously to the CVterm
> > > table feature types. See at end a list of current GFF feature
> > > type:source from worm, rice, yeast, fly MODs. Fly and rice use a
> > > syntax like above and worm gff uses BLAT_EMBL_BEST, instead of
> > > BLAT:EMBL_BEST.
> > >
> > > From POD of your bulk_load_gff3.pl
> > > > Analysis
> > > > If you are loading analysis results (ie, BLAT results, gene
> > > > predictions), you should specify the -a flag. If no arguments are
> > > > supplied with the -a, then the loader will assume that the results
> > > > belong to an analysis set with a name that is the concatenation of
> > > > the source (column 2) and the method (column 3) with an underscore
> > > > in between.
> > >
> > > "... then the loader will assume that the results belong to an
> > > analysis table row with a program name and database source name
> > > taken from Source (column 2, colon separated program:sourcename),
> > > with a SOFA feature type taken from Method (column 3). If
> > > sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'.
> > > Use the generic 'match' SOFA type if others don't apply."
> > > [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
> > >
> > > Note that sourcename of database is a common attribute (all those
> > > blasts, blats, sim4, ... are run on several different databases).
> > >
> > > For that underscore between method and source, where does that go into
> > > database? It is used as parts of program or database sourcename names,
> > > so it may be problematic to add one if not needed.
> > >
> > > Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name' entry
> > > for analysis table. This probably is less useful than using Program
> > > and Sourcename fields as flybase does, which comes from the common
> > > usage where people run various programs, with various database sources
> > > and want to plop the results into a database easily. These go into those
> > > two fields directly, no need to create or parse a Name entry
> > > (which can be and is null in flybase data).
> > >
> > > > my $search_analysis
> > > > = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
> > >
> > > I think it would be better as
> > > my $search_analysis
> > > = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and
> > > sourcename=?");
> > >
> > > > Otherwise, the argument provided with -a will be taken
> > > > as the name of the analysis set. Either way, the analysis set must
> > > > already be in the analysis table. The easiest way to do this is to
> > > > insert it directly in the psql shell:
> > > >
> > > > INSERT INTO analysis (name, program, programversion)
> > > > VALUES ('genscan 2005-2-28','genscan','5.4');
> > >
> > > My choice would be to populate the analysis table from GFF data, rather
> > > than expect preparation by user (or as another option).
> > >
> > > INSERT INTO analysis (program, sourcename)
> > > VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
> > > INSERT INTO analysis (program, sourcename)
> > > VALUES ('sim4','na_gb.dmel');
> > > INSERT INTO analysis (program, sourcename, programversion)
> > > VALUES ('genie_masked','dummy', '1.0');
> > >
> > > > There are other columns in the analysis table that are optional; see
> > > > the schema documentation and '\d analysis' in psql for more
> > > > information.
> > > >
> > > ....
> > > > A planned addition to the functionality of handling analysis results
> > > > is to allow "mixed" GFF files, where some lines are analysis results
> > > > and some are not.
> > >
> > > This is the case for drosophila GFF now (see others also below). If
> > > you make the default assumption that if ($method =~ /.*match/) and
> > > ($source =~ m/([^:]+):(.+)/), you should get all/most of
> > > analysisfeature types, and probably not anything else.
> > >
> > > > Additionally, one will be able to supply lists of
> > > > types (optionally with sources) and their associated entry in the
> > > > analysis table. The format will probably be tag value pairs:
> > > >
> > > > --analysis match:Rice_est=rice_est_blast, \
> > > > match:Maize_cDNA=maize_cdna_blast, \
> > > > mRNA=genscan_prediction,exon=genscan_prediction
> > >
> > > My suggestion for this (as per GFF source,type columns) would be
> > > --analysis match:program:sourcename ...
> > > --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
> > > mRNA:genscan:dummy, exon:genscan:dummy
> > >
> > > I guess the 'dummy' data sourcename need not be added; flybase uses it
> > > to keep that field not-null, but it isn't required by the schema.
> > >
> > > Here are some snippets from the ChadoFC adaptor I modified
> > > from yours (will get into cvs.sf.net 'real soon'), showing that
> > > it isn't much work to add this as an analog to how cvterm types
> > > are used.
> > >
> > > -- Don
> > >
> > > ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
> > > ## treat similar to CV table types
> > >
> > > sub getAnalysisFeatureHash
> > > {
> > > my $self= shift;
> > >
> > > my $dbh= $self->dbh();
> > > my $sth = $dbh->prepare("select analysis_id,program,sourcename from analysis")
> > > or $self->throw("unable to prepare select analysis");
> > > $sth->execute or $self->throw("unable to select analysis");
> > >
> > > # declare empty hashes; assigning ({},{}) would store a stray hashref key
> > > my (%term2name, %name2term);
> > >
> > > while (my $hashref = $sth->fetchrow_hashref) {
> > >
> > > ## this is dgg syntax of analysis feature names for GFF
> > > ## all have generic 'match' method and program:source as 'source'
> > > ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc.
> > > my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename};
> > >
> > > $term2name{ $hashref->{analysis_id} } = $anfeat;
> > > $name2term{ $anfeat } = $hashref->{analysis_id};
> > > }
> > > $self->an_term2name(\%term2name);
> > > $self->an_name2term(\%name2term);
> > > }
> > >
> > > ## Das::ChadoFC::Segment snippets
> > > sub features {
> > > $self->{has_anatype}=0;
> > > my $sql_range = '';
> > > my ($interbase_start,$rend,$srcfeature_id,$sql_types);
> > > unless ($feature_id) {
> > > $sql_range = $self->sql_range($rangetype);
> > >
> > > $sql_types = $self->sql_types($types, -1); # dgg
> > >
> > > $srcfeature_id = $self->{srcfeature_id};
> > > }
> > > ...
> > > elsif($self->{has_anatype}) {
> > > $from_part .= "left join analysisfeature af using (feature_id) ";
> > > }
> > >
> > >
> > > sub sql_types
> > > ..
> > > $valid_type = $factory->name2term($temp_type);
> > > $is_anatype= 0;
> > > unless ($valid_type) {
> > > $valid_type = $factory->an_name2term($temp_type);
> > > $self->{has_anatype}= $is_anatype= 1 if ($valid_type);
> > > }
> > > ..
> > > ## leave out extra invalid types
> > > if (!$valid_type) {
> > > ### skip
> > > } elsif ($temp_dbxref) {
> > > $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id =
> > > $temp_dbxref)";
> > > } elsif($is_anatype) {
> > > $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
> > > } else {
> > > $sql_types .= $orsql."(f.type_id = $valid_type)";
> > > }
> > >
> > >
> > > Lists of GFF feature type:source from some current MOD data
> > > where * are probably analysisfeature types (program:database)
> > >
> > > rice gff type:source
> > > ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/
> > > gff3/
> > > --------------------
> > > CDS:known
> > > CDS:tigr
> > > EST:cmap
> > > EST_match:Barley (? might be EST_match:someprogram:Barley)
> > > EST_match:Maize
> > > EST_match:Millet
> > > EST_match:Rice
> > > EST_match:Sorghum
> > > EST_match:Wheat
> > > cDNA_match:Rice
> > > cross_genome_match:Maize
> > > cross_genome_match:Rice
> > > cross_genome_match:Sorghum
> > > * exon:FgenesH:Monocot
> > > exon:known
> > > exon:tigr
> > > five_prime_UTR:tigr
> > > gene:known
> > > gene:tigr
> > > * mRNA:FgenesH:Monocot
> > > mRNA:known
> > > mRNA:tigr
> > > microsatellite:cmap
> > > three_prime_UTR:known
> > > three_prime_UTR:tigr
> > > transposable_element_insertion_site:cmap
> > >
> > > worm gff type:source
> > > ftp://ftp.wormbase.org/pub/wormbase/species/elegans/
> > > genome_feature_tables/GFF3/
> > > ----------------------
> > > CDS:Coding_transcript
> > > * CDS:Genefinder
> > > CDS:Transposon_CDS
> > > CDS:history
> > > * CDS:twinscan
> > > * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
> > > * EST_match:BLAT_EST_OTHER
> > > PCR_product:GenePair_STS
> > > PCR_product:Orfeome
> > > RNAi_reagent:RNAi_primary
> > > RNAi_reagent:RNAi_secondary
> > > SNP:Allele
> > > binding_site:binding_site
> > > * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
> > > * cDNA_match:BLAT_mRNA_OTHER
> > > clone_end:.
> > > clone_start:.
> > > complex_substitution:Allele
> > > deletion:Allele
> > > exon:Coding_transcript
> > > * exon:Genefinder
> > > exon:Non_coding_transcript
> > > exon:Pseudogene
> > > exon:Transposon_CDS
> > > exon:history
> > > exon:miRNA
> > > exon:rRNA
> > > exon:scRNA
> > > exon:snRNA
> > > exon:snoRNA
> > > exon:tRNA
> > > * exon:tRNAscan-SE-1.23
> > > * exon:twinscan
> > > experimental_result_region:Expr_profile
> > > experimental_result_region:cDNA_for_RNAi
> > > * expressed_sequence_match:BLAT_OST_BEST (~
> > > expressed_sequence_match:BLAT:OST_BEST )
> > > * expressed_sequence_match:BLAT_OST_OTHER
> > > five_prime_UTR:Coding_transcript
> > > gene:Coding_transcript
> > > gene:gene
> > > gene:history
> > > gene:landmark
> > > insertion:Allele
> > > inverted_repeat:inverted
> > > mRNA:Coding_transcript
> > > * mRNA:Genefinder
> > > mRNA:Transposon_CDS
> > > mRNA:history
> > > * mRNA:twinscan
> > > miRNA:miRNA
> > > nc_primary_transcript:Non_coding_transcript
> > > * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST )
> > > * nucleotide_match:BLAT_EMBL_OTHER
> > > * nucleotide_match:BLAT_TC1_BEST
> > > * nucleotide_match:BLAT_TC1_OTHER
> > > * nucleotide_match:BLAT_ncRNA_BEST
> > > * nucleotide_match:BLAT_ncRNA_OTHER
> > > * nucleotide_match:TEC_RED
> > > * nucleotide_match:waba_coding
> > > * nucleotide_match:waba_strong
> > > * nucleotide_match:waba_weak
> > > oligo:.
> > > operon:operon
> > > polyA_signal_sequence:polyA_signal_sequence
> > > polyA_site:polyA_site
> > > processed_transcript:gene
> > > protein_coding_primary_transcript:Coding_transcript
> > > * protein_match:wublastx
> > > pseudogene:Pseudogene
> > > pseudogene:history
> > > rRNA:rRNA
> > > reagent:Oligo_set
> > > region:.
> > > region:Genbank
> > > region:Genomic_canonical
> > > region:Link
> > > * repeat_region:RepeatMasker
> > > scRNA:scRNA
> > > sequence_variant:.
> > > sequence_variant:Allele
> > > snRNA:snRNA
> > > snoRNA:snoRNA
> > > substitution:Allele
> > > tRNA:tRNA
> > > * tRNA:tRNAscan-SE-1.23
> > > tandem_repeat:tandem
> > > three_prime_UTR:Coding_transcript
> > > trans_splice_acceptor_site:SL1
> > > trans_splice_acceptor_site:SL2
> > > transcript:SAGE_transcript
> > > * translated_nucleotide_match:BLAT_NEMATODE (~
> > > translated_nucleotide_match:BLAT:NEMATODE )
> > > transposable_element:Transposon
> > > transposable_element:Transposon_CDS
> > > transposable_element_insertion_site:Allele
> > > transposable_element_insertion_site:Mos_insertion_allele
> > >
> > >
> > > fly gff type:source
> > > ftp://ftp.flybase.net/genomes/dmel/current/gff/
> > > -----------------------
> > > BAC:.
> > > CDS:.
> > > aberration_junction:.
> > > chromosome:.
> > > chromosome_arm:.
> > > chromosome_band:.
> > > enhancer:.
> > > exon:.
> > > five_prime_UTR:.
> > > gene:.
> > > insertion_site:.
> > > intron:.
> > > mRNA:.
> > > * match:RNAiHDP
> > > * match:assembly:path
> > > * match:blastx:aa_SPTR.dmel
> > > * match:blastx:aa_SPTR.insect
> > > * match:blastx:aa_SPTR.othinv
> > > * match:blastx:aa_SPTR.othvert
> > > * match:blastx:aa_SPTR.plant
> > > * match:blastx:aa_SPTR.primate
> > > * match:blastx:aa_SPTR.rodent
> > > * match:blastx:aa_SPTR.worm
> > > * match:blastx:aa_SPTR.yeast
> > > * match:genscan
> > > * match:repeatmasker
> > > * match:sim4:na_ARGs.dros
> > > * match:sim4:na_ARGsCDS.dros
> > > * match:sim4:na_DGC_dros
> > > * match:sim4:na_dbEST.diff.dmel
> > > * match:sim4:na_dbEST.same.dmel
> > > * match:sim4:na_gadfly_dmel_r2
> > > * match:sim4:na_gb.dmel
> > > * match:sim4:na_gb.tpa.dmel
> > > * match:sim4:na_smallRNA.dros
> > > * match:sim4:na_transcript_dmel_r31
> > > * match:sim4:na_transcript_dmel_r32
> > > * match:tRNAscan-SE:.
> > > * match:tblastx:na_agambiae
> > > * match:tblastx:na_dbEST.insect
> > > * match:tblastx:na_dpse
> > > * match_part:RNAiHDP
> > > * match_part:assembly:path
> > > * match_part:blastx:aa_SPTR.dmel
> > > * match_part:blastx:aa_SPTR.insect
> > > * match_part:blastx:aa_SPTR.othinv
> > > * match_part:blastx:aa_SPTR.othvert
> > > * match_part:blastx:aa_SPTR.plant
> > > * match_part:blastx:aa_SPTR.primate
> > > * match_part:blastx:aa_SPTR.rodent
> > > * match_part:blastx:aa_SPTR.worm
> > > * match_part:blastx:aa_SPTR.yeast
> > > * match_part:genscan
> > > * match_part:repeatmasker
> > > * match_part:sim4:na_ARGs.dros
> > > * match_part:sim4:na_ARGsCDS.dros
> > > * match_part:sim4:na_DGC_dros
> > > * match_part:sim4:na_dbEST.diff.dmel
> > > * match_part:sim4:na_dbEST.same.dmel
> > > * match_part:sim4:na_gadfly_dmel_r2
> > > * match_part:sim4:na_gb.dmel
> > > * match_part:sim4:na_gb.tpa.dmel
> > > * match_part:sim4:na_smallRNA.dros
> > > * match_part:sim4:na_transcript_dmel_r31
> > > * match_part:sim4:na_transcript_dmel_r32
> > > * match_part:tRNAscan-SE:.
> > > * match_part:tblastx:na_agambiae
> > > * match_part:tblastx:na_dbEST.insect
> > > * match_part:tblastx:na_dpse
> > > mature_peptide:.
> > > ncRNA:.
> > > oligo:.
> > > point_mutation:.
> > > polyA_site:.
> > > protein_binding_site:.
> > > pseudogene:.
> > > region:.
> > > regulatory_region:.
> > > rescue_fragment:.
> > > scaffold:.
> > > sequence_variant:.
> > > snRNA:.
> > > snoRNA:.
> > > tRNA:.
> > > three_prime_UTR:.
> > > transcription_start_site:.
> > > transposable_element:.
> > > transposable_element_insertion_site:. 3116
> > >
> > >
> > > yeast gff type:source count
> > > ftp://genome-ftp.stanford.edu/pub/yeast/data_download/
> > > chromosomal_feature/saccharomyces_cerevisiae.gff
> > > -------------------------
> > > ARS:SGD
> > > CDS:SGD
> > > binding_site:SGD
> > > centromere:SGD
> > > chromosome:SGD
> > > gene:SGD
> > > insertion:SGD
> > > intron:SGD
> > > ncRNA:SGD
> > > nc_primary_transcript:SGD
> > > nucleotide_match:SGD
> > > pseudogene:SGD
> > > rRNA:SGD
> > > region:SGD
> > > region:landmark
> > > repeat_family:SGD
> > > repeat_region:SGD
> > > snRNA:SGD
> > > snoRNA:SGD
> > > tRNA:SGD
> > > telomere:SGD
> > > transposable_element:SGD
> > > transposable_element_gene:SGD
> > >
> > > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> > > -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/
> > >
> > >
> > >
> > > _______________________________________________
> > > Gmod-gbrowse mailing list
> > > Gmod-gbrowse@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> > >
> >
> >
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain@cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
>
>
>
> _______________________________________________
> Gmod-devel mailing list
> Gmod-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-devel
>


From birney at ebi.ac.uk  Thu Jul 28 19:20:39 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Thu Jul 28 19:12:17 2005
Subject: [Bioperl-l] Fixing bioperl [was Re: [GMOD-devel] Re:
	[Gmod-gbrowse]
	Analysis features (Re: Final alpha release of gmod (chado))]
In-Reply-To: <Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
Message-ID: <42E96847.1060900@ebi.ac.uk>


Just my $0.02 on this....


Chris - this seems bang on the money and what we should
do (roll back out the changes, extend the interface and then
in the extended interface have the "scruffy" types delegate
to the short_name or whatever in the main types).


So - for what it is worth, this is the way to go for me.


Chris Mungall wrote:
> [sorry for the cross-posting, but I think it's really important to have a
> gmod to bioperl chit chat on this. I've removed gmod-gbrowse from the cc
> list]
> 
> On Thu, 28 Jul 2005, Scott Cain wrote:
> 
> 
>>Hi Cyril,
>>
>>I think Bio::Tools::GFF is somewhat hacky and not a tool I would use to
>>produce 'safe' GFF3.  On the other hand Bio::FeatureIO is still a little
>>immature, but it is what I used for the chado GFF3 bulk loader, so it
>>does handle (parse) Target features.  So my suggestion would be to use
>>BFIO::gff, but be prepared for some problems; when you find them
>>complain loudly on the bioperl mailing list or fix the problems and
>>commit them (or both!).
> 
> 
> I think the answer may be even more complicated than this.
> 
> Lurkers and contributors to the bioperl mailing list may have noticed that
> there have been some major obstacles in progressing lately, particularly in
> getting a stable release of the code out. bp1.4 is fairly old, 1.5 is a
> developers release, though this is the one required by GMOD.
> 
> My understanding is that this bottleneck can be traced back to changes in
> the SeqFeature and Annotation model. These changes appear to be required
> by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff
> (which in turn is used by the GMOD bulk loader, which is the main reason
> GMOD requires 1.5, I believe?). Unfortunately, these changes also break
> existing code and have a severe negative impact on memory usage.
> 
> Before advising Cyril and others to switch to BFIO::gff I think it's
> important to make sure there is a clear path forward with bioperl. My
> impression is that there is something of a stalemate here. The bioperl
> developers would like to retract the aforementioned changes, but they
> believe they cannot do this without breaking GMOD code.  They are also
> extremely uncomfortable about leaving these changes in. Everyone gives up
> and starts coding around bioperl.
> 
> Here is why the changes were introduced:
> 
> BioPerl has a 'scruffy' typing model, whereby feature types (primary_tag
> in bioperl) and featureprop types (tags in bioperl) are labels or strings.
> In contrast, Chado forces all types to be some class or relation in an
> ontology.
> 
> Now obviously I'm rather partial to the Chado model, but that doesn't mean
> I think it should be forced upon bioperl. I often use bioperl in scruffy
> mode (on scruffy data); or in some combination whereby I map the scruffy
> types to ontologies in some non-bioperl code. When using bioperl as a
> middleware component over a nicely organised database, ontology-typed mode
> is definitely best. However, the majority of bioperl users (including
> myself) spend a large proportion of their time working with scruffy data,
> in which case lightweight scruffy types are more appropriate.
> 
> It seems that there is a perfectly simple way of reconciling both
> approaches. We revert bioperl back to the simpler scruffy model. The
> majority of users and developers breathe a sigh of relief. We then extend
> SeqFeatureI with something like SeqFeatureAnnotatedI. This forces types to
> be stored as OntologyTerms (and I haven't even touched on some of the
> problems here, but at least we are insulating the standard bioperl layer
> that 99% of users use from these issues). All classes implementing SFAI
> will necessarily implement SFI, and the primary_tag and tag_values methods
> will be supported (not deprecated) as simple delegations to the
> OntologyTerm objects.
> 
> We can then modify BFIO::gff (which is an incredibly useful piece of code)
> and get rid of all the dependencies on SO and Bio::Ontology* and instead
> allow the user of this module to plug in their own resolver/validator - so
> they can choose whether they just want fast scruffy lightweight SFI
> features, or whether they want ontology-typed SFAI features. If the
> latter, then they can choose their own resolver strategy - by a user
> supplied hash, by a copy of SO auto-downloaded from sourceforge, by a
> local chado db, by the genbank->SO mapping table, during parsing vs
> post-parsing, whatever. In fact there is already
> Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly concerned
> with helping Bio::SeqFeature::Tools::Unflattener convert scruffy genbank
> to something sensible.
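
A minimal sketch of the pluggable-resolver idea described above. All names
here (hash_resolver, make_feature, primary_tag as a plain function) are
hypothetical and are not part of bioperl; plain hashes stand in for real
SeqFeatureI / SeqFeatureAnnotatedI objects, and the resolver is backed by
a user-supplied hash, which is only one of the strategies listed.

```perl
use strict;
use warnings;

# Hypothetical resolver: maps a scruffy type label to an ontology term
# record. The accessions are included for illustration only.
my %user_map = (
    gene => { name => 'gene', accession => 'SO:0000704' },
    exon => { name => 'exon', accession => 'SO:0000147' },
);

sub hash_resolver {
    my ($label) = @_;
    return $user_map{$label};    # undef when the label is unknown
}

# Hypothetical factory: without a resolver the type stays a plain string
# (scruffy mode); with one, the resolved term is attached to the feature.
sub make_feature {
    my (%args) = @_;
    my $resolver = $args{resolver};
    my $feature  = { start => $args{start}, end => $args{end} };
    if ($resolver) {
        my $term = $resolver->( $args{type} )
            or die "cannot resolve type '$args{type}'\n";
        $feature->{term} = $term;
    }
    else {
        $feature->{primary_tag} = $args{type};
    }
    return $feature;
}

# primary_tag works in both modes; in ontology-typed mode it is a simple
# delegation to the attached term.
sub primary_tag {
    my ($feature) = @_;
    return $feature->{term} ? $feature->{term}{name} : $feature->{primary_tag};
}
```

Calling make_feature(type => 'gene', start => 1, end => 100) gives a
lightweight feature, while adding resolver => \&hash_resolver gives an
ontology-typed one; code that only calls primary_tag does not need to
know which mode produced the feature.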
> 
> GMOD (and perhaps biosql) would use SFAI, everyone else would use the
> simpler SFI. Someone can even get a stable 1.6 release out before all the
> SFAI details such as how the resolver would work are finalised. I'd really
> like to see 1.6 include a simpler BFIO::gff that can optionally produce
> features that aren't SeqFeature::Annotateds, but that's negotiable.
> 
> There's vast swathes of both GMOD and BioPerl code I'm not familiar with,
> so it's possible my analysis above is flawed in some way. If it is, then
> it's up to someone from either camp to speak up! If not, then there's no
> excuse for the relevant people not to start sorting out this mess by
> commencing with the solution outlined above.
> 
> Cheers
> Chris
> 
> 
>>Scott
>>
>>
>>On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
>>
>>>Hello,
>>>We are going to store analysis results in chado, and we are of course
>>>very interested in these future developments of GFF3/chado.
>>>So we would like to make sure that the parsers and conversion programs
>>>we are writing now will be compatible with the future GFF3.
>>>
>>>We are using Bio::SeqFeature::Generic objects that we write with
>>>Bio::Tools::GFF.
>>>
>>>Do you think that Bio::Tools::GFF will be able to handle the new 'type'
>>>column or is it better to switch to Bio::FeatureIO::gff ?
>>>
>>>Thanks in advance for any advice.
>>>
>>>Cyril
>>>
>>>Don Gilbert wrote:
>>>
>>>
>>>>Scott,
>>>>
>>>>Your notes in gmod_bulk_load_gff3.pl suggest it is headed in the
>>>>same direction I suggest below. More about these todo points:
>>>>
>>>>
>>>>>- address flybase's use of analysisfeature combined with feature to
>>>>>give source-type information (in GFF terms). This will need to
>>>>>be addressed in the GBrowse adaptor.
>>>>>- modify the bulk loader to allow "mixed" GFF3 files (that is,
>>>>>containing
>>>>>both analysis results and annotations). See perldoc
>>>>>gmod_bulk_load_gff3.pl
>>>>>for more info
>>>>
>>>>
>>>>Use of chado's analysisfeature table is something others who know
>>>>it better can comment on. But after working with it for a while
>>>>it makes sense to me to use in this way:
>>>>
>>>>For a future GFF -> Chado loader, treat analysis features such as
>>>>gene finding results, BLAST, sim4 as 'analysisfeature type' rather
>>>>than feature CV term type (the ones that now end up with a generic
>>>>'match' cvterm). In these cases the Analysis table is populated with
>>>>program:database_sourcename
>>>>as the basis of this 'analysisfeature type', such as
>>>>match:blastx:na_pe.dros
>>>>match:sim4:DGC
>>>>match:genie:dummy (or maybe exon:genie)
>>>>
>>>>The program:database fits neatly in GFF source field, as
>>>>#ref source type start stop ...
>>>>chr1 blastx:na_pe.dros match 1 100 ...
>>>>chr1 sim4:DGC match 1 100 ...
>>>>
>>>>These can be treated in database adaptor analogously to the CVterm
>>>>table feature types. See at end a list of current GFF feature
>>>>type:source from worm, rice, yeast, fly MODs. Fly and rice use a
>>>>syntax like above and worm gff uses BLAT_EMBL_BEST, instead of
>>>>BLAT:EMBL_BEST.
>>>>
>>>>From POD of your bulk_load_gff3.pl
>>>>
>>>>>Analysis
>>>>>If you are loading analysis results (ie, BLAT results, gene
>>>>>predictions), you should specify the -a flag. If no arguments are
>>>>>supplied with the -a, then the loader will assume that the results
>>>>>belong to an analysis set with a name that is the concatenation of
>>>>>the source (column 2) and the method (column 3) with an underscore
>>>>>in between.
>>>>
>>>>"... then the loader will assume that the results belong to an
>>>>analysis table row with a program name and database source name
>>>>taken from Source (column 2, colon separated program:sourcename),
>>>>with a SOFA feature type taken from Method (column 3). If
>>>>sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'.
>>>>Use the generic 'match' SOFA type if others don't apply."
>>>>[see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
>>>>
>>>>Note that sourcename of database is a common attribute (all those
>>>>blasts, blats, sim4, ... are run on several different databases).
>>>>
>>>>For that underscore between method and source, where does that go into
>>>>database? It is used as parts of program or database sourcename names,
>>>>so it may be problematic to add one if not needed.
>>>>
>>>>Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name' entry
>>>>for analysis table. This probably is less useful than using Program
>>>>and Sourcename fields as flybase does, which comes from the common
>>>>usage where people run various programs, with various database sources
>>>>and want to plop the results into a database easily. These go into those
>>>>two fields directly, no need to create or parse a Name entry
>>>>(which can be and is null in flybase data).
>>>>
>>>>
>>>>>my $search_analysis
>>>>>= $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
>>>>
>>>>I think it would be better as
>>>>my $search_analysis
>>>>= $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and
>>>>sourcename=?");
>>>>
>>>>
>>>>>Otherwise, the argument provided with -a will be taken
>>>>>as the name of the analysis set. Either way, the analysis set must
>>>>>already be in the analysis table. The easiest way to do this is to
>>>>>insert it directly in the psql shell:
>>>>>
>>>>>INSERT INTO analysis (name, program, programversion)
>>>>>VALUES ('genscan 2005-2-28','genscan','5.4');
>>>>
>>>>My choice would be to populate the analysis table from GFF data, rather
>>>>than expect preparation by user (or as another option).
>>>>
>>>>INSERT INTO analysis (program, sourcename)
>>>>VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
>>>>INSERT INTO analysis (program, sourcename)
>>>>VALUES ('sim4','na_gb.dmel');
>>>>INSERT INTO analysis (program, sourcename, programversion)
>>>>VALUES ('genie_masked','dummy', '1.0');
>>>>
>>>>
>>>>>There are other columns in the analysis table that are optional; see
>>>>>the schema documentation and '\d analysis' in psql for more
>>>>>information.
>>>>>
>>>>
>>>>....
>>>>
>>>>>A planned addition to the functionality of handling analysis results
>>>>>is to allow "mixed" GFF files, where some lines are analysis results
>>>>>and some are not.
>>>>
>>>>This is the case for drosophila GFF now (see others also below). If
>>>>you make the default assumption that if ($method =~ /.*match/) and
>>>>($source =~ m/([^:]+):(.+)/), you should get all/most of
>>>>analysisfeature types, and probably not anything else.
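
A minimal sketch of the default assumption above, using the two regular
expressions as quoted. The helper name analysis_source is hypothetical and
does not come from the bulk loader; the sample rows are taken from the
type:source lists further down.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (program, sourcename) when a GFF line looks
# like an analysis feature under the heuristic above, or an empty list
# otherwise.
sub analysis_source {
    my ($method, $source) = @_;
    return unless $method =~ /match/;              # equivalent to /.*match/
    return unless $source =~ m/([^:]+):(.+)/;      # program:sourcename
    return ($1, $2);
}

for my $row ( [ 'match', 'blastx:aa_SPTR.dmel' ],
              [ 'match_part', 'sim4:na_gb.dmel' ],
              [ 'gene', '.' ] ) {
    my ($method, $source) = @$row;
    my ($program, $sourcename) = analysis_source($method, $source);
    printf "%-10s %-20s %s\n", $method, $source,
        defined $program ? "analysis: $program / $sourcename"
                         : 'ordinary feature';
}
```

Sources without a colon, such as genscan or repeatmasker in the fly list,
fall through this check and would need an explicit mapping, which fits
the "all/most" wording above.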
>>>>
>>>>
>>>>>Additionally, one will be able to supply lists of
>>>>>types (optionally with sources) and their associated entry in the
>>>>>analysis table. The format will probably be tag value pairs:
>>>>>
>>>>>--analysis match:Rice_est=rice_est_blast, \
>>>>>match:Maize_cDNA=maize_cdna_blast, \
>>>>>mRNA=genscan_prediction,exon=genscan_prediction
>>>>
>>>>My suggestion for this (as per GFF source,type columns) would be
>>>>--analysis match:program:sourcename ...
>>>>--analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
>>>>mRNA:genscan:dummy, exon:genscan:dummy
>>>>
>>>>I guess the 'dummy' data sourcename need not be added; flybase uses it
>>>>to keep that field not-null, but it isn't required by the schema.
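
A minimal sketch of how the suggested --analysis value could be split into
its parts, assuming a comma-separated list of type:program[:sourcename]
entries. The function name parse_analysis_option is hypothetical and is
not part of gmod_bulk_load_gff3.pl; the sourcename is treated as optional,
in line with the remark about 'dummy' above. The // operator needs Perl
5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical parser for the suggested --analysis value.
sub parse_analysis_option {
    my ($value) = @_;
    $value =~ s/^\s+|\s+$//g;                      # trim the whole value
    my @entries;
    for my $item ( split /\s*,\s*/, $value ) {
        next unless length $item;
        my ($type, $program, $sourcename) = split /:/, $item, 3;
        die "expected type:program[:sourcename], got '$item'\n"
            unless defined $program && length $program;
        push @entries, {
            type       => $type,
            program    => $program,
            sourcename => $sourcename,             # undef when omitted
        };
    }
    return @entries;
}

my @entries = parse_analysis_option(
    'match:blast:Rice_est,match:blast:Maize_cDNA,'
  . 'mRNA:genscan:dummy, exon:genscan:dummy'
);
printf "%s -> program=%s sourcename=%s\n",
    $_->{type}, $_->{program}, $_->{sourcename} // '(none)'
    for @entries;
```

Each entry carries the program and sourcename separately, so they can be
passed directly as the two bind values of the "WHERE program=? and
sourcename=?" query suggested earlier.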
>>>>
>>>>Here are some snippets from the ChadoFC adaptor I modified
>>>>from yours (will get into cvs.sf.net 'real soon'), showing that
>>>>it isn't much work to add this as an analog to how cvterm types
>>>>are used.
>>>>
>>>>-- Don
>>>>
>>>>## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
>>>>## treat similar to CV table types
>>>>
>>>>sub getAnalysisFeatureHash
>>>>{
>>>>my $self= shift;
>>>>
>>>>my $dbh= $self->dbh();
>>>>my $sth = $dbh->prepare("select analysis_id,program,sourcename from
>>>>analysis")
>>>>or warn "unable to prepare select analysis";
>>>>$sth->execute or $self->throw("unable to select analysis");
>>>>
>>>>my (%term2name, %name2term); # both start empty; "= ({},{})" would add a bogus entry
>>>>
>>>>while (my $hashref = $sth->fetchrow_hashref) {
>>>>
>>>>## this is dgg syntax of analysis feature names for GFF
>>>>## all have generic 'match' method and program:source as 'source'
>>>>## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc.
>>>>my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename};
>>>>
>>>>$term2name{ $hashref->{analysis_id} } = $anfeat;
>>>>$name2term{ $anfeat } = $hashref->{analysis_id};
>>>>}
>>>>$self->an_term2name(\%term2name);
>>>>$self->an_name2term(\%name2term);
>>>>}
>>>>
>>>>## Das::ChadoFC::Segment snippets
>>>>sub features {
>>>>$self->{has_anatype}=0;
>>>>my $sql_range = '';
>>>>my ($interbase_start,$rend,$srcfeature_id,$sql_types);
>>>>unless ($feature_id) {
>>>>$sql_range = $self->sql_range($rangetype);
>>>>
>>>>$sql_types = $self->sql_types($types, -1); # dgg
>>>>
>>>>$srcfeature_id = $self->{srcfeature_id};
>>>>}
>>>>...
>>>>elsif($self->{has_anatype}) {
>>>>$from_part .= "left join analysisfeature af using (feature_id) ";
>>>>}
>>>>
>>>>
>>>>sub sql_types
>>>>..
>>>>$valid_type = $factory->name2term($temp_type);
>>>>$is_anatype= 0;
>>>>unless ($valid_type) {
>>>>$valid_type = $factory->an_name2term($temp_type);
>>>>$self->{has_anatype}= $is_anatype= 1 if ($valid_type);
>>>>}
>>>>..
>>>>## leave out extra invalid types
>>>>if (!$valid_type) {
>>>>### skip
>>>>} elsif ($temp_dbxref) {
>>>>$sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id =
>>>>$temp_dbxref)";
>>>>} elsif($is_anatype) {
>>>>$sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
>>>>} else {
>>>>$sql_types .= $orsql."(f.type_id = $valid_type)";
>>>>}
>>>>
>>>>
>>>>Lists of GFF feature type:source from some current MOD data
>>>>where * are probably analysisfeature types (program:database)
>>>>
>>>>rice gff type:source
>>>>ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/
>>>>gff3/
>>>>--------------------
>>>>CDS:known
>>>>CDS:tigr
>>>>EST:cmap
>>>>EST_match:Barley (? might be EST_match:someprogram:Barley)
>>>>EST_match:Maize
>>>>EST_match:Millet
>>>>EST_match:Rice
>>>>EST_match:Sorghum
>>>>EST_match:Wheat
>>>>cDNA_match:Rice
>>>>cross_genome_match:Maize
>>>>cross_genome_match:Rice
>>>>cross_genome_match:Sorghum
>>>>* exon:FgenesH:Monocot
>>>>exon:known
>>>>exon:tigr
>>>>five_prime_UTR:tigr
>>>>gene:known
>>>>gene:tigr
>>>>* mRNA:FgenesH:Monocot
>>>>mRNA:known
>>>>mRNA:tigr
>>>>microsatellite:cmap
>>>>three_prime_UTR:known
>>>>three_prime_UTR:tigr
>>>>transposable_element_insertion_site:cmap
>>>>
>>>>worm gff type:source
>>>>ftp://ftp.wormbase.org/pub/wormbase/species/elegans/
>>>>genome_feature_tables/GFF3/
>>>>----------------------
>>>>CDS:Coding_transcript
>>>>* CDS:Genefinder
>>>>CDS:Transposon_CDS
>>>>CDS:history
>>>>* CDS:twinscan
>>>>* EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
>>>>* EST_match:BLAT_EST_OTHER
>>>>PCR_product:GenePair_STS
>>>>PCR_product:Orfeome
>>>>RNAi_reagent:RNAi_primary
>>>>RNAi_reagent:RNAi_secondary
>>>>SNP:Allele
>>>>binding_site:binding_site
>>>>* cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
>>>>* cDNA_match:BLAT_mRNA_OTHER
>>>>clone_end:.
>>>>clone_start:.
>>>>complex_substitution:Allele
>>>>deletion:Allele
>>>>exon:Coding_transcript
>>>>* exon:Genefinder
>>>>exon:Non_coding_transcript
>>>>exon:Pseudogene
>>>>exon:Transposon_CDS
>>>>exon:history
>>>>exon:miRNA
>>>>exon:rRNA
>>>>exon:scRNA
>>>>exon:snRNA
>>>>exon:snoRNA
>>>>exon:tRNA
>>>>* exon:tRNAscan-SE-1.23
>>>>* exon:twinscan
>>>>experimental_result_region:Expr_profile
>>>>experimental_result_region:cDNA_for_RNAi
>>>>* expressed_sequence_match:BLAT_OST_BEST (~
>>>>expressed_sequence_match:BLAT:OST_BEST )
>>>>* expressed_sequence_match:BLAT_OST_OTHER
>>>>five_prime_UTR:Coding_transcript
>>>>gene:Coding_transcript
>>>>gene:gene
>>>>gene:history
>>>>gene:landmark
>>>>insertion:Allele
>>>>inverted_repeat:inverted
>>>>mRNA:Coding_transcript
>>>>* mRNA:Genefinder
>>>>mRNA:Transposon_CDS
>>>>mRNA:history
>>>>* mRNA:twinscan
>>>>miRNA:miRNA
>>>>nc_primary_transcript:Non_coding_transcript
>>>>* nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST )
>>>>* nucleotide_match:BLAT_EMBL_OTHER
>>>>* nucleotide_match:BLAT_TC1_BEST
>>>>* nucleotide_match:BLAT_TC1_OTHER
>>>>* nucleotide_match:BLAT_ncRNA_BEST
>>>>* nucleotide_match:BLAT_ncRNA_OTHER
>>>>* nucleotide_match:TEC_RED
>>>>* nucleotide_match:waba_coding
>>>>* nucleotide_match:waba_strong
>>>>* nucleotide_match:waba_weak
>>>>oligo:.
>>>>operon:operon
>>>>polyA_signal_sequence:polyA_signal_sequence
>>>>polyA_site:polyA_site
>>>>processed_transcript:gene
>>>>protein_coding_primary_transcript:Coding_transcript
>>>>* protein_match:wublastx
>>>>pseudogene:Pseudogene
>>>>pseudogene:history
>>>>rRNA:rRNA
>>>>reagent:Oligo_set
>>>>region:.
>>>>region:Genbank
>>>>region:Genomic_canonical
>>>>region:Link
>>>>* repeat_region:RepeatMasker
>>>>scRNA:scRNA
>>>>sequence_variant:.
>>>>sequence_variant:Allele
>>>>snRNA:snRNA
>>>>snoRNA:snoRNA
>>>>substitution:Allele
>>>>tRNA:tRNA
>>>>* tRNA:tRNAscan-SE-1.23
>>>>tandem_repeat:tandem
>>>>three_prime_UTR:Coding_transcript
>>>>trans_splice_acceptor_site:SL1
>>>>trans_splice_acceptor_site:SL2
>>>>transcript:SAGE_transcript
>>>>* translated_nucleotide_match:BLAT_NEMATODE (~
>>>>translated_nucleotide_match:BLAT:NEMATODE )
>>>>transposable_element:Transposon
>>>>transposable_element:Transposon_CDS
>>>>transposable_element_insertion_site:Allele
>>>>transposable_element_insertion_site:Mos_insertion_allele
>>>>
>>>>
>>>>fly gff type:source
>>>>ftp://ftp.flybase.net/genomes/dmel/current/gff/
>>>>-----------------------
>>>>BAC:.
>>>>CDS:.
>>>>aberration_junction:.
>>>>chromosome:.
>>>>chromosome_arm:.
>>>>chromosome_band:.
>>>>enhancer:.
>>>>exon:.
>>>>five_prime_UTR:.
>>>>gene:.
>>>>insertion_site:.
>>>>intron:.
>>>>mRNA:.
>>>>* match:RNAiHDP
>>>>* match:assembly:path
>>>>* match:blastx:aa_SPTR.dmel
>>>>* match:blastx:aa_SPTR.insect
>>>>* match:blastx:aa_SPTR.othinv
>>>>* match:blastx:aa_SPTR.othvert
>>>>* match:blastx:aa_SPTR.plant
>>>>* match:blastx:aa_SPTR.primate
>>>>* match:blastx:aa_SPTR.rodent
>>>>* match:blastx:aa_SPTR.worm
>>>>* match:blastx:aa_SPTR.yeast
>>>>* match:genscan
>>>>* match:repeatmasker
>>>>* match:sim4:na_ARGs.dros
>>>>* match:sim4:na_ARGsCDS.dros
>>>>* match:sim4:na_DGC_dros
>>>>* match:sim4:na_dbEST.diff.dmel
>>>>* match:sim4:na_dbEST.same.dmel
>>>>* match:sim4:na_gadfly_dmel_r2
>>>>* match:sim4:na_gb.dmel
>>>>* match:sim4:na_gb.tpa.dmel
>>>>* match:sim4:na_smallRNA.dros
>>>>* match:sim4:na_transcript_dmel_r31
>>>>* match:sim4:na_transcript_dmel_r32
>>>>* match:tRNAscan-SE:.
>>>>* match:tblastx:na_agambiae
>>>>* match:tblastx:na_dbEST.insect
>>>>* match:tblastx:na_dpse
>>>>* match_part:RNAiHDP
>>>>* match_part:assembly:path
>>>>* match_part:blastx:aa_SPTR.dmel
>>>>* match_part:blastx:aa_SPTR.insect
>>>>* match_part:blastx:aa_SPTR.othinv
>>>>* match_part:blastx:aa_SPTR.othvert
>>>>* match_part:blastx:aa_SPTR.plant
>>>>* match_part:blastx:aa_SPTR.primate
>>>>* match_part:blastx:aa_SPTR.rodent
>>>>* match_part:blastx:aa_SPTR.worm
>>>>* match_part:blastx:aa_SPTR.yeast
>>>>* match_part:genscan
>>>>* match_part:repeatmasker
>>>>* match_part:sim4:na_ARGs.dros
>>>>* match_part:sim4:na_ARGsCDS.dros
>>>>* match_part:sim4:na_DGC_dros
>>>>* match_part:sim4:na_dbEST.diff.dmel
>>>>* match_part:sim4:na_dbEST.same.dmel
>>>>* match_part:sim4:na_gadfly_dmel_r2
>>>>* match_part:sim4:na_gb.dmel
>>>>* match_part:sim4:na_gb.tpa.dmel
>>>>* match_part:sim4:na_smallRNA.dros
>>>>* match_part:sim4:na_transcript_dmel_r31
>>>>* match_part:sim4:na_transcript_dmel_r32
>>>>* match_part:tRNAscan-SE:.
>>>>* match_part:tblastx:na_agambiae
>>>>* match_part:tblastx:na_dbEST.insect
>>>>* match_part:tblastx:na_dpse
>>>>mature_peptide:.
>>>>ncRNA:.
>>>>oligo:.
>>>>point_mutation:.
>>>>polyA_site:.
>>>>protein_binding_site:.
>>>>pseudogene:.
>>>>region:.
>>>>regulatory_region:.
>>>>rescue_fragment:.
>>>>scaffold:.
>>>>sequence_variant:.
>>>>snRNA:.
>>>>snoRNA:.
>>>>tRNA:.
>>>>three_prime_UTR:.
>>>>transcription_start_site:.
>>>>transposable_element:.
>>>>transposable_element_insertion_site:. 3116
>>>>
>>>>
>>>>yeast gff type:source count
>>>>ftp://genome-ftp.stanford.edu/pub/yeast/data_download/
>>>>chromosomal_feature/saccharomyces_cerevisiae.gff
>>>>-------------------------
>>>>ARS:SGD
>>>>CDS:SGD
>>>>binding_site:SGD
>>>>centromere:SGD
>>>>chromosome:SGD
>>>>gene:SGD
>>>>insertion:SGD
>>>>intron:SGD
>>>>ncRNA:SGD
>>>>nc_primary_transcript:SGD
>>>>nucleotide_match:SGD
>>>>pseudogene:SGD
>>>>rRNA:SGD
>>>>region:SGD
>>>>region:landmark
>>>>repeat_family:SGD
>>>>repeat_region:SGD
>>>>snRNA:SGD
>>>>snoRNA:SGD
>>>>tRNA:SGD
>>>>telomere:SGD
>>>>transposable_element:SGD
>>>>transposable_element_gene:SGD
>>>>
>>>>-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
>>>>-- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/
>>>>
>>>>
>>>>
>>>>_______________________________________________
>>>>Gmod-gbrowse mailing list
>>>>Gmod-gbrowse@lists.sourceforge.net
>>>>https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>>
>>>
>>>
>>--
>>------------------------------------------------------------------------
>>Scott Cain, Ph. D.                                         cain@cshl.edu
>>GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>>Cold Spring Harbor Laboratory
>>
>>
>>
>>_______________________________________________
>>Gmod-devel mailing list
>>Gmod-devel@lists.sourceforge.net
>>https://lists.sourceforge.net/lists/listinfo/gmod-devel
>>
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
From pmiguel at purdue.edu  Fri Jul 29 10:45:09 2005
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Fri Jul 29 10:36:13 2005
Subject: [Bioperl-l] Patching lucy
Message-ID: <42EA40F5.3090707@purdue.edu>

The patch to lucy source code from (the appendix):

http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Lucy.html

seems not to work for lucy-1.19p or lucy-1.19s. The patch itself applies 
cleanly, but the resulting executable (after make) segfaults when run on 
the lucy test data.

Any advice?

I've sent email directly to the module creator, Andrew G. Walsh, as 
requested in the module. But I'm not sure if the module creator 
regularly monitors the hotmail account listed therein. So I thought I'd 
post here, in case someone had a patch that would work on lucy-1.19.

-- 
Phillip SanMiguel
Purdue Genomics Core Facility
From cain at cshl.edu  Fri Jul 29 11:17:12 2005
From: cain at cshl.edu (Scott Cain)
Date: Fri Jul 29 11:07:52 2005
Subject: [Bioperl-l] Re: Fixing bioperl [was Re: [GMOD-devel] Re:
	[Gmod-gbrowse]
	Analysis features (Re: Final alpha release of gmod (chado))]
In-Reply-To: <Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
Message-ID: <1122650232.10455.31.camel@localhost.localdomain>

Hi Chris,

I agree that the changes you suggest below need to happen, and I am
willing to move forward with them.  After the last release of
gmod/chado, I was planning to restructure several sections of the gmod
architecture, so incorporating changes in bioperl will just go along for
the ride.

The main section of affected code in gmod is the GFF bulk loader, but
after we make the changes to the bioperl API, it shouldn't be too hard
to fix the loader.  In fact, some of those changes may have already
started.  I remember a few weeks before I released the gmod/chado
package, Hilmar sent out an announcement that he had made some changes.
While I should have paid attention then, I was busy getting my release
together, and everything seemed to work, so I ignored it.
Unfortunately, the reason things continued to work was that I forgot to
update my bioperl-live, and as a result, the gmod release doesn't work
with bioperl-live.  So now, there is a tarball of bioperl released with
the gmod release.

OK, mentally put parentheses around most of the last paragraph, as it is
mostly an aside.

The other section of code that could have been affected but won't be is
the ontology loader.  The current ontology loader depends on
Bio::Ontology, but I was already planning on migrating to go-perl for
loading ontologies anyway, so that won't be a problem.

So, who wants to take the lead on this?

Thanks,
Scott


On Thu, 2005-07-28 at 12:42 -0700, Chris Mungall wrote:
> I think the answer may be even more complicated than this.
> 
> Lurkers and contributors to the bioperl mailing list may have noticed that
> there have been some major obstacles to progress lately, particularly in
> getting a stable release of the code out. bp1.4 is fairly old, 1.5 is a
> developer release, though this is the one required by GMOD.
> 
> My understanding is that this bottleneck can be traced back to changes in
> the SeqFeature and Annotation model. These changes appear to be required
> by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff
> (which in turn is used by the GMOD bulk loader, which is the main reason
> GMOD requires 1.5, I believe?). Unfortunately, these changes also break
> existing code and have a severe negative impact on memory usage.
> 
> Before advising Cyril and others to switch to BFIO::gff I think it's
> important to make sure there is a clear path forward with bioperl. My
> impression is that there is something of a stalemate here. The bioperl
> developers would like to retract the aforementioned changes, but they
> believe they cannot do this without breaking GMOD code.  They are also
> extremely uncomfortable about leaving these changes in. Everyone gives up
> and starts coding around bioperl.
> 
> Here is why the changes were introduced:
> 
> BioPerl has a 'scruffy' typing model, whereby feature types (primary_tag
> in bioperl) and featureprop types (tags in bioperl) are labels or strings.
> In contrast, Chado forces all types to be some class or relation in an
> ontology.
> 
> Now obviously I'm rather partial to the Chado model, but that doesn't mean
> I think it should be forced upon bioperl. I often use bioperl in scruffy
> mode (on scruffy data); or in some combination whereby I map the scruffy
> types to ontologies in some non-bioperl code. When using bioperl as a
> middleware component over a nicely organised database, ontology-typed mode
> is definitely best. However, the majority of bioperl users (including
> myself) spend a large proportion of their time working with scruffy data,
> in which case lightweight scruffy types are more appropriate.
> 
> It seems that there is a perfectly simple way of reconciling both
> approaches. We revert bioperl back to the simpler scruffy model. The
> majority of users and developers breathe a sigh of relief. We then extend
> SeqFeatureI with something like SeqFeatureAnnotatedI. This forces types to
> be stored as OntologyTerms (and I haven't even touched on some of the
> problems here, but at least we are insulating the standard bioperl layer
> that 99% of users use from these issues). All classes implementing SFAI
> will necessarily implement SFI, and the primary_tag and tag_values methods
> will be supported (not deprecated) as simple delegations to the
> OntologyTerm objects.
> 
> We can then modify BFIO::gff (which is an incredibly useful piece of code)
> and get rid of all the dependencies on SO and Bio::Ontology* and instead
> allow the user of this module to plug in their own resolver/validator - so
> they can choose whether they just want fast scruffy lightweight SFI
> features, or whether they want ontology-typed SFAI features. If the
> latter, then they can choose their own resolver strategy - by a user
> supplied hash, by a copy of SO auto-downloaded from sourceforge, by a
> local chado db, by the genbank->SO mapping table, during parsing vs
> post-parsing, whatever. In fact there is already
> Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly concerned
> with helping Bio::SeqFeature::Tools::Unflattener convert scruffy genbank
> to something sensible.
> 
> GMOD (and perhaps biosql) would use SFAI, everyone else would use the
> simpler SFI. Someone can even get a stable 1.6 release out before all the
> SFAI details such as how the resolver would work are finalised. I'd really
> like to see 1.6 include a simpler BFIO::gff that can optionally produce
> features that aren't SeqFeature::Annotateds, but that's negotiable.
> 
> There's vast swathes of both GMOD and BioPerl code I'm not familiar with,
> so it's possible my analysis above is flawed in some way. If it is, then
> it's up to someone from either camp to speak up! If not, then there's no
> excuse for the relevant people not to start sorting out this mess by
> commencing with the solution outlined above.
> 
> Cheers
> Chris
> 
> >
> > Scott
> >
> >
> > On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
> > > Hello,
> > > We are going to store analysis results in chado, and we are of course
> > > very interested in these future developments of GFF3/chado.
> > > So we would like to make sure that the parsers and conversion programs
> > > we are writing now will be compatible with the future GFF3.
> > >
> > > We are using Bio::SeqFeature::Generic objects that we write with
> > > Bio::Tools::GFF.
> > >
> > > Do you think that Bio::Tools::GFF will be able to handle the new 'type'
> > > column or is it better to switch to Bio::FeatureIO::gff ?
> > >
> > > Thanks in advance for any advice.
> > >
> > > Cyril
> > >
> > > Don Gilbert wrote:
> > >
> > > >
> > > > Scott,
> > > >
> > > > Your notes in gmod_bulk_load_gff3.pl suggest it is headed in the
> > > > same direction I suggest below. More about these todo points:
> > > >
> > > >> - address flybase's use of analysisfeature combined with feature to
> > > >> give source-type information (in GFF terms). This will need to
> > > >> be addressed in the GBrowse adaptor.
> > > >> - modify the bulk loader to allow "mixed" GFF3 files (that is,
> > > >> containing
> > > >> both analysis results and annotations). See perldoc
> > > >> gmod_bulk_load_gff3.pl
> > > >> for more info
> > > >
> > > >
> > > > Use of chado's analysisfeature table is something others who know
> > > > it better can comment on. But after working with it for a while
> > > > it makes sense to me to use in this way:
> > > >
> > > > For a future GFF -> Chado loader, treat analysis features such as
> > > > gene finding results, BLAST, sim4 as 'analysisfeature type' rather
> > > > than feature CV term type (the ones that now end up with a generic
> > > > 'match' cvterm). In these cases the Analysis table is populated with
> > > > program:database_sourcename
> > > > as the basis of this 'analysisfeature type', such as
> > > > match:blastx:na_pe.dros
> > > > match:sim4:DGC
> > > > match:genie:dummy (or maybe exon:genie)
> > > >
> > > > The program:database fits neatly in GFF source field, as
> > > > #ref source type start stop ...
> > > > chr1 blastx:na_pe.dros match 1 100 ...
> > > > chr1 sim4:DGC match 1 100 ...
> > > >
> > > > These can be treated in database adaptor analogously to the CVterm
> > > > table feature types. See at end a list of current GFF feature
> > > > type:source from worm, rice, yeast, fly MODs. Fly and rice use a
> > > > syntax like above and worm gff uses BLAT_EMBL_BEST, instead of
> > > > BLAT:EMBL_BEST.
> > > >
> > > > From POD of your bulk_load_gff3.pl
> > > > > Analysis
> > > > > If you are loading analysis results (ie, BLAT results, gene
> > > > > predictions), you should specify the -a flag. If no arguments are
> > > > > supplied with the -a, then the loader will assume that the results
> > > > > belong to an analysis set with a name that is the concatenation of
> > > > > the source (column 2) and the method (column 3) with an underscore
> > > > > in between.
> > > >
> > > > "... then the loader will assume that the results belong to an
> > > > analysis table row with a program name and database source name
> > > > taken from Source (column 2, colon separated program:sourcename),
> > > > with a SOFA feature type taken from Method (column 3). If
> > > > sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'.
> > > > Use the generic 'match' SOFA type if others don't apply."
> > > > [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
> > > >
> > > > Note that sourcename of database is a common attribute (all those
> > > > blasts, blats, sim4, ... are run on several different databases).
> > > >
> > > > For that underscore between method and source, where does that go into
> > > > database? It is used as parts of program or database sourcename names,
> > > > so it may be problematic to add one if not needed.
> > > >
> > > > Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name' entry
> > > > for analysis table. This probably is less useful than using Program
> > > > and Sourcename fields as flybase does, which comes from the common
> > > > usage where people run various programs, with various database sources
> > > > and want to plop the results into a database easily. These go into those
> > > > two fields directly, no need to create or parse a Name entry
> > > > (which can be and is null in flybase data).
> > > >
> > > > > my $search_analysis
> > > > > = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
> > > >
> > > > I think it would be better as
> > > > my $search_analysis
> > > > = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and
> > > > sourcename=?");
> > > >
> > > > > Otherwise, the argument provided with -a will be taken
> > > > > as the name of the analysis set. Either way, the analysis set must
> > > > > already be in the analysis table. The easiest way to do this is to
> > > > > insert it directly in the psql shell:
> > > > >
> > > > > INSERT INTO analysis (name, program, programversion)
> > > > > VALUES ('genscan 2005-2-28','genscan','5.4');
> > > >
> > > > My choice would be to populate the analysis table from GFF data, rather
> > > > than expect preparation by user (or as another option).
> > > >
> > > > INSERT INTO analysis (program, sourcename)
> > > > VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
> > > > INSERT INTO analysis (program, sourcename)
> > > > VALUES ('sim4','na_gb.dmel');
> > > > INSERT INTO analysis (program, sourcename, programversion)
> > > > VALUES ('genie_masked','dummy', '1.0');
> > > >
> > > > > There are other columns in the analysis table that are optional; see
> > > > > the schema documentation and '\d analysis' in psql for more
> > > > > information.
> > > > >
> > > > ....
> > > > > > A planned addition to the functionality of handling analysis results
> > > > > is to allow "mixed" GFF files, where some lines are analysis results
> > > > > and some are not.
> > > >
> > > > This is the case for drosophila GFF now (see others also below). If
> > > > you make the default assumption that if ($method =~ /.*match/) and
> > > > ($source =~ m/([^:]+):(.+)/), you should get all/most of
> > > > analysisfeature types, and probably not anything else.
> > > >
> > > > > Additionally, one will be able to supply lists of
> > > > > types (optionally with sources) and their associated entry in the
> > > > > analysis table. The format will probably be tag value pairs:
> > > > >
> > > > > --analysis match:Rice_est=rice_est_blast, \
> > > > > match:Maize_cDNA=maize_cdna_blast, \
> > > > > mRNA=genscan_prediction,exon=genscan_prediction
> > > >
> > > > My suggestion for this (as per GFF source,type columns) would be
> > > > --analysis match:program:sourcename ...
> > > > --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
> > > > mRNA:genscan:dummy, exon:genscan:dummy
> > > >
> > > > I guess the 'dummy' data sourcename need not be added; flybase uses it
> > > > to keep that field not-null, but it isn't required by the schema.
> > > >
> > > > Here are some snippets from the ChadoFC adaptor I modified
> > > > from yours (will get into cvs.sf.net 'real soon'), showing that
> > > > it isn't much work to add this as an analog to how cvterm types
> > > > are used.
> > > >
> > > > -- Don
> > > >
> > > > ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
> > > > ## treat similar to CV table types
> > > >
> > > > sub getAnalysisFeatureHash
> > > > {
> > > > my $self= shift;
> > > >
> > > > my $dbh= $self->dbh();
> > > > > my $sth = $dbh->prepare("select analysis_id,program,sourcename from
> > > > > analysis")
> > > > > or $self->throw("unable to prepare select analysis");
> > > > > $sth->execute or $self->throw("unable to select analysis");
> > > > >
> > > > > ## plain hash declarations; assigning ({},{}) would add a junk key
> > > > > my (%term2name, %name2term);
> > > >
> > > > while (my $hashref = $sth->fetchrow_hashref) {
> > > >
> > > > ## this is dgg syntax of analysis feature names for GFF
> > > > ## all have generic 'match' method and program:source as 'source'
> > > > ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc.
> > > > my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename};
> > > >
> > > > $term2name{ $hashref->{analysis_id} } = $anfeat;
> > > > $name2term{ $anfeat } = $hashref->{analysis_id};
> > > > }
> > > > $self->an_term2name(\%term2name);
> > > > $self->an_name2term(\%name2term);
> > > > }
> > > >
> > > > ## Das::ChadoFC::Segment snippets
> > > > sub features {
> > > > $self->{has_anatype}=0;
> > > > my $sql_range = '';
> > > > my ($interbase_start,$rend,$srcfeature_id,$sql_types);
> > > > unless ($feature_id) {
> > > > $sql_range = $self->sql_range($rangetype);
> > > >
> > > > $sql_types = $self->sql_types($types, -1); # dgg
> > > >
> > > > $srcfeature_id = $self->{srcfeature_id};
> > > > }
> > > > ...
> > > > elsif($self->{has_anatype}) {
> > > > $from_part .= "left join analysisfeature af using (feature_id) ";
> > > > }
> > > >
> > > >
> > > > sub sql_types
> > > > ..
> > > > $valid_type = $factory->name2term($temp_type);
> > > > $is_anatype= 0;
> > > > unless ($valid_type) {
> > > > $valid_type = $factory->an_name2term($temp_type);
> > > > $self->{has_anatype}= $is_anatype= 1 if ($valid_type);
> > > > }
> > > > ..
> > > > ## leave out extra invalid types
> > > > if (!$valid_type) {
> > > > ### skip
> > > > } elsif ($temp_dbxref) {
> > > > $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id =
> > > > $temp_dbxref)";
> > > > } elsif($is_anatype) {
> > > > $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
> > > > } else {
> > > > $sql_types .= $orsql."(f.type_id = $valid_type)";
> > > > }
> > > >
> > > >
> > > > Lists of GFF feature type:source from some current MOD data
> > > > where * are probably analysisfeature types (program:database)
> > > >
> > > > rice gff type:source
> > > > > ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/gff3/
> > > > --------------------
> > > > CDS:known
> > > > CDS:tigr
> > > > EST:cmap
> > > > EST_match:Barley (? might be EST_match:someprogram:Barley)
> > > > EST_match:Maize
> > > > EST_match:Millet
> > > > EST_match:Rice
> > > > EST_match:Sorghum
> > > > EST_match:Wheat
> > > > cDNA_match:Rice
> > > > cross_genome_match:Maize
> > > > cross_genome_match:Rice
> > > > cross_genome_match:Sorghum
> > > > * exon:FgenesH:Monocot
> > > > exon:known
> > > > exon:tigr
> > > > five_prime_UTR:tigr
> > > > gene:known
> > > > gene:tigr
> > > > * mRNA:FgenesH:Monocot
> > > > mRNA:known
> > > > mRNA:tigr
> > > > microsatellite:cmap
> > > > three_prime_UTR:known
> > > > three_prime_UTR:tigr
> > > > transposable_element_insertion_site:cmap
> > > >
> > > > worm gff type:source
> > > > > ftp://ftp.wormbase.org/pub/wormbase/species/elegans/genome_feature_tables/GFF3/
> > > > ----------------------
> > > > CDS:Coding_transcript
> > > > * CDS:Genefinder
> > > > CDS:Transposon_CDS
> > > > CDS:history
> > > > * CDS:twinscan
> > > > * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
> > > > * EST_match:BLAT_EST_OTHER
> > > > PCR_product:GenePair_STS
> > > > PCR_product:Orfeome
> > > > RNAi_reagent:RNAi_primary
> > > > RNAi_reagent:RNAi_secondary
> > > > SNP:Allele
> > > > binding_site:binding_site
> > > > * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
> > > > * cDNA_match:BLAT_mRNA_OTHER
> > > > clone_end:.
> > > > clone_start:.
> > > > > complex_substitution:Allele
> > > > deletion:Allele
> > > > exon:Coding_transcript
> > > > * exon:Genefinder
> > > > exon:Non_coding_transcript
> > > > exon:Pseudogene
> > > > exon:Transposon_CDS
> > > > exon:history
> > > > exon:miRNA
> > > > exon:rRNA
> > > > exon:scRNA
> > > > exon:snRNA
> > > > exon:snoRNA
> > > > exon:tRNA
> > > > * exon:tRNAscan-SE-1.23
> > > > * exon:twinscan
> > > > experimental_result_region:Expr_profile
> > > > experimental_result_region:cDNA_for_RNAi
> > > > > * expressed_sequence_match:BLAT_OST_BEST (~ expressed_sequence_match:BLAT:OST_BEST )
> > > > * expressed_sequence_match:BLAT_OST_OTHER
> > > > five_prime_UTR:Coding_transcript
> > > > gene:Coding_transcript
> > > > gene:gene
> > > > gene:history
> > > > gene:landmark
> > > > insertion:Allele
> > > > inverted_repeat:inverted
> > > > mRNA:Coding_transcript
> > > > * mRNA:Genefinder
> > > > mRNA:Transposon_CDS
> > > > mRNA:history
> > > > * mRNA:twinscan
> > > > miRNA:miRNA
> > > > nc_primary_transcript:Non_coding_transcript
> > > > * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST )
> > > > * nucleotide_match:BLAT_EMBL_OTHER
> > > > * nucleotide_match:BLAT_TC1_BEST
> > > > * nucleotide_match:BLAT_TC1_OTHER
> > > > * nucleotide_match:BLAT_ncRNA_BEST
> > > > * nucleotide_match:BLAT_ncRNA_OTHER
> > > > * nucleotide_match:TEC_RED
> > > > * nucleotide_match:waba_coding
> > > > * nucleotide_match:waba_strong
> > > > * nucleotide_match:waba_weak
> > > > oligo:.
> > > > operon:operon
> > > > polyA_signal_sequence:polyA_signal_sequence
> > > > polyA_site:polyA_site
> > > > processed_transcript:gene
> > > > protein_coding_primary_transcript:Coding_transcript
> > > > * protein_match:wublastx
> > > > pseudogene:Pseudogene
> > > > pseudogene:history
> > > > rRNA:rRNA
> > > > reagent:Oligo_set
> > > > region:.
> > > > region:Genbank
> > > > region:Genomic_canonical
> > > > region:Link
> > > > * repeat_region:RepeatMasker
> > > > scRNA:scRNA
> > > > sequence_variant:.
> > > > sequence_variant:Allele
> > > > snRNA:snRNA
> > > > snoRNA:snoRNA
> > > > substitution:Allele
> > > > tRNA:tRNA
> > > > * tRNA:tRNAscan-SE-1.23
> > > > tandem_repeat:tandem
> > > > three_prime_UTR:Coding_transcript
> > > > trans_splice_acceptor_site:SL1
> > > > trans_splice_acceptor_site:SL2
> > > > transcript:SAGE_transcript
> > > > > * translated_nucleotide_match:BLAT_NEMATODE (~ translated_nucleotide_match:BLAT:NEMATODE )
> > > > transposable_element:Transposon
> > > > transposable_element:Transposon_CDS
> > > > transposable_element_insertion_site:Allele
> > > > transposable_element_insertion_site:Mos_insertion_allele
> > > >
> > > >
> > > > fly gff type:source
> > > > ftp://ftp.flybase.net/genomes/dmel/current/gff/
> > > > -----------------------
> > > > BAC:.
> > > > CDS:.
> > > > aberration_junction:.
> > > > chromosome:.
> > > > chromosome_arm:.
> > > > chromosome_band:.
> > > > enhancer:.
> > > > exon:.
> > > > five_prime_UTR:.
> > > > gene:.
> > > > insertion_site:.
> > > > intron:.
> > > > mRNA:.
> > > > * match:RNAiHDP
> > > > * match:assembly:path
> > > > * match:blastx:aa_SPTR.dmel
> > > > * match:blastx:aa_SPTR.insect
> > > > * match:blastx:aa_SPTR.othinv
> > > > * match:blastx:aa_SPTR.othvert
> > > > * match:blastx:aa_SPTR.plant
> > > > * match:blastx:aa_SPTR.primate
> > > > * match:blastx:aa_SPTR.rodent
> > > > * match:blastx:aa_SPTR.worm
> > > > * match:blastx:aa_SPTR.yeast
> > > > * match:genscan
> > > > * match:repeatmasker
> > > > * match:sim4:na_ARGs.dros
> > > > * match:sim4:na_ARGsCDS.dros
> > > > * match:sim4:na_DGC_dros
> > > > * match:sim4:na_dbEST.diff.dmel
> > > > * match:sim4:na_dbEST.same.dmel
> > > > * match:sim4:na_gadfly_dmel_r2
> > > > * match:sim4:na_gb.dmel
> > > > * match:sim4:na_gb.tpa.dmel
> > > > * match:sim4:na_smallRNA.dros
> > > > * match:sim4:na_transcript_dmel_r31
> > > > * match:sim4:na_transcript_dmel_r32
> > > > * match:tRNAscan-SE:.
> > > > * match:tblastx:na_agambiae
> > > > * match:tblastx:na_dbEST.insect
> > > > * match:tblastx:na_dpse
> > > > * match_part:RNAiHDP
> > > > * match_part:assembly:path
> > > > * match_part:blastx:aa_SPTR.dmel
> > > > * match_part:blastx:aa_SPTR.insect
> > > > * match_part:blastx:aa_SPTR.othinv
> > > > * match_part:blastx:aa_SPTR.othvert
> > > > * match_part:blastx:aa_SPTR.plant
> > > > * match_part:blastx:aa_SPTR.primate
> > > > * match_part:blastx:aa_SPTR.rodent
> > > > * match_part:blastx:aa_SPTR.worm
> > > > * match_part:blastx:aa_SPTR.yeast
> > > > * match_part:genscan
> > > > * match_part:repeatmasker
> > > > * match_part:sim4:na_ARGs.dros
> > > > * match_part:sim4:na_ARGsCDS.dros
> > > > * match_part:sim4:na_DGC_dros
> > > > * match_part:sim4:na_dbEST.diff.dmel
> > > > * match_part:sim4:na_dbEST.same.dmel
> > > > * match_part:sim4:na_gadfly_dmel_r2
> > > > * match_part:sim4:na_gb.dmel
> > > > * match_part:sim4:na_gb.tpa.dmel
> > > > * match_part:sim4:na_smallRNA.dros
> > > > * match_part:sim4:na_transcript_dmel_r31
> > > > * match_part:sim4:na_transcript_dmel_r32
> > > > * match_part:tRNAscan-SE:.
> > > > * match_part:tblastx:na_agambiae
> > > > * match_part:tblastx:na_dbEST.insect
> > > > * match_part:tblastx:na_dpse
> > > > mature_peptide:.
> > > > ncRNA:.
> > > > oligo:.
> > > > point_mutation:.
> > > > polyA_site:.
> > > > protein_binding_site:.
> > > > pseudogene:.
> > > > region:.
> > > > regulatory_region:.
> > > > rescue_fragment:.
> > > > scaffold:.
> > > > sequence_variant:.
> > > > snRNA:.
> > > > snoRNA:.
> > > > tRNA:.
> > > > three_prime_UTR:.
> > > > transcription_start_site:.
> > > > transposable_element:.
> > > > transposable_element_insertion_site:. 3116
> > > >
> > > >
> > > > yeast gff type:source count
> > > > > ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff
> > > > -------------------------
> > > > ARS:SGD
> > > > CDS:SGD
> > > > binding_site:SGD
> > > > centromere:SGD
> > > > chromosome:SGD
> > > > gene:SGD
> > > > insertion:SGD
> > > > intron:SGD
> > > > ncRNA:SGD
> > > > nc_primary_transcript:SGD
> > > > nucleotide_match:SGD
> > > > pseudogene:SGD
> > > > rRNA:SGD
> > > > region:SGD
> > > > region:landmark
> > > > repeat_family:SGD
> > > > repeat_region:SGD
> > > > snRNA:SGD
> > > > snoRNA:SGD
> > > > tRNA:SGD
> > > > telomere:SGD
> > > > transposable_element:SGD
> > > > transposable_element_gene:SGD
> > > >
> > > > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> > > > -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/
> > > >
> > > >
> > > >
> > > > -------------------------------------------------------
> > > > This SF.Net email is sponsored by the 'Do More With Dual!' webinar
> > > > happening
> > > > July 14 at 8am PDT/11am EDT. We invite you to explore the latest in dual
> > > > core and dual graphics technology at this free one hour event hosted
> > > > by HP, AMD, and NVIDIA. To register visit
> > > > http://www.hp.com/go/dualwebinar
> > > > _______________________________________________
> > > > Gmod-gbrowse mailing list
> > > > Gmod-gbrowse@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> > > >
> > >
> > >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                         cain@cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > Cold Spring Harbor Laboratory
> >
> >
> >
> > _______________________________________________
> > Gmod-devel mailing list
> > Gmod-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/gmod-devel
> >
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From mebradley at chem.ufl.edu  Fri Jul 29 17:39:08 2005
From: mebradley at chem.ufl.edu (Michael Bradley)
Date: Fri Jul 29 17:47:02 2005
Subject: [Bioperl-l] constructing a tree object
Message-ID: <003201c59485$f68500c0$ab05a8c0@bradleydell>

Can anyone tell me how to do $treeObj = Bio::TreeIO->new(-file =>
"somefile", -format => 'newick') from a variable instead of a file?
 
Suppose that my tree is stored in $treestring. I would like to do
something like: $treeObj = Bio::TreeIO->new(-$treestring, -format =>
'newick').
 
Thanks, 
 
Mike Bradley 
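
One way this is commonly handled, offered as a sketch rather than a
confirmed answer from the list: open a read-only filehandle on the
scalar (core Perl 5.8 or later) and pass it to Bio::TreeIO with -fh
instead of -file. The helper names string_to_fh and tree_from_string
are hypothetical, the Newick string is made-up sample data, and the
BioPerl step is skipped when Bio::TreeIO is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a scalar in a read-only in-memory filehandle.
sub string_to_fh {
    my ($string) = @_;
    open(my $fh, '<', \$string) or die "cannot open in-memory handle: $!";
    return $fh;
}

# Hypothetical helper: parse the first tree from a Newick string.
# Returns nothing when BioPerl (Bio::TreeIO) is not available.
sub tree_from_string {
    my ($treestring) = @_;
    return unless eval { require Bio::TreeIO; 1 };
    my $treeio = Bio::TreeIO->new(
        -fh     => string_to_fh($treestring),
        -format => 'newick',
    );
    return $treeio->next_tree;
}

my $treestring = '((A:0.1,B:0.2):0.3,C:0.4);';    # sample data
my $tree = tree_from_string($treestring);
if ($tree) {
    my @leaves = $tree->get_leaf_nodes;
    print "leaves: ", scalar(@leaves), "\n";
} else {
    print "Bio::TreeIO not installed; only the filehandle step ran\n";
}
```

Note that Bio::TreeIO->new returns a stream object, so the tree itself
comes from next_tree, not from new.
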
From hlapp at gnf.org  Fri Jul 29 20:07:35 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jul 29 19:58:09 2005
Subject: [Bioperl-l] Re: Fixing bioperl [was Re: [GMOD-devel] Re:
	[Gmod-gbrowse] Analysis features (Re: Final alpha release of
	gmod (chado))]
In-Reply-To: <Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
Message-ID: <08c0281f4eda0b376b27944a7aa99191@gnf.org>

Hi Chris,

this sounds like a way to go. As you note, I'm not very comfortable  
with transforming the API to use structured objects where flat strings  
will do just fine in most applications until someone demonstrates that  
this doesn't substantially impact performance/memory hogging in  
large-throughput use-cases. And operator overloading is just too  
bug-prone IMHO.

The one thing that makes me hesitate is the introduction of another  
interface - but maybe I should be cool if you are ;) OTOH, adding typed  
methods to SeqFeatureI instead of recasting the existing ones may  
cause just as much confusion.

	-hilmar

On Jul 28, 2005, at 12:42 PM, Chris Mungall wrote:

>
> [sorry for the cross-posting, but I think it's really important to have
> a gmod to bioperl chit chat on this. I've removed gmod-gbrowse from the
> cc list]
>
> On Thu, 28 Jul 2005, Scott Cain wrote:
>
>> Hi Cyril,
>>
>> I think Bio::Tools::GFF is somewhat hacky and not a tool I would use to
>> produce 'safe' GFF3.  On the other hand Bio::FeatureIO is still a
>> little immature, but it is what I used for the chado GFF3 bulk loader,
>> so it does handle (parse) Target features.  So my suggestion would be
>> to use BFIO::gff, but be prepared for some problems; when you find
>> them complain loudly on the bioperl mailing list or fix the problems
>> and commit them (or both!).
>
> I think the answer may be even more complicated than this.
>
> Lurkers and contributors to the bioperl mailing list may have noticed
> that there have been some major obstacles to progress lately,
> particularly in getting a stable release of the code out. bp1.4 is
> fairly old, and 1.5 is a developers release, though this is the one
> required by GMOD.
>
> My understanding is that this bottleneck can be traced back to changes
> in the SeqFeature and Annotation model. These changes appear to be
> required by Bio::SeqFeature::Annotated which is produced by
> Bio::FeatureIO::gff (which in turn is used by the GMOD bulk loader,
> which is the main reason GMOD requires 1.5, I believe?). Unfortunately,
> these changes also break existing code and have a severe negative
> impact on memory usage.
>
> Before advising Cyril and others to switch to BFIO::gff I think it's
> important to make sure there is a clear path forward with bioperl. My
> impression is that there is something of a stalemate here. The bioperl
> developers would like to retract the aforementioned changes, but they
> believe they cannot do this without breaking GMOD code.  They are also
> extremely uncomfortable about leaving these changes in. Everyone gives
> up and starts coding around bioperl.
>
> Here is why the changes were introduced:
>
> BioPerl has a 'scruffy' typing model, whereby feature types
> (primary_tag in bioperl) and featureprop types (tags in bioperl) are
> labels or strings. In contrast, Chado forces all types to be some class
> or relation in an ontology.
>
> Now obviously I'm rather partial to the Chado model, but that doesn't
> mean I think it should be forced upon bioperl. I often use bioperl in
> scruffy mode (on scruffy data); or in some combination whereby I map
> the scruffy types to ontologies in some non-bioperl code. When using
> bioperl as a middleware component over a nicely organised database,
> ontology-typed mode is definitely best. However, the majority of
> bioperl users (including myself) spend a large proportion of their time
> working with scruffy data, in which case lightweight scruffy types are
> more appropriate.
>
> It seems that there is a perfectly simple way of reconciling both
> approaches. We revert bioperl back to the simpler scruffy model. The
> majority of users and developers breathe a sigh of relief. We then
> extend SeqFeatureI with something like SeqFeatureAnnotatedI. This
> forces types to be stored as OntologyTerms (and I haven't even touched
> on some of the problems here, but at least we are insulating the
> standard bioperl layer that 99% of users use from these issues). All
> classes implementing SFAI will necessarily implement SFI, and the
> primary_tag and tag_values methods will be supported (not deprecated)
> as simple delegations to the OntologyTerm objects.
>
> We can then modify BFIO::gff (which is an incredibly useful piece of
> code) and get rid of all the dependencies on SO and Bio::Ontology* and
> instead allow the user of this module to plug in their own
> resolver/validator - so they can choose whether they just want fast
> scruffy lightweight SFI features, or whether they want ontology-typed
> SFAI features. If the latter, then they can choose their own resolver
> strategy - by a user supplied hash, by a copy of SO auto-downloaded
> from sourceforge, by a local chado db, by the genbank->SO mapping
> table, during parsing vs post-parsing, whatever. In fact there is
> already Bio::SeqFeature::Tools::TypeMapper, but currently this is
> mostly concerned with helping Bio::SeqFeature::Tools::Unflattener
> convert scruffy genbank to something sensible.
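
The pluggable-resolver idea in the paragraph above can be sketched in
plain Perl. This is only an illustration under stated assumptions, not
BioPerl code: make_hash_resolver, make_feature and the type_term key
are hypothetical names, and the two-entry map stands in for the "user
supplied hash" strategy. Without a resolver the type stays a plain
string (scruffy mode); with one, the resolved term is stored alongside
it and primary_tag keeps working either way.

```perl
use strict;
use warnings;

# Hypothetical resolver built from a user-supplied hash of
# scruffy type => ontology accession. Unknown types resolve to undef.
sub make_hash_resolver {
    my (%map) = @_;
    return sub { my ($type) = @_; return $map{$type} };
}

# Hypothetical feature constructor: the resolver argument is optional.
# An unresolvable type is treated as a validation failure.
sub make_feature {
    my ($type, $resolver) = @_;
    my %feature = (primary_tag => $type);
    if ($resolver) {
        my $term = $resolver->($type);
        die "cannot resolve type '$type'\n" unless defined $term;
        $feature{type_term} = $term;
    }
    return \%feature;
}

# Sample user-supplied mapping (illustrative subset only).
my $resolver = make_hash_resolver(gene => 'SO:0000704', exon => 'SO:0000147');

my $scruffy = make_feature('gene');               # lightweight, no lookup
my $typed   = make_feature('gene', $resolver);    # ontology-typed
print "$scruffy->{primary_tag}\n";
print "$typed->{primary_tag} => $typed->{type_term}\n";
```

Because the resolver is just a code reference, a strategy backed by a
local database or a downloaded ontology file could be swapped in
without touching the parser.
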
>
> GMOD (and perhaps biosql) would use SFAI, everyone else would use the
> simpler SFI. Someone can even get a stable 1.6 release out before all
> the SFAI details such as how the resolver would work are finalised. I'd
> really like to see 1.6 include a simpler BFIO::gff that can optionally
> produce features that aren't SeqFeature::Annotateds, but that's
> negotiable.
>
> There are vast swathes of both GMOD and BioPerl code I'm not familiar
> with, so it's possible my analysis above is flawed in some way. If it
> is, then it's up to someone from either camp to speak up! If not, then
> there's no excuse for the relevant people not to start sorting out this
> mess by commencing with the solution outlined above.
>
> Cheers
> Chris
>
>>
>> Scott
>>
>>
>> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
>>> Hello,
>>> We are going to store analysis results in chado, and we are of course
>>> very interested in these future evolutions of GFF3/chado.
>>> So we would like to make sure that the parsers and conversion
>>> programs we are writing now will be compatible with the future GFF3.
>>>
>>> We are using Bio::SeqFeature::Generic objects that we write with
>>> Bio::Tools::GFF.
>>>
>>> Do you think that Bio::Tools::GFF will be able to handle the new
>>> 'type' column or is it better to switch to Bio::FeatureIO::gff ?
>>>
>>> Thanks in advance for any advice.
>>>
>>> Cyril
>>>
>>> Don Gilbert wrote:
>>>
>>>>
>>>> Scott,
>>>>
>>>> Your notes in gmod_bulk_load_gff3.pl suggest it is headed in
>>>> same direction I suggest below. More about these todo points
>>>>
>>>>> - address flybase's use of analysisfeature combined with feature to
>>>>> give source-type information (in GFF terms). This will need to
>>>>> be addressed in the GBrowse adaptor.
>>>>> - modify the bulk loader to allow "mixed" GFF3 files (that is,
>>>>> containing both analysis results and annotations). See perldoc
>>>>> gmod_bulk_load_gff3.pl for more info
>>>>
>>>>
>>>> Use of chado's analysisfeature table is something others who know
>>>> it better can comment on. But after working with it for a while
>>>> it makes sense to me to use in this way:
>>>>
>>>> For a future GFF -> Chado loader, treat analysis features such as
>>>> gene finding results, BLAST, sim4 as 'analysisfeature type' rather
>>>> than feature CV term type (the ones that now end up with a generic
>>>> 'match' cvterm). In these cases the Analysis table is populated with
>>>> program:database_sourcename
>>>> as the basis of this 'analysisfeature type', such as
>>>> match:blastx:na_pe.dros
>>>> match:sim4:DGC
>>>> match:genie:dummy (or maybe exon:genie)
>>>>
>>>> The program:database fits neatly in GFF source field, as
>>>> #ref source type start stop ...
>>>> chr1 blastx:na_pe.dros match 1 100 ...
>>>> chr1 sim4:DGC match 1 100 ...
>>>>
>>>> These can be treated in database adaptor analogously to the CVterm
>>>> table feature types. See at end a list of current GFF feature
>>>> type:source from worm, rice, yeast, fly MODs. Fly and rice use a
>>>> syntax like above and worm gff uses BLAT_EMBL_BEST, instead of
>>>> BLAT:EMBL_BEST.
>>>>
>>>> From POD of your bulk_load_gff3.pl
>>>>> Analysis
>>>>> If you are loading analysis results (ie, BLAT results, gene
>>>>> predictions), you should specify the -a flag. If no arguments are
>>>>> supplied with the -a, then the loader will assume that the results
>>>>> belong to an analysis set with a name that is the concatenation of
>>>>> the source (column 2) and the method (column 3) with an underscore
>>>>> in between.
>>>>
>>>> "... then the loader will assume that the results belong to an
>>>> analysis table row with a program name and database source name
>>>> taken from Source (column 2, colon separated program:sourcename),
>>>> with a SOFA feature type taken from Method (column 3). If
>>>> sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'.
>>>> Use the generic 'match' SOFA type if others don't apply."
>>>> [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
>>>>
>>>> Note that sourcename of database is a common attribute (all those
>>>> blasts, blats, sim4, ... are run on several different databases).
>>>>
>>>> For that underscore between method and source, where does that go
>>>> into database? It is used as parts of program or database sourcename
>>>> names, so it may be problematic to add one if not needed.
>>>>
>>>> Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name'
>>>> entry for analysis table. This probably is less useful than using
>>>> Program and Sourcename fields as flybase does, which comes from the
>>>> common usage where people run various programs, with various database
>>>> sources and want to plop the results into a database easily. These go
>>>> into those two fields directly, no need to create or parse a Name
>>>> entry (which can be and is null in flybase data).
>>>>
>>>>> my $search_analysis
>>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
>>>>
>>>> I think it would be better as
>>>> my $search_analysis
>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? and
>>>> sourcename=?");
>>>>
>>>>> Otherwise, the argument provided with -a will be taken
>>>>> as the name of the analysis set. Either way, the analysis set must
>>>>> already be in the analysis table. The easiest way to do this is to
>>>>> insert it directly in the psql shell:
>>>>>
>>>>> INSERT INTO analysis (name, program, programversion)
>>>>> VALUES ('genscan 2005-2-28','genscan','5.4');
>>>>
>>>> My choice would be to populate the analysis table from GFF data,
>>>> rather than expect preparation by user (or as another option).
>>>>
>>>> INSERT INTO analysis (program, sourcename)
>>>> VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
>>>> INSERT INTO analysis (program, sourcename)
>>>> VALUES ('sim4','na_gb.dmel');
>>>> INSERT INTO analysis (program, sourcename, programversion)
>>>> VALUES ('genie_masked','dummy', '1.0');
>>>>
>>>>> There are other columns in the analysis table that are optional; see
>>>>> the schema documentation and '\d analysis' in psql for more
>>>>> information.
>>>>>
>>>> ....
>>>>> A planned addition to the functionality of handling analysis results
>>>>> is to allow "mixed" GFF files, where some lines are analysis results
>>>>> and some are not.
>>>>
>>>> This is the case for drosophila GFF now (see others also below). If
>>>> you make the default assumption that if ($method =~ /.*match/) and
>>>> ($source =~ m/([^:]+):(.+)/), you should get all/most of
>>>> analysisfeature types, and probably not anything else.
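
The default assumption described above can be sketched as a small
classifier for "mixed" GFF lines. This is a minimal illustration, not
code from the bulk loader: analysis_key is a hypothetical name, the
source pattern is anchored here (the original is unanchored), and the
first two sample lines are the ones from this message while the third
is made up. A line counts as an analysis result when its method
contains "match" and its source splits into program:sourcename.

```perl
use strict;
use warnings;

# Returns (program, sourcename) when a tab-separated GFF line looks like
# an analysis result under the heuristic, or an empty list otherwise.
sub analysis_key {
    my ($gff_line) = @_;
    my (undef, $source, $method) = split /\t/, $gff_line;
    return unless defined $method && $method =~ /match/;
    return unless $source =~ m/^([^:]+):(.+)$/;
    return ($1, $2);
}

my @lines = (
    "chr1\tblastx:na_pe.dros\tmatch\t1\t100",
    "chr1\tsim4:DGC\tmatch\t1\t100",
    "chr1\t.\tgene\t1\t100",               # made-up annotation line
);
for my $line (@lines) {
    my ($program, $sourcename) = analysis_key($line);
    print defined $program
        ? "analysis: $program / $sourcename\n"
        : "annotation\n";
}
```

Sources with no colon (for example a bare genscan) fall through as
annotations under this rule, which is why the text above says "most".
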
>>>>
>>>>> Additionally, one will be able to supply lists of
>>>>> types (optionally with sources) and their associated entry in the
>>>>> analysis table. The format will probably be tag value pairs:
>>>>>
>>>>> --analysis match:Rice_est=rice_est_blast, \
>>>>> match:Maize_cDNA=maize_cdna_blast, \
>>>>> mRNA=genscan_prediction,exon=genscan_prediction
>>>>
>>>> My suggestion for this (as per GFF source,type columns) would be
>>>> --analysis match:program:sourcename ...
>>>> --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
>>>> mRNA:genscan:dummy, exon:genscan:dummy
>>>>
>>>> I guess the 'dummy' data sourcename need not be added; flybase uses it
>>>> to keep that field not-null, but it isn't required by the schema.
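
A sketch of how the suggested --analysis value could be split up, under
stated assumptions: parse_analysis_option is a hypothetical name, not
part of gmod_bulk_load_gff3.pl, and treating 'dummy' or a missing
sourcename as undef is one reading of the remark above that the field
need not be filled.

```perl
use strict;
use warnings;

# Split a comma-separated list of type:program[:sourcename] entries
# into a list of hash references, one per entry.
sub parse_analysis_option {
    my ($value) = @_;
    my @specs;
    for my $entry (split /\s*,\s*/, $value) {
        next unless length $entry;
        my ($type, $program, $sourcename) = split /:/, $entry, 3;
        die "bad --analysis entry '$entry'\n"
            unless defined $program && length $type && length $program;
        # Assumption: 'dummy' is only a not-null placeholder, so drop it.
        $sourcename = undef
            if defined $sourcename && $sourcename eq 'dummy';
        push @specs,
            { type => $type, program => $program, sourcename => $sourcename };
    }
    return @specs;
}

my @specs = parse_analysis_option(
    'match:blast:Rice_est,match:blast:Maize_cDNA,'
  . 'mRNA:genscan:dummy, exon:genscan:dummy'
);
for my $spec (@specs) {
    my $src = defined $spec->{sourcename} ? $spec->{sourcename} : '(none)';
    print "$spec->{type}\t$spec->{program}\t$src\n";
}
```

Each resulting program/sourcename pair lines up with the two columns
used in the "WHERE program=? and sourcename=?" lookup suggested earlier
in this message.
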
>>>>
>>>> Here are some snippets from the ChadoFC adaptor I modified
>>>> from yours (will get into cvs.sf.net 'real soon'), showing that
>>>> it isn't much work to add this as an analog to how cvterm types
>>>> are used.
>>>>
>>>> -- Don
>>>>
>>>> ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
>>>> ## treat similar to CV table types
>>>>
>>>> sub getAnalysisFeatureHash
>>>> {
>>>> my $self= shift;
>>>>
>>>> my $dbh= $self->dbh();
>>>> my $sth = $dbh->prepare("select analysis_id,program,sourcename from
>>>> analysis")
>>>> or $self->throw("unable to prepare select analysis");
>>>> $sth->execute or $self->throw("unable to select analysis");
>>>>
>>>> ## plain hash declarations; assigning ({},{}) would add a junk key
>>>> my (%term2name, %name2term);
>>>>
>>>> while (my $hashref = $sth->fetchrow_hashref) {
>>>>
>>>> ## this is dgg syntax of analysis feature names for GFF
>>>> ## all have generic 'match' method and program:source as 'source'
>>>> ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc.
>>>> my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename};
>>>>
>>>> $term2name{ $hashref->{analysis_id} } = $anfeat;
>>>> $name2term{ $anfeat } = $hashref->{analysis_id};
>>>> }
>>>> $self->an_term2name(\%term2name);
>>>> $self->an_name2term(\%name2term);
>>>> }
>>>>
>>>> ## Das::ChadoFC::Segment snippets
>>>> sub features {
>>>> $self->{has_anatype}=0;
>>>> my $sql_range = '';
>>>> my ($interbase_start,$rend,$srcfeature_id,$sql_types);
>>>> unless ($feature_id) {
>>>> $sql_range = $self->sql_range($rangetype);
>>>>
>>>> $sql_types = $self->sql_types($types, -1); # dgg
>>>>
>>>> $srcfeature_id = $self->{srcfeature_id};
>>>> }
>>>> ...
>>>> elsif($self->{has_anatype}) {
>>>> $from_part .= "left join analysisfeature af using (feature_id) ";
>>>> }
>>>>
>>>>
>>>> sub sql_types
>>>> ..
>>>> $valid_type = $factory->name2term($temp_type);
>>>> $is_anatype= 0;
>>>> unless ($valid_type) {
>>>> $valid_type = $factory->an_name2term($temp_type);
>>>> $self->{has_anatype}= $is_anatype= 1 if ($valid_type);
>>>> }
>>>> ..
>>>> ## leave out extra invalid types
>>>> if (!$valid_type) {
>>>> ### skip
>>>> } elsif ($temp_dbxref) {
>>>> $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id =
>>>> $temp_dbxref)";
>>>> } elsif($is_anatype) {
>>>> $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
>>>> } else {
>>>> $sql_types .= $orsql."(f.type_id = $valid_type)";
>>>> }
>>>>
>>>>
>>>> Lists of GFF feature type:source from some current MOD data
>>>> where * are probably analysisfeature types (program:database)
>>>>
>>>> rice gff type:source
>>>> ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/gff3/
>>>> --------------------
>>>> CDS:known
>>>> CDS:tigr
>>>> EST:cmap
>>>> EST_match:Barley (? might be EST_match:someprogram:Barley)
>>>> EST_match:Maize
>>>> EST_match:Millet
>>>> EST_match:Rice
>>>> EST_match:Sorghum
>>>> EST_match:Wheat
>>>> cDNA_match:Rice
>>>> cross_genome_match:Maize
>>>> cross_genome_match:Rice
>>>> cross_genome_match:Sorghum
>>>> * exon:FgenesH:Monocot
>>>> exon:known
>>>> exon:tigr
>>>> five_prime_UTR:tigr
>>>> gene:known
>>>> gene:tigr
>>>> * mRNA:FgenesH:Monocot
>>>> mRNA:known
>>>> mRNA:tigr
>>>> microsatellite:cmap
>>>> three_prime_UTR:known
>>>> three_prime_UTR:tigr
>>>> transposable_element_insertion_site:cmap
>>>>
>>>> worm gff type:source
>>>> ftp://ftp.wormbase.org/pub/wormbase/species/elegans/genome_feature_tables/GFF3/
>>>> ----------------------
>>>> CDS:Coding_transcript
>>>> * CDS:Genefinder
>>>> CDS:Transposon_CDS
>>>> CDS:history
>>>> * CDS:twinscan
>>>> * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
>>>> * EST_match:BLAT_EST_OTHER
>>>> PCR_product:GenePair_STS
>>>> PCR_product:Orfeome
>>>> RNAi_reagent:RNAi_primary
>>>> RNAi_reagent:RNAi_secondary
>>>> SNP:Allele
>>>> binding_site:binding_site
>>>> * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
>>>> * cDNA_match:BLAT_mRNA_OTHER
>>>> clone_end:.
>>>> clone_start:.
>>>> complex_substitution:Allele
>>>> deletion:Allele
>>>> exon:Coding_transcript
>>>> * exon:Genefinder
>>>> exon:Non_coding_transcript
>>>> exon:Pseudogene
>>>> exon:Transposon_CDS
>>>> exon:history
>>>> exon:miRNA
>>>> exon:rRNA
>>>> exon:scRNA
>>>> exon:snRNA
>>>> exon:snoRNA
>>>> exon:tRNA
>>>> * exon:tRNAscan-SE-1.23
>>>> * exon:twinscan
>>>> experimental_result_region:Expr_profile
>>>> experimental_result_region:cDNA_for_RNAi
>>>> * expressed_sequence_match:BLAT_OST_BEST (~ expressed_sequence_match:BLAT:OST_BEST )
>>>> * expressed_sequence_match:BLAT_OST_OTHER
>>>> five_prime_UTR:Coding_transcript
>>>> gene:Coding_transcript
>>>> gene:gene
>>>> gene:history
>>>> gene:landmark
>>>> insertion:Allele
>>>> inverted_repeat:inverted
>>>> mRNA:Coding_transcript
>>>> * mRNA:Genefinder
>>>> mRNA:Transposon_CDS
>>>> mRNA:history
>>>> * mRNA:twinscan
>>>> miRNA:miRNA
>>>> nc_primary_transcript:Non_coding_transcript
>>>> * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST )
>>>> * nucleotide_match:BLAT_EMBL_OTHER
>>>> * nucleotide_match:BLAT_TC1_BEST
>>>> * nucleotide_match:BLAT_TC1_OTHER
>>>> * nucleotide_match:BLAT_ncRNA_BEST
>>>> * nucleotide_match:BLAT_ncRNA_OTHER
>>>> * nucleotide_match:TEC_RED
>>>> * nucleotide_match:waba_coding
>>>> * nucleotide_match:waba_strong
>>>> * nucleotide_match:waba_weak
>>>> oligo:.
>>>> operon:operon
>>>> polyA_signal_sequence:polyA_signal_sequence
>>>> polyA_site:polyA_site
>>>> processed_transcript:gene
>>>> protein_coding_primary_transcript:Coding_transcript
>>>> * protein_match:wublastx
>>>> pseudogene:Pseudogene
>>>> pseudogene:history
>>>> rRNA:rRNA
>>>> reagent:Oligo_set
>>>> region:.
>>>> region:Genbank
>>>> region:Genomic_canonical
>>>> region:Link
>>>> * repeat_region:RepeatMasker
>>>> scRNA:scRNA
>>>> sequence_variant:.
>>>> sequence_variant:Allele
>>>> snRNA:snRNA
>>>> snoRNA:snoRNA
>>>> substitution:Allele
>>>> tRNA:tRNA
>>>> * tRNA:tRNAscan-SE-1.23
>>>> tandem_repeat:tandem
>>>> three_prime_UTR:Coding_transcript
>>>> trans_splice_acceptor_site:SL1
>>>> trans_splice_acceptor_site:SL2
>>>> transcript:SAGE_transcript
>>>> * translated_nucleotide_match:BLAT_NEMATODE (~ translated_nucleotide_match:BLAT:NEMATODE )
>>>> transposable_element:Transposon
>>>> transposable_element:Transposon_CDS
>>>> transposable_element_insertion_site:Allele
>>>> transposable_element_insertion_site:Mos_insertion_allele
>>>>
>>>>
>>>> fly gff type:source
>>>> ftp://ftp.flybase.net/genomes/dmel/current/gff/
>>>> -----------------------
>>>> BAC:.
>>>> CDS:.
>>>> aberration_junction:.
>>>> chromosome:.
>>>> chromosome_arm:.
>>>> chromosome_band:.
>>>> enhancer:.
>>>> exon:.
>>>> five_prime_UTR:.
>>>> gene:.
>>>> insertion_site:.
>>>> intron:.
>>>> mRNA:.
>>>> * match:RNAiHDP
>>>> * match:assembly:path
>>>> * match:blastx:aa_SPTR.dmel
>>>> * match:blastx:aa_SPTR.insect
>>>> * match:blastx:aa_SPTR.othinv
>>>> * match:blastx:aa_SPTR.othvert
>>>> * match:blastx:aa_SPTR.plant
>>>> * match:blastx:aa_SPTR.primate
>>>> * match:blastx:aa_SPTR.rodent
>>>> * match:blastx:aa_SPTR.worm
>>>> * match:blastx:aa_SPTR.yeast
>>>> * match:genscan
>>>> * match:repeatmasker
>>>> * match:sim4:na_ARGs.dros
>>>> * match:sim4:na_ARGsCDS.dros
>>>> * match:sim4:na_DGC_dros
>>>> * match:sim4:na_dbEST.diff.dmel
>>>> * match:sim4:na_dbEST.same.dmel
>>>> * match:sim4:na_gadfly_dmel_r2
>>>> * match:sim4:na_gb.dmel
>>>> * match:sim4:na_gb.tpa.dmel
>>>> * match:sim4:na_smallRNA.dros
>>>> * match:sim4:na_transcript_dmel_r31
>>>> * match:sim4:na_transcript_dmel_r32
>>>> * match:tRNAscan-SE:.
>>>> * match:tblastx:na_agambiae
>>>> * match:tblastx:na_dbEST.insect
>>>> * match:tblastx:na_dpse
>>>> * match_part:RNAiHDP
>>>> * match_part:assembly:path
>>>> * match_part:blastx:aa_SPTR.dmel
>>>> * match_part:blastx:aa_SPTR.insect
>>>> * match_part:blastx:aa_SPTR.othinv
>>>> * match_part:blastx:aa_SPTR.othvert
>>>> * match_part:blastx:aa_SPTR.plant
>>>> * match_part:blastx:aa_SPTR.primate
>>>> * match_part:blastx:aa_SPTR.rodent
>>>> * match_part:blastx:aa_SPTR.worm
>>>> * match_part:blastx:aa_SPTR.yeast
>>>> * match_part:genscan
>>>> * match_part:repeatmasker
>>>> * match_part:sim4:na_ARGs.dros
>>>> * match_part:sim4:na_ARGsCDS.dros
>>>> * match_part:sim4:na_DGC_dros
>>>> * match_part:sim4:na_dbEST.diff.dmel
>>>> * match_part:sim4:na_dbEST.same.dmel
>>>> * match_part:sim4:na_gadfly_dmel_r2
>>>> * match_part:sim4:na_gb.dmel
>>>> * match_part:sim4:na_gb.tpa.dmel
>>>> * match_part:sim4:na_smallRNA.dros
>>>> * match_part:sim4:na_transcript_dmel_r31
>>>> * match_part:sim4:na_transcript_dmel_r32
>>>> * match_part:tRNAscan-SE:.
>>>> * match_part:tblastx:na_agambiae
>>>> * match_part:tblastx:na_dbEST.insect
>>>> * match_part:tblastx:na_dpse
>>>> mature_peptide:.
>>>> ncRNA:.
>>>> oligo:.
>>>> point_mutation:.
>>>> polyA_site:.
>>>> protein_binding_site:.
>>>> pseudogene:.
>>>> region:.
>>>> regulatory_region:.
>>>> rescue_fragment:.
>>>> scaffold:.
>>>> sequence_variant:.
>>>> snRNA:.
>>>> snoRNA:.
>>>> tRNA:.
>>>> three_prime_UTR:.
>>>> transcription_start_site:.
>>>> transposable_element:.
>>>> transposable_element_insertion_site:. 3116
>>>>
>>>>
>>>> yeast gff type:source count
>>>> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff
>>>> -------------------------
>>>> ARS:SGD
>>>> CDS:SGD
>>>> binding_site:SGD
>>>> centromere:SGD
>>>> chromosome:SGD
>>>> gene:SGD
>>>> insertion:SGD
>>>> intron:SGD
>>>> ncRNA:SGD
>>>> nc_primary_transcript:SGD
>>>> nucleotide_match:SGD
>>>> pseudogene:SGD
>>>> rRNA:SGD
>>>> region:SGD
>>>> region:landmark
>>>> repeat_family:SGD
>>>> repeat_region:SGD
>>>> snRNA:SGD
>>>> snoRNA:SGD
>>>> tRNA:SGD
>>>> telomere:SGD
>>>> transposable_element:SGD
>>>> transposable_element_gene:SGD
>>>>
>>>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
>>>> -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/
>>>>
>>>>
>>>>
>>>> -------------------------------------------------------
>>>> This SF.Net email is sponsored by the 'Do More With Dual!' webinar
>>>> happening
>>>> July 14 at 8am PDT/11am EDT. We invite you to explore the latest in  
>>>> dual
>>>> core and dual graphics technology at this free one hour event hosted
>>>> by HP, AMD, and NVIDIA. To register visit
>>>> http://www.hp.com/go/dualwebinar
>>>> _______________________________________________
>>>> Gmod-gbrowse mailing list
>>>> Gmod-gbrowse@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>>
>>>
>>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   cain@cshl.edu
>> GMOD Coordinator (http://www.gmod.org/)              216-392-3087
>> Cold Spring Harbor Laboratory
>>
>>
>>
>> -------------------------------------------------------
>> SF.Net email is Sponsored by the Better Software Conference & EXPO  
>> September
>> 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
>> Agile & Plan-Driven Development * Managing Projects & Teams * Testing  
>> & QA
>> Security * Process Improvement & Measurement *  
>> http://www.sqe.com/bsce5sf
>> _______________________________________________
>> Gmod-devel mailing list
>> Gmod-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/gmod-devel
>>
>
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gnf.org  Fri Jul 29 20:20:19 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jul 29 20:10:54 2005
Subject: [Bioperl-l] Re: Fixing bioperl [was Re: Analysis features]
In-Reply-To: <1122650232.10455.31.camel@localhost.localdomain>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>
	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
	<1122650232.10455.31.camel@localhost.localdomain>
Message-ID: <51a02b5bd508f35301ee3c847b104895@gnf.org>


On Jul 29, 2005, at 8:17 AM, Scott Cain wrote:

>
> The main section of affected code in gmod is the GFF bulk loader, but
> after we make the changes to the bioperl API, it shouldn't be too hard
> to fix the loader.  In fact, some of those changes may have already
> started.  I remember that a few weeks before I released the gmod/chado
> package, Hilmar sent out an announcement that he made some changes.

You mean around the time of ISMB? I fixed the ontology modules ... they  
should actually work better now not worse unless you assumed the  
presence of some bugs ;)

> While I should have paid attention then, I was busy getting my release
> together, and everything seemed to work, so I ignored it.
> Unfortunately, the reason things continued to work was that I forgot to
> update my bioperl-live, and as a result, the gmod release doesn't work
> with bioperl-live.

Scott, what would really help sometimes is if in such a situation you  
run the bioperl test suite and report the result if there are any  
failures, especially those that appear potentially connected to your  
problem. Last time the gmod ontology loader ceased to work the problem  
would have been readily exposed by the ontology tests in bioperl. It  
just helps in zooming in on the problem.

I'd be eager to help make bioperl work with gmod and vice versa and I'm  
sure many others are too, but it'll be difficult if we don't work  
towards this collaboratively. For this I really liked the spirit of  
Chris' proposal - that's the way to make this work.

> [...]
> The other section of code that could have been affected but won't be is
> the ontology loader.  The current ontology loader depends on
> Bio::Ontology, but I was already planning on migrating to go-perl for
> loading ontologies anyway, so that won't be a problem.

I'm closing in on the last bugs in the go-perl integration. It remains  
to be seen how fast the result is as Chris made me aware in Detroit,  
but if it works this will give you both worlds at your choosing.

	-hilmar

>
> So, who wants to take the lead on this?
>
> Thanks,
> Scott
>
>
> On Thu, 2005-07-28 at 12:42 -0700, Chris Mungall wrote:
>> I think the answer may be even more complicated than this.
>>
>> Lurkers and contributors to the bioperl mailing list may have noticed
>> that there have been some major obstacles to progress lately,
>> particularly in getting a stable release of the code out. bp1.4 is
>> fairly old, and 1.5 is a developers release, though this is the one
>> required by GMOD.
>>
>> My understanding is that this bottleneck can be traced back to changes
>> in the SeqFeature and Annotation model. These changes appear to be
>> required by Bio::SeqFeature::Annotated which is produced by
>> Bio::FeatureIO::gff (which in turn is used by the GMOD bulk loader,
>> which is the main reason GMOD requires 1.5, I believe?).
>> Unfortunately, these changes also break existing code and have a
>> severe negative impact on memory usage.
>>
>> Before advising Cyril and others to switch to BFIO::gff I think it's
>> important to make sure there is a clear path forward with bioperl. My
>> impression is that there is something of a stalemate here. The bioperl
>> developers would like to retract the aforementioned changes, but they
>> believe they cannot do this without breaking GMOD code.  They are also
>> extremely uncomfortable about leaving these changes in. Everyone gives
>> up and starts coding around bioperl.
>>
>> Here is why the changes were introduced:
>>
>> BioPerl has a 'scruffy' typing model, whereby feature types
>> (primary_tag in bioperl) and featureprop types (tags in bioperl) are
>> labels or strings. In contrast, Chado forces all types to be some class
>> or relation in an ontology.
>>
>> Now obviously I'm rather partial to the Chado model, but that doesn't
>> mean I think it should be forced upon bioperl. I often use bioperl in
>> scruffy mode (on scruffy data); or in some combination whereby I map
>> the scruffy types to ontologies in some non-bioperl code. When using
>> bioperl as a middleware component over a nicely organised database,
>> ontology-typed mode is definitely best. However, the majority of
>> bioperl users (including myself) spend a large proportion of their time
>> working with scruffy data, in which case lightweight scruffy types are
>> more appropriate.
>>
>> It seems that there is a perfectly simple way of reconciling both
>> approaches. We revert bioperl back to the simpler scruffy model. The
>> majority of users and developers breathe a sigh of relief. We then
>> extend SeqFeatureI with something like SeqFeatureAnnotatedI. This
>> forces types to be stored as OntologyTerms (and I haven't even touched
>> on some of the problems here, but at least we are insulating the
>> standard bioperl layer that 99% of users use from these issues). All
>> classes implementing SFAI will necessarily implement SFI, and the
>> primary_tag and tag_values methods will be supported (not deprecated)
>> as simple delegations to the OntologyTerm objects.
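
The delegation idea in the paragraph above could look roughly like the sketch below. The package names Sketch::Term and Sketch::AnnotatedFeature are hypothetical stand-ins invented for illustration; they are not Bio::Ontology::Term or any existing bioperl class, and the real SFAI interface would be whatever the developers agree on.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ontology term object (not a bioperl class).
package Sketch::Term;
sub new  { my ($class, %args) = @_; return bless { name => $args{name} }, $class }
sub name { return $_[0]{name} }

# Hypothetical feature that stores its type as a term object but still
# answers the plain-string primary_tag() call by delegating to that term.
package Sketch::AnnotatedFeature;
sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type} }, $class;
}
sub type        { return $_[0]{type} }          # ontology-typed accessor
sub primary_tag { return $_[0]->type->name }    # scruffy accessor, delegated

package main;

my $feature = Sketch::AnnotatedFeature->new(
    type => Sketch::Term->new(name => 'exon'),
);
print $feature->primary_tag, "\n";
```

Code written against the string-returning primary_tag keeps working, while callers that want the term object ask for type instead.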
>>
>> We can then modify BFIO::gff (which is an incredibly useful piece of
>> code) and get rid of all the dependencies on SO and Bio::Ontology* and
>> instead allow the user of this module to plug in their own
>> resolver/validator - so they can choose whether they just want fast
>> scruffy lightweight SFI features, or whether they want ontology-typed
>> SFAI features. If the latter, then they can choose their own resolver
>> strategy - by a user supplied hash, by a copy of SO auto-downloaded
>> from sourceforge, by a local chado db, by the genbank->SO mapping
>> table, during parsing vs post-parsing, whatever. In fact there is
>> already Bio::SeqFeature::Tools::TypeMapper, but currently this is
>> mostly concerned with helping Bio::SeqFeature::Tools::Unflattener
>> convert scruffy genbank to something sensible.
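
One way to picture the pluggable resolver is as a code reference handed to the parser, with "no resolver" meaning plain scruffy labels. This is only a sketch of the "user supplied hash" strategy: hash_resolver and type_labels are hypothetical helper names, and the TERM:n identifiers are placeholders, not real ontology accessions.

```perl
use strict;
use warnings;

# Build a resolver from a user-supplied hash of label => term identifier.
sub hash_resolver {
    my (%map) = @_;
    return sub { my ($label) = @_; return $map{$label} };
}

# Type a list of scruffy labels. With no resolver the labels pass through
# unchanged; with one, labels that fail to resolve are reported separately.
sub type_labels {
    my ($labels, $resolver) = @_;
    return ([@$labels], []) unless $resolver;
    my (@typed, @unresolved);
    for my $label (@$labels) {
        my $term = $resolver->($label);
        if (defined $term) { push @typed, $term }
        else               { push @unresolved, $label }
    }
    return (\@typed, \@unresolved);
}

my $resolver = hash_resolver(exon => 'TERM:1', gene => 'TERM:2');
my ($typed, $unresolved) = type_labels([qw(exon foo gene)], $resolver);
print "typed: @$typed; unresolved: @$unresolved\n";
```

Swapping in a different strategy (a local database lookup, a mapping table) would only mean passing a different code reference.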
>>
>> GMOD (and perhaps biosql) would use SFAI, everyone else would use the
>> simpler SFI. Someone can even get a stable 1.6 release out before all
>> the SFAI details such as how the resolver would work are finalised.
>> I'd really like to see 1.6 include a simpler BFIO::gff that can
>> optionally produce features that aren't SeqFeature::Annotateds, but
>> that's negotiable.
>>
>> There are vast swathes of both GMOD and BioPerl code I'm not familiar
>> with, so it's possible my analysis above is flawed in some way. If it
>> is, then it's up to someone from either camp to speak up! If not, then
>> there's no excuse for the relevant people not to start sorting out
>> this mess by commencing with the solution outlined above.
>>
>> Cheers
>> Chris
>>
>>>
>>> Scott
>>>
>>>
>>> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
>>>> Hello,
>>>> We are going to store analysis results in chado, and we are of
>>>> course very interested in these future evolutions of GFF3/chado.
>>>> So we would like to make sure that the parsers and conversion
>>>> programs we are writing now will be compatible with the future GFF3.
>>>>
>>>> We are using Bio::SeqFeature::Generic objects that we write with
>>>> Bio::Tools::GFF.
>>>>
>>>> Do you think that Bio::Tools::GFF will be able to handle the new
>>>> 'type' column, or is it better to switch to Bio::FeatureIO::gff?
>>>>
>>>> Thanks in advance for any advice.
>>>>
>>>> Cyril
>>>>
>>>> Don Gilbert wrote:
>>>>
>>>>>
>>>>> Scott,
>>>>>
>>>>> Your notes in gmod_bulk_load_gff3.pl suggest it is headed in
>>>>> same direction I suggest below. More about these todo points
>>>>>
>>>>>> - address flybase's use of analysisfeature combined with feature to
>>>>>> give source-type information (in GFF terms). This will need to
>>>>>> be addressed in the GBrowse adaptor.
>>>>>> - modify the bulk loader to allow "mixed" GFF3 files (that is,
>>>>>> containing both analysis results and annotations). See perldoc
>>>>>> gmod_bulk_load_gff3.pl for more info
>>>>>
>>>>>
>>>>> Use of chado's analysisfeature table is something others who know
>>>>> it better can comment on. But after working with it for a while
>>>>> it makes sense to me to use in this way:
>>>>>
>>>>> For a future GFF -> Chado loader, treat analysis features such as
>>>>> gene finding results, BLAST, sim4 as 'analysisfeature type' rather
>>>>> than feature CV term type (the ones that now end up with a generic
>>>>> 'match' cvterm). In these cases the Analysis table is populated with
>>>>> program:database_sourcename
>>>>> as the basis of this 'analysisfeature type', such as
>>>>> match:blastx:na_pe.dros
>>>>> match:sim4:DGC
>>>>> match:genie:dummy (or maybe exon:genie)
>>>>>
>>>>> The program:database fits neatly in GFF source field, as
>>>>> #ref source type start stop ...
>>>>> chr1 blastx:na_pe.dros match 1 100 ...
>>>>> chr1 sim4:DGC match 1 100 ...
>>>>>
>>>>> These can be treated in database adaptor analogously to the CVterm
>>>>> table feature types. See at end a list of current GFF feature
>>>>> type:source from worm, rice, yeast, fly MODs. Fly and rice use a
>>>>> syntax like above and worm gff uses BLAT_EMBL_BEST, instead of
>>>>> BLAT:EMBL_BEST.
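
A minimal sketch of reading the program:database convention out of the GFF source column, using the column order from the example lines above. The function names split_source and parse_gff_line are hypothetical, and only the first five columns are handled.

```perl
use strict;
use warnings;

# Split a GFF source column of the form "program:sourcename" into its parts.
# A source without a colon (e.g. "genscan") yields an undefined sourcename.
sub split_source {
    my ($source) = @_;
    my ($program, $sourcename) = split /:/, $source, 2;
    return ($program, $sourcename);
}

# Parse the leading columns of one tab-delimited GFF line.
sub parse_gff_line {
    my ($line) = @_;
    chomp $line;
    my ($ref, $source, $type, $start, $end) = split /\t/, $line;
    my ($program, $sourcename) = split_source($source);
    return {
        ref     => $ref,
        type    => $type,
        start   => $start,
        end     => $end,
        program => $program,
        sourcename => $sourcename,
    };
}

my $feat = parse_gff_line("chr1\tblastx:na_pe.dros\tmatch\t1\t100");
printf "%s on %s via %s against %s\n",
    $feat->{type}, $feat->{ref}, $feat->{program}, $feat->{sourcename};
```

Splitting with a limit of 2 keeps any further colons inside the sourcename. Worm-style sources such as BLAT_EMBL_BEST carry no colon, so they come back as a bare program with no sourcename.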
>>>>>
>>>>> From POD of your bulk_load_gff3.pl
>>>>>> Analysis
>>>>>> If you are loading analysis results (ie, BLAT results, gene
>>>>>> predictions), you should specify the -a flag. If no arguments are
>>>>>> supplied with the -a, then the loader will assume that the results
>>>>>> belong to an analysis set with a name that is the concatenation of
>>>>>> the source (column 2) and the method (column 3) with an underscore
>>>>>> in between.
>>>>>
>>>>> "... then the loader will assume that the results belong to an
>>>>> analysis table row with a program name and database source name
>>>>> taken from Source (column 2, colon separated program:sourcename),
>>>>> with a SOFA feature type taken from Method (column 3). If
>>>>> sourcename doesn't apply, e.g. genefinder, don't add or use 'dummy'.
>>>>> Use the generic 'match' SOFA type if others don't apply."
>>>>> [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
>>>>>
>>>>> Note that sourcename of database is a common attribute (all those
>>>>> blasts, blats, sim4, ... are run on several different databases).
>>>>>
>>>>> For that underscore between method and source, where does that go
>>>>> in the database? Underscores already occur within program and
>>>>> database sourcename names, so it may be problematic to add one if
>>>>> not needed.
>>>>>
>>>>> Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name'
>>>>> entry for analysis table. This probably is less useful than using
>>>>> Program and Sourcename fields as flybase does, which comes from the
>>>>> common usage where people run various programs, with various
>>>>> database sources and want to plop the results into a database
>>>>> easily. These go into those two fields directly, no need to create
>>>>> or parse a Name entry (which can be and is null in flybase data).
>>>>>
>>>>>> my $search_analysis
>>>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
>>>>>
>>>>> I think it would be better as
>>>>> my $search_analysis
>>>>> = $db->prepare("SELECT analysis_id FROM analysis
>>>>> WHERE program=? AND sourcename=?");
>>>>>
>>>>>> Otherwise, the argument provided with -a will be taken
>>>>>> as the name of the analysis set. Either way, the analysis set must
>>>>>> already be in the analysis table. The easiest way to do this is to
>>>>>> insert it directly in the psql shell:
>>>>>>
>>>>>> INSERT INTO analysis (name, program, programversion)
>>>>>> VALUES ('genscan 2005-2-28','genscan','5.4');
>>>>>
>>>>> My choice would be to populate the analysis table from GFF data,
>>>>> rather than expect preparation by user (or as another option).
>>>>>
>>>>> INSERT INTO analysis (program, sourcename)
>>>>> VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
>>>>> INSERT INTO analysis (program, sourcename)
>>>>> VALUES ('sim4','na_gb.dmel');
>>>>> INSERT INTO analysis (program, sourcename, programversion)
>>>>> VALUES ('genie_masked','dummy', '1.0');
>>>>>
>>>>>> There are other columns in the analysis table that are optional;
>>>>>> see the schema documentation and '\d analysis' in psql for more
>>>>>> information.
>>>>>>
>>>>> ....
>>>>>> A planned addition to the functionality of handling analysis
>>>>>> results is to allow "mixed" GFF files, where some lines are
>>>>>> analysis results and some are not.
>>>>>
>>>>> This is the case for drosophila GFF now (see others also below). If
>>>>> you make the default assumption that a line is an analysis result
>>>>> when ($method =~ /.*match/) and ($source =~ m/([^:]+):(.+)/), you
>>>>> should get all/most of analysisfeature types, and probably not
>>>>> anything else.
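
The default assumption quoted above can be written as a small predicate. This is a sketch only: analysis_parts is a hypothetical name, and the source pattern is anchored here, which the quoted regex is not.

```perl
use strict;
use warnings;

# Return (program, sourcename) when a GFF line looks like an analysis result
# under the heuristic above, or an empty list otherwise.
sub analysis_parts {
    my ($method, $source) = @_;
    return unless $method =~ /match/;             # match, match_part, EST_match, ...
    return unless $source =~ m/^([^:]+):(.+)$/;   # program:sourcename
    return ($1, $2);
}

for my $pair (['match', 'blastx:aa_SPTR.dmel'], ['gene', '.'], ['match', 'genscan']) {
    my ($method, $source) = @$pair;
    my @parts = analysis_parts($method, $source);
    printf "%s / %s => %s\n", $method, $source,
        @parts ? "analysis ($parts[0], $parts[1])" : "not recognised";
}
```

Sources with no colon, such as genscan or repeatmasker in the fly list and the worm BLAT_* names, fall through this test, so a loader would still need an explicit mapping for those.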
>>>>>
>>>>>> Additionally, one will be able to supply lists of
>>>>>> types (optionally with sources) and their associated entry in the
>>>>>> analysis table. The format will probably be tag value pairs:
>>>>>>
>>>>>> --analysis match:Rice_est=rice_est_blast, \
>>>>>> match:Maize_cDNA=maize_cdna_blast, \
>>>>>> mRNA=genscan_prediction,exon=genscan_prediction
>>>>>
>>>>> My suggestion for this (as per GFF source,type columns) would be
>>>>> --analysis match:program:sourcename ...
>>>>> --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
>>>>> mRNA:genscan:dummy, exon:genscan:dummy
>>>>>
>>>>> I guess the 'dummy' data sourcename need not be added; flybase uses
>>>>> it to keep that field not-null, but it isn't required by the schema.
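
For what it's worth, the suggested --analysis value is easy to take apart. The sketch below assumes the comma-separated type:program:sourcename form proposed above; parse_analysis_option is a hypothetical name, and none of this is taken from the actual loader.

```perl
use strict;
use warnings;

# Parse "type:program:sourcename,type:program:sourcename,..." into a list of
# hash references. The sourcename is optional and left undefined when absent.
sub parse_analysis_option {
    my ($spec) = @_;
    my @entries;
    for my $item (split /\s*,\s*/, $spec) {
        next unless length $item;
        my ($type, $program, $sourcename) = split /:/, $item, 3;
        push @entries, {
            type       => $type,
            program    => $program,
            sourcename => $sourcename,
        };
    }
    return @entries;
}

my @entries = parse_analysis_option(
    'match:blast:Rice_est,match:blast:Maize_cDNA, mRNA:genscan:dummy, exon:genscan'
);
for my $e (@entries) {
    printf "%s <- %s (%s)\n", $e->{type}, $e->{program},
        defined $e->{sourcename} ? $e->{sourcename} : 'no sourcename';
}
```

Leaving the sourcename undefined when it is omitted matches the point that 'dummy' is a convention rather than a schema requirement.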
>>>>>
>>>>> Here are some snippets from the ChadoFC adaptor I modified
>>>>> from yours (will get into cvs.sf.net 'real soon'), showing that
>>>>> it isn't much work to add this as an analog to how cvterm types
>>>>> are used.
>>>>>
>>>>> -- Don
>>>>>
>>>>> ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
>>>>> ## treat similar to CV table types
>>>>>
>>>>> sub getAnalysisFeatureHash
>>>>> {
>>>>> my $self= shift;
>>>>>
>>>>> my $dbh= $self->dbh();
>>>>> my $sth = $dbh->prepare(
>>>>> "select analysis_id,program,sourcename from analysis")
>>>>> or $self->throw("unable to prepare select analysis");
>>>>> $sth->execute or $self->throw("unable to select analysis");
>>>>>
>>>>> ## two empty hashes; assigning ({},{}) here would add a bogus entry
>>>>> my (%term2name, %name2term);
>>>>>
>>>>> while (my $hashref = $sth->fetchrow_hashref) {
>>>>>
>>>>> ## this is dgg syntax of analysis feature names for GFF
>>>>> ## all have generic 'match' method and program:source as 'source'
>>>>> ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc.
>>>>> my $anfeat = "match:".$hashref->{program}.":".$hashref->{sourcename};
>>>>>
>>>>> $term2name{ $hashref->{analysis_id} } = $anfeat;
>>>>> $name2term{ $anfeat } = $hashref->{analysis_id};
>>>>> }
>>>>> $self->an_term2name(\%term2name);
>>>>> $self->an_name2term(\%name2term);
>>>>> }
>>>>>
>>>>> ## Das::ChadoFC::Segment snippets
>>>>> sub features {
>>>>> $self->{has_anatype}=0;
>>>>> my $sql_range = '';
>>>>> my ($interbase_start,$rend,$srcfeature_id,$sql_types);
>>>>> unless ($feature_id) {
>>>>> $sql_range = $self->sql_range($rangetype);
>>>>>
>>>>> $sql_types = $self->sql_types($types, -1); # dgg
>>>>>
>>>>> $srcfeature_id = $self->{srcfeature_id};
>>>>> }
>>>>> ...
>>>>> elsif($self->{has_anatype}) {
>>>>> $from_part .= "left join analysisfeature af using (feature_id) ";
>>>>> }
>>>>>
>>>>>
>>>>> sub sql_types
>>>>> ..
>>>>> $valid_type = $factory->name2term($temp_type);
>>>>> $is_anatype= 0;
>>>>> unless ($valid_type) {
>>>>> $valid_type = $factory->an_name2term($temp_type);
>>>>> $self->{has_anatype}= $is_anatype= 1 if ($valid_type);
>>>>> }
>>>>> ..
>>>>> ## leave out extra invalid types
>>>>> if (!$valid_type) {
>>>>> ### skip
>>>>> } elsif ($temp_dbxref) {
>>>>> $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id =
>>>>> $temp_dbxref)";
>>>>> } elsif($is_anatype) {
>>>>> $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
>>>>> } else {
>>>>> $sql_types .= $orsql."(f.type_id = $valid_type)";
>>>>> }
>>>>>
>>>>>
>>>>> Lists of GFF feature type:source from some current MOD data
>>>>> where * are probably analysisfeature types (program:database)
>>>>>
>>>>> rice gff type:source
>>>>> ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/gff3/
>>>>> --------------------
>>>>> CDS:known
>>>>> CDS:tigr
>>>>> EST:cmap
>>>>> EST_match:Barley (? might be EST_match:someprogram:Barley)
>>>>> EST_match:Maize
>>>>> EST_match:Millet
>>>>> EST_match:Rice
>>>>> EST_match:Sorghum
>>>>> EST_match:Wheat
>>>>> cDNA_match:Rice
>>>>> cross_genome_match:Maize
>>>>> cross_genome_match:Rice
>>>>> cross_genome_match:Sorghum
>>>>> * exon:FgenesH:Monocot
>>>>> exon:known
>>>>> exon:tigr
>>>>> five_prime_UTR:tigr
>>>>> gene:known
>>>>> gene:tigr
>>>>> * mRNA:FgenesH:Monocot
>>>>> mRNA:known
>>>>> mRNA:tigr
>>>>> microsatellite:cmap
>>>>> three_prime_UTR:known
>>>>> three_prime_UTR:tigr
>>>>> transposable_element_insertion_site:cmap
>>>>>
>>>>> worm gff type:source
>>>>> ftp://ftp.wormbase.org/pub/wormbase/species/elegans/genome_feature_tables/GFF3/
>>>>> ----------------------
>>>>> CDS:Coding_transcript
>>>>> * CDS:Genefinder
>>>>> CDS:Transposon_CDS
>>>>> CDS:history
>>>>> * CDS:twinscan
>>>>> * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
>>>>> * EST_match:BLAT_EST_OTHER
>>>>> PCR_product:GenePair_STS
>>>>> PCR_product:Orfeome
>>>>> RNAi_reagent:RNAi_primary
>>>>> RNAi_reagent:RNAi_secondary
>>>>> SNP:Allele
>>>>> binding_site:binding_site
>>>>> * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
>>>>> * cDNA_match:BLAT_mRNA_OTHER
>>>>> clone_end:.
>>>>> clone_start:.
>>>>> complex_substitution:Allele
>>>>> deletion:Allele
>>>>> exon:Coding_transcript
>>>>> * exon:Genefinder
>>>>> exon:Non_coding_transcript
>>>>> exon:Pseudogene
>>>>> exon:Transposon_CDS
>>>>> exon:history
>>>>> exon:miRNA
>>>>> exon:rRNA
>>>>> exon:scRNA
>>>>> exon:snRNA
>>>>> exon:snoRNA
>>>>> exon:tRNA
>>>>> * exon:tRNAscan-SE-1.23
>>>>> * exon:twinscan
>>>>> experimental_result_region:Expr_profile
>>>>> experimental_result_region:cDNA_for_RNAi
>>>>> * expressed_sequence_match:BLAT_OST_BEST (~ expressed_sequence_match:BLAT:OST_BEST )
>>>>> * expressed_sequence_match:BLAT_OST_OTHER
>>>>> five_prime_UTR:Coding_transcript
>>>>> gene:Coding_transcript
>>>>> gene:gene
>>>>> gene:history
>>>>> gene:landmark
>>>>> insertion:Allele
>>>>> inverted_repeat:inverted
>>>>> mRNA:Coding_transcript
>>>>> * mRNA:Genefinder
>>>>> mRNA:Transposon_CDS
>>>>> mRNA:history
>>>>> * mRNA:twinscan
>>>>> miRNA:miRNA
>>>>> nc_primary_transcript:Non_coding_transcript
>>>>> * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST )
>>>>> * nucleotide_match:BLAT_EMBL_OTHER
>>>>> * nucleotide_match:BLAT_TC1_BEST
>>>>> * nucleotide_match:BLAT_TC1_OTHER
>>>>> * nucleotide_match:BLAT_ncRNA_BEST
>>>>> * nucleotide_match:BLAT_ncRNA_OTHER
>>>>> * nucleotide_match:TEC_RED
>>>>> * nucleotide_match:waba_coding
>>>>> * nucleotide_match:waba_strong
>>>>> * nucleotide_match:waba_weak
>>>>> oligo:.
>>>>> operon:operon
>>>>> polyA_signal_sequence:polyA_signal_sequence
>>>>> polyA_site:polyA_site
>>>>> processed_transcript:gene
>>>>> protein_coding_primary_transcript:Coding_transcript
>>>>> * protein_match:wublastx
>>>>> pseudogene:Pseudogene
>>>>> pseudogene:history
>>>>> rRNA:rRNA
>>>>> reagent:Oligo_set
>>>>> region:.
>>>>> region:Genbank
>>>>> region:Genomic_canonical
>>>>> region:Link
>>>>> * repeat_region:RepeatMasker
>>>>> scRNA:scRNA
>>>>> sequence_variant:.
>>>>> sequence_variant:Allele
>>>>> snRNA:snRNA
>>>>> snoRNA:snoRNA
>>>>> substitution:Allele
>>>>> tRNA:tRNA
>>>>> * tRNA:tRNAscan-SE-1.23
>>>>> tandem_repeat:tandem
>>>>> three_prime_UTR:Coding_transcript
>>>>> trans_splice_acceptor_site:SL1
>>>>> trans_splice_acceptor_site:SL2
>>>>> transcript:SAGE_transcript
>>>>> * translated_nucleotide_match:BLAT_NEMATODE (~ translated_nucleotide_match:BLAT:NEMATODE )
>>>>> transposable_element:Transposon
>>>>> transposable_element:Transposon_CDS
>>>>> transposable_element_insertion_site:Allele
>>>>> transposable_element_insertion_site:Mos_insertion_allele
>>>>>
>>>>>
>>>>> fly gff type:source
>>>>> ftp://ftp.flybase.net/genomes/dmel/current/gff/
>>>>> -----------------------
>>>>> BAC:.
>>>>> CDS:.
>>>>> aberration_junction:.
>>>>> chromosome:.
>>>>> chromosome_arm:.
>>>>> chromosome_band:.
>>>>> enhancer:.
>>>>> exon:.
>>>>> five_prime_UTR:.
>>>>> gene:.
>>>>> insertion_site:.
>>>>> intron:.
>>>>> mRNA:.
>>>>> * match:RNAiHDP
>>>>> * match:assembly:path
>>>>> * match:blastx:aa_SPTR.dmel
>>>>> * match:blastx:aa_SPTR.insect
>>>>> * match:blastx:aa_SPTR.othinv
>>>>> * match:blastx:aa_SPTR.othvert
>>>>> * match:blastx:aa_SPTR.plant
>>>>> * match:blastx:aa_SPTR.primate
>>>>> * match:blastx:aa_SPTR.rodent
>>>>> * match:blastx:aa_SPTR.worm
>>>>> * match:blastx:aa_SPTR.yeast
>>>>> * match:genscan
>>>>> * match:repeatmasker
>>>>> * match:sim4:na_ARGs.dros
>>>>> * match:sim4:na_ARGsCDS.dros
>>>>> * match:sim4:na_DGC_dros
>>>>> * match:sim4:na_dbEST.diff.dmel
>>>>> * match:sim4:na_dbEST.same.dmel
>>>>> * match:sim4:na_gadfly_dmel_r2
>>>>> * match:sim4:na_gb.dmel
>>>>> * match:sim4:na_gb.tpa.dmel
>>>>> * match:sim4:na_smallRNA.dros
>>>>> * match:sim4:na_transcript_dmel_r31
>>>>> * match:sim4:na_transcript_dmel_r32
>>>>> * match:tRNAscan-SE:.
>>>>> * match:tblastx:na_agambiae
>>>>> * match:tblastx:na_dbEST.insect
>>>>> * match:tblastx:na_dpse
>>>>> * match_part:RNAiHDP
>>>>> * match_part:assembly:path
>>>>> * match_part:blastx:aa_SPTR.dmel
>>>>> * match_part:blastx:aa_SPTR.insect
>>>>> * match_part:blastx:aa_SPTR.othinv
>>>>> * match_part:blastx:aa_SPTR.othvert
>>>>> * match_part:blastx:aa_SPTR.plant
>>>>> * match_part:blastx:aa_SPTR.primate
>>>>> * match_part:blastx:aa_SPTR.rodent
>>>>> * match_part:blastx:aa_SPTR.worm
>>>>> * match_part:blastx:aa_SPTR.yeast
>>>>> * match_part:genscan
>>>>> * match_part:repeatmasker
>>>>> * match_part:sim4:na_ARGs.dros
>>>>> * match_part:sim4:na_ARGsCDS.dros
>>>>> * match_part:sim4:na_DGC_dros
>>>>> * match_part:sim4:na_dbEST.diff.dmel
>>>>> * match_part:sim4:na_dbEST.same.dmel
>>>>> * match_part:sim4:na_gadfly_dmel_r2
>>>>> * match_part:sim4:na_gb.dmel
>>>>> * match_part:sim4:na_gb.tpa.dmel
>>>>> * match_part:sim4:na_smallRNA.dros
>>>>> * match_part:sim4:na_transcript_dmel_r31
>>>>> * match_part:sim4:na_transcript_dmel_r32
>>>>> * match_part:tRNAscan-SE:.
>>>>> * match_part:tblastx:na_agambiae
>>>>> * match_part:tblastx:na_dbEST.insect
>>>>> * match_part:tblastx:na_dpse
>>>>> mature_peptide:.
>>>>> ncRNA:.
>>>>> oligo:.
>>>>> point_mutation:.
>>>>> polyA_site:.
>>>>> protein_binding_site:.
>>>>> pseudogene:.
>>>>> region:.
>>>>> regulatory_region:.
>>>>> rescue_fragment:.
>>>>> scaffold:.
>>>>> sequence_variant:.
>>>>> snRNA:.
>>>>> snoRNA:.
>>>>> tRNA:.
>>>>> three_prime_UTR:.
>>>>> transcription_start_site:.
>>>>> transposable_element:.
>>>>> transposable_element_insertion_site:. 3116
>>>>>
>>>>>
>>>>> yeast gff type:source count
>>>>> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/
>>>>> chromosomal_feature/saccharomyces_cerevisiae.gff
>>>>> -------------------------
>>>>> ARS:SGD
>>>>> CDS:SGD
>>>>> binding_site:SGD
>>>>> centromere:SGD
>>>>> chromosome:SGD
>>>>> gene:SGD
>>>>> insertion:SGD
>>>>> intron:SGD
>>>>> ncRNA:SGD
>>>>> nc_primary_transcript:SGD
>>>>> nucleotide_match:SGD
>>>>> pseudogene:SGD
>>>>> rRNA:SGD
>>>>> region:SGD
>>>>> region:landmark
>>>>> repeat_family:SGD
>>>>> repeat_region:SGD
>>>>> snRNA:SGD
>>>>> snoRNA:SGD
>>>>> tRNA:SGD
>>>>> telomere:SGD
>>>>> transposable_element:SGD
>>>>> transposable_element_gene:SGD
>>>>>
>>>>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
>>>>> -- gilbertd@indiana.edu -- http://marmot.bio.indiana.edu/
>>>>>
>>>>>
>>>>>
>>>>> -------------------------------------------------------
>>>>> This SF.Net email is sponsored by the 'Do More With Dual!' webinar
>>>>> happening
>>>>> July 14 at 8am PDT/11am EDT. We invite you to explore the latest  
>>>>> in dual
>>>>> core and dual graphics technology at this free one hour event  
>>>>> hosted
>>>>> by HP, AMD, and NVIDIA. To register visit
>>>>> http://www.hp.com/go/dualwebinar
>>>>> _______________________________________________
>>>>> Gmod-gbrowse mailing list
>>>>> Gmod-gbrowse@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>>>
>>>>
>>>>
>>> --
>>> --------------------------------------------------------------------- 
>>> ---
>>> Scott Cain, Ph. D.                                          
>>> cain@cshl.edu
>>> GMOD Coordinator (http://www.gmod.org/)                      
>>> 216-392-3087
>>> Cold Spring Harbor Laboratory
>>>
>>>
>>>
>>> -------------------------------------------------------
>>> SF.Net email is Sponsored by the Better Software Conference & EXPO  
>>> September
>>> 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
>>> Agile & Plan-Driven Development * Managing Projects & Teams *  
>>> Testing & QA
>>> Security * Process Improvement & Measurement *  
>>> http://www.sqe.com/bsce5sf
>>> _______________________________________________
>>> Gmod-devel mailing list
>>> Gmod-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gmod-devel
>>>
>>
>>
>>
>>
> -- 
> ----------------------------------------------------------------------- 
> -
> Scott Cain, Ph. D.                                          
> cain@cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From mayagao1999 at yahoo.com  Fri Jul 29 20:37:35 2005
From: mayagao1999 at yahoo.com (Alex Zhang)
Date: Fri Jul 29 20:29:59 2005
Subject: [Bioperl-l] A problem about a subroutin in my code
Message-ID: <20050730003735.48916.qmail@web53506.mail.yahoo.com>


Dear all,
 
Sorry to bother you, but I need some help with my code. I have an input file named
"origin8.txt" which holds 200 short sequences of width 8. My code uses each
short sequence from "origin8.txt" as a template to generate 100 short sequences of the same
width and stores them in a text file A.

Then the code reads the 100 short sequences from text file A and 100 long sequences of width 200 from a text file B, and replaces a substring of each long sequence with the corresponding short sequence. This step produces two text files, C and D. File C will hold 100 replaced long sequences.

In other words, I want to input "origin8.txt" and get 200 copies of File D, one per template.

My code does generate 200 copies of File D, but each of them is empty. So I guess the problem is caused by a failure to pass the data to the subroutine named "make_files".

Can anybody suggest how to fix that? Thank you very much in advance!

Sincerely,

     Alex

 
My code:

 
*******************************************************************

#!/usr/bin/perl
use strict;
use warnings;
my (@origin, $y);
my $N_Sequences = 100; 
my @Alphabet = split(//,'ACGT');      
my $P_Consensus = 0.85;               # This is the probability of dominant letter
# ====== Globals ==========================
my @Probabilities;                    # Stores the probability of each character


# ====== Program ==========================

open (ORIGIN, "< origin8.txt");       # This file holds 200 sequences used for motif template
chomp (@origin = <ORIGIN>);
close ORIGIN;

for ($y=0; $y<=$#origin; $y++) {
  

    my @Motif = split(//,'$origin[$y]');     # This is a loop to get the motif template from origin8
    open (OUT_NORM, ">short_sequences8_[$y].txt") or die "Unable to open file :$!";
        for (my $i=0; $i < $N_Sequences; $i++) {
            for (my $j=0; $j < scalar(@Motif); $j++) {
                 loadConsensusCharacter($Motif[$j]);    
                 addNoiseToDistribution();             
                 convertToIntervals();
                 print OUT_NORM (getRandomCharacter(rand(1.0)));
                                                     }
            print OUT_NORM "\n";
            make_files();
                                               }
                              }

exit();

# ====== Subroutines =======================
#
sub loadConsensusCharacter {
    my ($char) = @_;
    my $Found = 'FALSE';

    for (my $i=0; $i < scalar(@Alphabet); $i++) {
        if ( $char eq $Alphabet[$i]) {
            $Probabilities[$i] = 1.0;
            $Found = 'TRUE';
        } else {
            $Probabilities[$i] = 0.0;
        }
    }
    if ($Found eq 'FALSE') {
        die("Panic: Motif-Character \"$char\" was not found in Alphabet. Aborting.\n");
    }

return();
}

# ==========================================
sub addNoiseToDistribution {


    my $P_NonConsensus = ( 1.0-$P_Consensus) / (scalar(@Alphabet) - 1);

    for (my $i=0; $i < scalar(@Probabilities); $i++) {
        if ( $Probabilities[$i] == 1.0 ) {     
            $Probabilities[$i] = $P_Consensus;
        } else {
            $Probabilities[$i] = $P_NonConsensus;
        }
    }

    return();
}

# ==========================================
sub convertToIntervals {

    my $Sum = 0;

    for (my $i=1; $i < scalar(@Probabilities); $i++) {
        $Probabilities[$i] += $Probabilities[$i-1];
    }

    return();
}

# ==========================================
sub getRandomCharacter {

    my ($RandomNumber) = @_;
    my $i=0;
    for ($i=0; $i < scalar(@Probabilities); $i++) {
        if ($Probabilities[$i] > $RandomNumber) { last; }
    }

    return($Alphabet[$i]);
}

# ==========================================
sub make_files {
my (@short, @long,$x,$r, $output_norm);

open (SHORT, "< short_sequences8_[$y].txt");
chomp (@short = <SHORT>);
close SHORT;

open (LONG, "< long_sequences.txt");
chomp (@long = <LONG>);
close LONG;

open (OUT_INITIAL,  "> output8_[$y]1.txt");
open (OUT_REPLACED, "> output8_[$y]2.txt");

for ($x=0; $x<=$#short; $x++) {
  $r=2;
  print OUT_INITIAL ">SeqName$x\n$long[$x]\n";
  print OUT_REPLACED "SeqName$x\n" . substr($long[$x], $r, length $short[$x]) . "\n";}


close OUT_INITIAL;
close OUT_REPLACED;

} 

*******************************************************************
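
For what it is worth, three things in the code above look like likely causes of the empty output files. First, split(//,'$origin[$y]') uses single quotes, so Perl splits the literal text $origin[$y] rather than the sequence. Second, make_files() is called inside the loop while OUT_NORM is still open, so the buffered short sequences have probably not reached the disk when the subroutine reads the file back. Third, the three-argument substr() only extracts a substring; it does not replace anything. The following is a minimal sketch of one way to restructure the work, not a definitive fix. The names noisy_copy and process_template, the simplified file names, the temporary directory, and the made-up long sequences of N's are all assumptions for illustration. The noise step is also simplified: it keeps the template letter with probability 0.85 and otherwise picks one of the other three letters uniformly, which should give the same distribution as the cumulative-interval subroutines.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my @alphabet    = qw(A C G T);
my $p_consensus = 0.85;    # probability of keeping the template letter
my $n_sequences = 100;     # variants generated per template

# Return one noisy copy of $motif (hypothetical helper).
sub noisy_copy {
    my ($motif) = @_;
    my $copy = '';
    for my $char (split //, $motif) {
        die "Motif character '$char' is not in the alphabet\n"
            unless grep { $_ eq $char } @alphabet;
        if (rand() < $p_consensus) {
            $copy .= $char;
        } else {
            my @others = grep { $_ ne $char } @alphabet;
            $copy .= $others[ int rand @others ];
        }
    }
    return $copy;
}

# Write the short sequences for one template, close the file, and only
# then read it back and build the two output files (hypothetical helper).
sub process_template {
    my ($dir, $index, $motif, $long_ref, $offset) = @_;

    my $short_file = "$dir/short_sequences8_$index.txt";
    open(my $out, '>', $short_file) or die "Cannot write $short_file: $!";
    print {$out} noisy_copy($motif), "\n" for 1 .. $n_sequences;
    close($out) or die "Cannot close $short_file: $!";   # flushes the buffer

    open(my $in, '<', $short_file) or die "Cannot read $short_file: $!";
    chomp(my @short = <$in>);
    close($in);

    my $initial_file  = "$dir/output8_${index}_1.txt";
    my $replaced_file = "$dir/output8_${index}_2.txt";
    open(my $initial,  '>', $initial_file)  or die "Cannot write $initial_file: $!";
    open(my $replaced, '>', $replaced_file) or die "Cannot write $replaced_file: $!";
    for my $x (0 .. $#short) {
        last if $x > $#{$long_ref};
        my $long = $long_ref->[$x];
        print {$initial} ">SeqName$x\n$long\n";
        # Four-argument substr overwrites the region in place.
        substr($long, $offset, length($short[$x]), $short[$x]);
        print {$replaced} ">SeqName$x\n$long\n";
    }
    close($initial)  or die "Cannot close $initial_file: $!";
    close($replaced) or die "Cannot close $replaced_file: $!";
    return $replaced_file;
}

# Demo with made-up data in a temporary directory (assumption: the real
# script would read origin8.txt and long_sequences.txt instead).
my $dir    = tempdir(CLEANUP => 1);
my @origin = qw(TTTATAAT TGTCAATG);
my @long   = map { 'N' x 200 } 1 .. $n_sequences;
my @result = map { process_template($dir, $_, $origin[$_], \@long, 2) } 0 .. $#origin;
print "wrote $_ (", (-s $_), " bytes)\n" for @result;
```

Passing the template index and data as arguments, instead of relying on the file-scoped $y, also makes it easier to call the subroutine once per template rather than once per generated sequence.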

 
Input file "origin8.txt" holds 200 sequences as:

 
TTTATAAT
TGTCAATG
CGTTGATG
CGTCCTAG
GGCTTCCA
ATTAGCCT
GTCCTGAT
TGTAAATC
CGCTTATT
TTGACATA
CCTGATAT
ATGAATCG
CGTCCGAT
TGGCCCAT
ATCCTGAT
TGCCCATT
CCCTAACT
AAAAAAAA
TTTTTTTT
CCCCCCCC
GGGGGGGG
AAAAAAAT
AAAAAAAG
AAAAAAAC
AAAAAACC
AAAAAATT
AAAAAAGG
AAAAAACT
AAAAAACG
AAAAAACA
AAAAACAA
AAAACAAA
AAACAAAA
AACAAAAA
ACAAAAAA
CAAAAAAA
AAAAAATA
AAAAATAA
AAAATAAA
AAATAAAA
AATAAAAA
ATAAAAAA
TAAAAAAA
AAAAAAGA
AAAAAGAA
AAAAGAAA
AAAGAAAA
AAGAAAAA
AGAAAAAA
GAAAAAAA
AAAACCAA
AACCAAAA
CCAAAAAA
AAAATTAA
AATTAAAA
TTAAAAAA
AAAAACCC
AAAACCCA
AAACCCAA
AACCCAAA
ACCCAAAA
CCCAAAAA
AAAAATTT
AAAATTTA
AAATTTAA
AATTTAAA
ATTTAAAA
TTTAAAAA
AAAAAGGG
AAAAGGGA
AAAGGGAA
AAGGGAAA
AGGGAAAA
GGGAAAAA
AAAACCCC
AAACCCCA
AACCCCAA
ACCCCAAA
CCCCAAAA
AAAATTTT
AAATTTTA
AATTTTAA
ATTTTAAA
TTTTAAAA
AAAAGGGG
AAAGGGGA
AAGGGGAA
AGGGGAAA
GGGGAAAA
AAACCCCC
AACCCCCA
ACCCCCAA
CCCCCAAA
AAATTTTT
AATTTTTA
ATTTTTAA
TTTTTAAA
AAAGGGGG
AAGGGGGA
AGGGGGAA
GGGGGAAA
AAGGGGGG
AGGGGGGA
GGGGGGAA
AACCCCCC
ACCCCCCA
CCCCCCAA
AATTTTTT
ATTTTTTA
TTTTTTAA
ATTTTTTT
TTTTTTTA
ACCCCCCC
CCCCCCCA
AGGGGGGG
GGGGGGGA
ATTTTTTT
TTTTTTTA
ATAAAATA
AATAAATA
AAATAATA
AAAATATA
ACAAAACA
AACAAACA
AAACAACA
AAAACACA
AGAAAAGA
AAGAAAGA
AAAGAAGA
AAAAGAGA
ATAAAAGA
ATAAAACA
AGAAAATA
AGAAAACA
ACAAAAGA
ACAAAATA
ATTAAATA
AATTAATA
AAATTATA
ACCAAACA
AACCAACA
AAACCACA
AGGAAAGA
AAGGAAGA
AAAGGAGA
ATTTAATA
AATTTATA
ACCCAACA
AACCCACA
AGGGAAGA
AAGGGAGA
ATTTAACA
ATTTAAGA
AATTTACA
AATTTAGA
ACCCAATA
ACCCAAGA
AACCCATA
AACCCAGA
AGGGAACA
AGGGAATA
AAGGGATA
AAGGGACA
TTGGGACA
CCGGGACA
AGAAGGGA
TGCCCATA
TAAAAAAT
TGCCTATA
CCGTAGTC
ACTTGACT
CTGATCCC
TGTGACTA
CCTGATCC
CCTGAACC
TGATCACG
GGGTAACC
CTTTTGAA
TTGTATGA
CCTGATAA
CTGGTTAG
CCCCGACC
TTGGGGAC
GGTTTGAC
GCTTAGAC
GTTACACC
TTGTACCA
TGGTACCA
CCGTACAT
CCCTTGCC
GTGTTGGT
ATCGATCG
ACGTACGT
TCAGTCAG
GCTATACG
GTCCATAC
CCGTCCGT
ATATATCC
GTGTCCCC 


From jason.stajich at duke.edu  Sat Jul 30 02:20:44 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat Jul 30 02:12:42 2005
Subject: [Bioperl-l] constructing a tree object
In-Reply-To: <003201c59485$f68500c0$ab05a8c0@bradleydell>
References: <003201c59485$f68500c0$ab05a8c0@bradleydell>
Message-ID: <5087f484f5a8d9088172fa4b1d26fba9@duke.edu>

See the FAQ on IO::String and SeqIO; the same thing applies to TreeIO.
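
A minimal sketch of that idea, assuming a made-up Newick string in $treestring: open a filehandle on the string and hand it to Bio::TreeIO through -fh instead of -file. The helper name string_to_fh is hypothetical, and the example uses core Perl's in-memory filehandles; IO::String->new($treestring) should give an equivalent handle. The BioPerl part is guarded so the script still runs where BioPerl is not installed.

```perl
use strict;
use warnings;

# Hypothetical example tree; any Newick string would do.
my $treestring = '((A:0.1,B:0.2):0.05,C:0.3);';

# Open a read-only filehandle on a copy of the string (no temp file needed).
sub string_to_fh {
    my ($string) = @_;
    open(my $fh, '<', \$string) or die "Cannot open in-memory handle: $!";
    return $fh;
}

my $fh = string_to_fh($treestring);

if (eval { require Bio::TreeIO; 1 }) {
    # -fh takes an already opened filehandle in place of -file.
    my $treeio = Bio::TreeIO->new(-fh => $fh, -format => 'newick');
    while (my $tree = $treeio->next_tree) {
        print join(',', map { $_->id } $tree->get_leaf_nodes), "\n";
    }
} else {
    # BioPerl is not installed here; just show the handle yields the string.
    print "Read from handle: ", scalar(<$fh>), "\n";
}
```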

-jason
On Jul 29, 2005, at 2:39 PM, Michael Bradley wrote:

> Can anyone tell me how to do $treeObj = Bio::TreeIO->new(-file =>
> "somefile", -format => 'newick' ) from a variable instead of a file?
>
> Suppose that my tree is stored in $treestring. I would like to do
> something like : $treeObj = Bio::TreeIO->new(-$treestring, -format =>
> 'newick' ) .
>
> Thanks,
>
> Mike Bradley
--
Jason Stajich
http://www.duke.edu/~jes12
jason.stajich -at- duke.edu

From rvosa at sfu.ca  Sat Jul 30 10:01:30 2005
From: rvosa at sfu.ca (Rutger Vos)
Date: Sat Jul 30 09:51:45 2005
Subject: [Bioperl-l] Bio:: namespace question
Message-ID: <42EB883A.7060605@sfu.ca>

Dear fellow perl-using-biologists,

I want to submit to CPAN a module for phylogenetic analysis. I am trying 
to decide what top-level namespace to use. Can I use Bio::Phylo or 
something like that? Or is Bio:: reserved for BioPerl?

Thanks!

Rutger

-- 
++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
++++++++++++++++++++++++++++++++++++++++++++


From jason.stajich at duke.edu  Sat Jul 30 12:28:37 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat Jul 30 12:19:53 2005
Subject: [Bioperl-l] Bio:: namespace question
In-Reply-To: <42EB883A.7060605@sfu.ca>
References: <42EB883A.7060605@sfu.ca>
Message-ID: <40c2ad0d56c965bb329ed08ef934f209@duke.edu>

You'll see there are other non-bioperl modules in CPAN under the Bio:: 
namespace so we haven't got any reservations on what people can submit 
to CPAN.  We have some phylogenetics related modules scattered in 
bioperl to logically deal with data parsing mostly (Bio::Tree, 
Bio::TreeIO, Bio::Tools::Phylo) and running (Bio::Tools::Run::Phylo).

My concerns are mostly about confusing people over how things 
interrelate and can be used together, and about avoiding a namespace 
clash, but we are not currently using Bio::Phylo.

I don't know if any other folks have opinions -- I suppose they were 
waiting to see what I'd say?

-jason

On Jul 30, 2005, at 7:01 AM, Rutger Vos wrote:

> Dear fellow perl-using-biologists,
>
> I want to submit to CPAN a module for phylogenetic analysis. I am 
> trying to decide what top-level namespace to use. Can I use Bio::Phylo 
> or something like that? Or is Bio:: reserved for BioPerl?
>
> Thanks!
>
> Rutger
>
> -- 
> ++++++++++++++++++++++++++++++++++++++++++++
> Rutger Vos, PhD. candidate
> Department of Biological Sciences
> Simon Fraser University
> 8888 University Drive
> Burnaby, BC, V5A1S6
> Phone: 604-291-5625 Fax: 604-291-3496
> Personal site: http://www.sfu.ca/~rvosa
> FAB* lab: http://www.sfu.ca/~fabstar
> ++++++++++++++++++++++++++++++++++++++++++++
>
>
--
Jason Stajich
http://www.duke.edu/~jes12
jason.stajich -at- duke.edu

From hlapp at gmx.net  Sat Jul 30 18:56:20 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jul 30 18:46:46 2005
Subject: [Bioperl-l] Bio:: namespace question
In-Reply-To: <40c2ad0d56c965bb329ed08ef934f209@duke.edu>
References: <42EB883A.7060605@sfu.ca>
	<40c2ad0d56c965bb329ed08ef934f209@duke.edu>
Message-ID: <3d1bb154cc3225a57c9e9ad031b57fc1@gmx.net>


On Jul 30, 2005, at 9:28 AM, Jason Stajich wrote:

> I don't know if any other folks have opinions -- I suppose waiting to 
> see what I'd say?
>

Right - egoistically I'd reserve Bio::Phylo for future Bioperl use, but 
that's not fair really ;-)

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jim.hu.biobio at gmail.com  Sat Jul 30 16:05:42 2005
From: jim.hu.biobio at gmail.com (Jim Hu)
Date: Sun Jul 31 13:13:56 2005
Subject: [Bioperl-l] Newbie gbrowse help - script to make gff from fasta
In-Reply-To: <42E96847.1060900@ebi.ac.uk>
References: <9331C217-F039-11D9-A447-000393B8D01C@indiana.edu>
	<42E909E3.2030102@infobiogen.fr>	<1122570166.3288.10.camel@localhost.localdomain>
	<Pine.OSX.4.58.0507281113390.8894@skerryvore.dhcp.lbl.gov>
	<42E96847.1060900@ebi.ac.uk>
Message-ID: <1AC69124-28AD-48B2-B910-7C5D8057908E@gmail.com>

1) Is there an existing script to convert a RefSeq FASTA into a GFF
flatfile compatible with GBrowse 1.62? The output of

        bp_genbank2gff.pl --accession NC_001416  --stdout > lambda.gff

requires some additional tweaking/parsing as far as I can tell. I
know that I'll probably eventually load these into MySQL (but for
phage genomes, is it worth it?), but I wanted to learn via the
flatfiles first.

2) Is there a repository of standard track stanzas and aggregators  
that match the feature types generated by such scripts?

3) Is there a FAQ I missed that I should have consulted first?

4) Is this even the right listserv for these questions?

Didn't want to reinvent any wheels if possible.  Sorry if this is off  
topic.  Thanks!

Jim Hu