From elia at tigem.it Tue Mar 1 06:51:16 2005 From: elia at tigem.it (Elia Stupka) Date: Tue Mar 1 07:51:08 2005 Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file In-Reply-To: <200502281005.06990.jswanson@iastate.edu> References: <200502141205.52256.jswanson@iastate.edu> <200502281005.06990.jswanson@iastate.edu> Message-ID: <4f12c65ac919697fd8a7e9220db182fd@tigem.it> Hi Jordan, I have been doing some work on Contig::Assembly myself recently, and have also been in touch with the author (Robson) about it. Perhaps the best thing would be for the three of us to have a chat about this object, try to revamp it a little with our improvements, and then Robson or I can check it in? regards, Elia On Feb 28, 2005, at 5:05 PM, Jordan Swanson wrote: > On Monday 14 February 2005 12:05 pm, Jordan Swanson wrote: >> Hi, >> I am new to bioperl, but I have a proposal for updating bioperl with >> some >> of the code I have been using. >> >> Bioperl packages currently exist that open ACE assembly files (output >> by >> phrap/cap3, and other assembly program). However, the current code >> brings >> in the entire file in one call: >> >> my $assembly_in = >> Bio::Assembly::IO->new(-file=>"input.ace", >> -format=>'ace'); >> >> my $assembly = $assembly_in->next_assembly; >> >> I am working on a large EST assembly project(roughly 150K) and our >> assembly >> files have been around 200 MB in size. For many of our applications, >> we >> only need to process one contig at a time, not to mention that >> reading the >> entire assembly at once requires a large amount of memory and/or disc >> space. >> >> I have developed some code that reads in contigs one at a time, >> therefore >> using only the amount of space needed for one contig object. 
A brief >> synopsis: >> >> my $contig_in = ContigIO->new(-file=>$filename, -format=>'ace'); >> while( my $contig = $contig_in->next_contig) >> { >> do_stuff_with_contig(); >> } >> >> Furthermore, there is no code that currently writes out ACE files or >> reverses the contigs orientation. I have developed some code that >> implements both, and if you would have it, I would like to submit this >> code. I have been working on converting this code to a more bioperl >> friendly format >> ( inheriting from bioseq objects, using the bioperl IO system, bioperl >> style warnings and so forth) >> >> I would appreciate some advice on how to proceed, specifically on >> inheriting from the correct classes and avoiding duplication of code. >> My >> initial thoughts: >> >> * Pull out the parsing code from Assembly::IO::ace.pm and into a new >> ContigIO::ace.pm, (possibly inherited from AlignIO, since the contig >> object >> is an AssemblyI) >> * Alter Assembly::IO.ace.pm to use the ContigIO.pm to load the entire >> contig into, and to output the assembly >> * Incorporate somewhere, my reverse_contig function ( which is like >> revcom >> for Bio::SeqI, so possibly in the ContigI.pm file) >> >> Thoughts? > > I've gone ahead and incorporated my changes into bioperl compliant > objects. > > *Bio/Assembly/ContigIO.pm created > *Bio/Assembly/ContigIO directory created > *Bio/Assembly/ContigIO/ace.pm created > *Bio/Assembly/IO/ace.pm modified to use Bio::Assembly::Contig > *Bio/Assembly/Contig.pm modified to allow base segments and to add a > revcom > method > *t/ContigIO.t created > > How does one submit their code for inspection/review/incorporation? I > used > cvs to check out the code I've been using, but "cvs add" is not > working at my > permission level. 
> > > > > -- > Jordan M Swanson > Department of Ecology, Evolution, and Organismal Biology > Iowa State University > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > --- Telethon Institute of Genetics and Medicine Via Pietro Castellino, 111 80131 Napoli Tel. +39 081 6132 335 Fax. +39 081 560 98 77 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 3488 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050301/67c65329/attachment.bin From jswanson at iastate.edu Tue Mar 1 10:13:23 2005 From: jswanson at iastate.edu (Jordan Swanson) Date: Tue Mar 1 10:13:34 2005 Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file In-Reply-To: <4f12c65ac919697fd8a7e9220db182fd@tigem.it> References: <200502141205.52256.jswanson@iastate.edu> <200502281005.06990.jswanson@iastate.edu> <4f12c65ac919697fd8a7e9220db182fd@tigem.it> Message-ID: <200503010913.23399.jswanson@iastate.edu> On Tuesday 01 March 2005 05:51 am, Elia Stupka wrote: > Hi Jordan, > > I have been doing some work on Contig::Assembly myself recently, and > have also been in touch with the author (Robson) about it. Perhaps the > best thing would be for the three of us to have a chat about this > object, try to revamp it a little with our improvements, and then > Robson or I can check it in? Good idea, you and Robson can expect a copy of my changes very soon. 
-- 
Jordan M Swanson
Department of Ecology, Evolution, and Organismal Biology
431 Bessey Hall
Iowa State University
Ames, IA 50011
Lab 515 294-7098
FAX: 515-294-1337

From s_waechter at gmx.net Tue Mar 1 11:05:12 2005
From: s_waechter at gmx.net (Stefan Wächter)
Date: Tue Mar 1 11:00:29 2005
Subject: [Bioperl-l] which one and how to configure(blastall)
In-Reply-To: <42234138.4020903@csit.fsu.edu>
References: <42234138.4020903@csit.fsu.edu>
Message-ID: <422492B8.8060905@gmx.net>

Hi Yanfeng,

Try this (I am assuming that your BLAST installation is in
/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10).

Create a file named .ncbirc (don't forget the leading dot) in /home/yanfeng
and write the following in this file:

[NCBI]
DATA="/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10/data"

Save the file. In the data dir, BLAST will find the BLOSUM tables. In your
blast installation dir you will also find a db dir. That's a good place to
store your blast databases.

Set the environment variables in your .bashrc (or .profile, .cshrc, ...
depending on your shell). I know it's trivial, but for bash it would be
something like:

BLASTDIR=/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10
export BLASTDIR
BLASTDB=/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10/db
export BLASTDB

Additionally, it is a good idea to add
/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10 to your PATH variable.

As a last step, you have to install one of the NCBI databases in the BLASTDB
dir or create one with formatdb.

Hope I could help

Cheers
Stefan

yanfeng wrote:
> Hi, Sorry to bother you again.
> I want to download blast program now
> I want to run blast and get blast report.
> I donot know which one I should install and how to configure it( is
> that like " export BLASTDIR=/ data1/blast/ " )
> I use
> BEGIN {
> $ENV{'BLASTDIR'} =
> '/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10/';
> }
>
> but it doesnot work.
> > blast-2.2.10-amd64-linux.tar.gz > > > blast-2.2.10-ia32-linux.tar.gz > > > blast-2.2.10-ia64-linux.tar.gz > > > > > My perl script > // > use Bio::SeqIO; > use Bio::Seq; > use Bio::Tools::Run::StandAloneBlast; > $seqio_obj = Bio::SeqIO->new(-file => 'mun_lab.fasta', > -format => 'fasta' ); # to wrtie the > sequence to afasta file > $seq_obj = $seqio_obj->next_seq; > #print $seq_obj->seq,"\n"; > @params = (program => 'blastn', > database => 'db.fa' ); > $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params); > $report_obj = $blast_obj->blastall($seq_obj); > $result_obj = $report_obj->next_result; > print $result_obj->num_hits; > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From biolist at brinkman.mbb.sfu.ca Wed Mar 2 13:36:43 2005 From: biolist at brinkman.mbb.sfu.ca (Matthew Laird) Date: Wed Mar 2 13:31:44 2005 Subject: [Bioperl-l] blastall & StandAloneBlast Message-ID: Hi all, I'm yet again being faced with a mysterious crash in blastall and Bioperl that has been occuring for the past year. I'm receiving more reports from people around the world using our software also experiencing this problem, and the only answer I once received about the problem was, "That shouldn't be possible." Anyhow, the error occurs when blastall is called from StandAloneBlast.pm, Blast returns with a -1 error code which causes the Bioperl module to throw an exception. We've had reports of this occurring on multiple Linux distributions as well as on Solaris and OS X. But it doesn't happen on all machine even if they're running the same distribution. The crash output is below.... 
Fatal error:
------------- EXCEPTION -------------
MSG: blastall call crashed: -1 /usr/local/blast/blastall -p blastp -d "/usr/local/psort/conf/analysis/sclblast/gramneg/sclblast" -i /var/tmp/6m0QxSirC3 -e 1e-09 -o /var/tmp/AKCDNMCTyo -F F
STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:751
STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:698
STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:553
STACK Bio::Tools::Run::SCLBlast::blast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/SCLBlast.pm:135
STACK Bio::Tools::PSort::Module::SCLBlast::run /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/PSort/Module/SCLBlast.pm:72
STACK Bio::Tools::PSort::Pathway::__ANON__ /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/PSort/Pathway.pm:194
STACK Bio::Tools::PSort::Pathway::traverse /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/PSort/Pathway.pm:157
STACK Bio::Tools::PSort::classify /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/PSort.pm:160
STACK (eval) /usr/local/psort/bin/psort:320
STACK toplevel /usr/local/psort/bin/psort:320
--------------------------------------

The line in StandAloneBlast.pm we track the problem back to is:

$self->throw("$executable call crashed: $? $commandstring\n")unless ($status==0) ;

Odd thing is, Blast DOES run. If one comments out this line in
StandAloneBlast.pm, the execution succeeds perfectly fine. When I've edited
the error message being thrown to give more details, perl says the error is
related to a process not being able to be created, which is even weirder.

So, for some odd reason either blastall is passing back this -1 or perl is
giving it back to bioperl for whatever reason. The only advice we have for
people is to comment out this line in StandAloneBlast.pm.

Anyone have any thoughts or advice on where this problem is coming from?

Thanks.
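
For reference, the value Perl leaves in $? after system() packs several things together, and -1 is a special case: per the perlfunc entry for system, it means the program could not be started or the underlying wait(2) call failed, not that the program returned -1. The following is only a minimal sketch in core Perl of how that status can be decoded; describe_status() is a hypothetical helper name and the child command is a stand-in, neither is taken from StandAloneBlast.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a system() status (the value of $?) into text.
# $errno_text is the stringified $! captured right after the call.
sub describe_status {
    my ($status, $errno_text) = @_;
    if ($status == -1) {
        # Either the program was never started or wait(2) itself failed.
        return "could not start or wait for the child: $errno_text";
    }
    if ($status & 127) {
        # The low 7 bits hold the signal number that killed the child.
        return sprintf "child died with signal %d", $status & 127;
    }
    # The high byte holds the child's real exit code.
    return sprintf "child exited with value %d", $status >> 8;
}

# Stand-in for the blastall command line: run the current perl ($^X)
# and have it exit with code 3.
my @cmd    = ($^X, '-e', 'exit 3');
my $status = system(@cmd);
print describe_status($status, "$!"), "\n";
```

One situation that can yield -1 even though the child ran to completion is a parent process that has set $SIG{CHLD} to 'IGNORE' (perlipc notes that wait() usually returns -1 on such platforms). Whether that is what happens on the affected machines is an assumption; nothing in this report confirms it.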
-- Matthew Laird SysAdmin/Developer, Brinkman Laboratory, MBB Dept. Simon Fraser University From dcj at sanger.ac.uk Wed Mar 2 08:27:55 2005 From: dcj at sanger.ac.uk (Daniel Jeffares) Date: Wed Mar 2 14:16:18 2005 Subject: [Bioperl-l] Bio::LiveSeq::Transcript query from new bioperl user Message-ID: This is a request for help from a *very* new bioperl user. IM also pretty new to perl.... I want to use the Bio::LiveSeq::Transcript->new method to make a transcript object from an .embl file. I then want to use the $frame = $transcript->frame($label) method so that I can trim sub-sections of the transcript to include only complete codons. So, in other words, I want to collect subsets of a transcript (coordinates that I have defined earlier), and then trim those coordinates to the nearest complete codons. And the get the sequence of those coordinates. ____________________________ Daniel Jeffares Wellcome Trust Sanger Institute Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SA, UK Phone: +44(0)1223 834244 x 7297 ____________________________ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 796 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050302/5ec22dbc/attachment.bin From elia at tigem.it Wed Mar 2 14:46:30 2005 From: elia at tigem.it (Elia Stupka) Date: Wed Mar 2 14:41:25 2005 Subject: [Bioperl-l] blastall & StandAloneBlast In-Reply-To: References: Message-ID: Could you send the output you get when you comment the throw and print out the errors you mention below? Elia On 2 Mar 2005, at 19:36, Matthew Laird wrote: > When I've > editted the error message being thrown to give more details, perl says > the > error is related to a process not being able to be created, which is > even > weirder. 
From lairdm at sfu.ca Wed Mar 2 14:51:58 2005 From: lairdm at sfu.ca (Matthew Laird) Date: Wed Mar 2 14:46:44 2005 Subject: [Bioperl-l] blastall & StandAloneBlast In-Reply-To: Message-ID: When I comment out the throw there is no output because the program executes correctly. blastall runs and returns the results through bioperl. That's the mysterious part of this. On Wed, 2 Mar 2005, Elia Stupka wrote: > Could you send the output you get when you comment the throw and print > out the errors you mention below? > > Elia > > On 2 Mar 2005, at 19:36, Matthew Laird wrote: > > > When I've > > editted the error message being thrown to give more details, perl says > > the > > error is related to a process not being able to be created, which is > > even > > weirder. > > -- Matthew Laird SysAdmin/Developer, Brinkman Laboratory, MBB Dept. Simon Fraser University From elia at tigem.it Wed Mar 2 15:03:28 2005 From: elia at tigem.it (Elia Stupka) Date: Wed Mar 2 14:58:20 2005 Subject: [Bioperl-l] blastall & StandAloneBlast In-Reply-To: References: Message-ID: <37c1c72203e2dcd41eb3cacae33146e7@tigem.it> Sorry, wrote my answer badly, I meant when you mentioned that printing more details about the error it gave you something weird about process not being able to be created: > editted the error message being thrown to give more details, perl says > the error is related to a process not being able to be created, which > is even weirder. > Elia From lairdm at sfu.ca Wed Mar 2 18:56:07 2005 From: lairdm at sfu.ca (Matthew Laird) Date: Wed Mar 2 18:51:13 2005 Subject: [Bioperl-l] blastall & StandAloneBlast In-Reply-To: <37c1c72203e2dcd41eb3cacae33146e7@tigem.it> Message-ID: Alas no. I no longer have any machines around I had to do the hack on. I just tried to install it on two other machines and it sadly ran fine.... I'm reluctant to harass any of the users who have emailed us and ask them to intentionally "break" their install (by uncommenting that line) to help us test this. 
Anyhow, I had just added $! to the error message and it said something along the lines of "Process can not be created." On Wed, 2 Mar 2005, Elia Stupka wrote: > Sorry, wrote my answer badly, I meant when you mentioned that printing > more details about the error it gave you something weird about process > not being able to be created: > > > editted the error message being thrown to give more details, perl says > > the error is related to a process not being able to be created, which > > is even weirder. > > > > Elia > > -- Matthew Laird SysAdmin/Developer, Brinkman Laboratory, MBB Dept. Simon Fraser University From heikki at nildram.co.uk Thu Mar 3 02:50:04 2005 From: heikki at nildram.co.uk (Heikki Lehvaslaiho) Date: Thu Mar 3 02:45:10 2005 Subject: [Bioperl-l] Bio::LiveSeq::Transcript query from new bioperl user In-Reply-To: References: Message-ID: <200503030750.04401.heikki@nildram.co.uk> Daniel, While LiveSeq can be used for this, there is quite a lot of overhead in creating those objects. If you are going to apply this in a highthroughput pipeline, I recommend you retrieve the CDS feature from the standard SeqIO-produced sequence object and determine the frame yourself. Let me know in more detail what you want to do and I'll help you. -Heikki On Wednesday 02 March 2005 13:27, Daniel Jeffares wrote: > This is a request for help from a *very* new bioperl user. IM also > pretty new to perl.... > > I want to use the Bio::LiveSeq::Transcript->new method to make a > transcript object from an .embl file. > > I then want to use the $frame = $transcript->frame($label) method so > that I can trim sub-sections of the transcript to include only complete > codons. > > So, in other words, I want to collect subsets of a transcript > (coordinates that I have defined earlier), and then trim those > coordinates to the nearest complete codons. And the get the sequence of > those coordinates. 
> > ____________________________ > Daniel Jeffares > Wellcome Trust Sanger Institute > Wellcome Trust Genome Campus > Hinxton, Cambridge, CB10 1SA, UK > Phone: +44(0)1223 834244 x 7297 > ____________________________ -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________
From venancio at iq.usp.br Thu Mar 3 08:11:27 2005 From: venancio at iq.usp.br (Thiago Motta Venancio) Date: Thu Mar 3 09:21:00 2005 Subject: [Bioperl-l] GFF question Message-ID: <42270CFF.1080502@iq.usp.br> Hi folks. I would like to get a more detailed explanation about how to construct GFF files with the outputs of several programs, like genescan, repeatmasker... thanks in advance. Thiago -- Thiago Motta Venancio - PhD student in Bioinformatics From jrm at compbio.dundee.ac.uk Thu Mar 3 10:26:00 2005 From: jrm at compbio.dundee.ac.uk (Jon manning) Date: Thu Mar 3 10:23:31 2005 Subject: [Bioperl-l] gap/ambiguous character only sequences: Bio::PrimarySeq Message-ID: <1109863560.20641.154.camel@tick.compbio.dundee.ac.uk> Hi All, For a lot of the stuff I'm doing at the moment I'm chopping up alignments and playing with the bits etc. I've had to nobble Bio::PrimarySeq to allow the resulting gap-only sequences in Bio::LocatableSeq- I understand the rationale behind this check, and it's a useful default, but could we perhaps have an option to allow tolerance instead? If such exists, I'd be grateful if someone could point me in the right direction!
Thanks,
Jon

From ak at ebi.ac.uk Thu Mar 3 16:00:44 2005
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Thu Mar 3 15:55:35 2005
Subject: [Bioperl-l] FYI: BioPerl port for OpenBSD
Message-ID: <20050303210044.GA8592@ebi.ac.uk>

List,

Just for your information, I noticed that a port of BioPerl 1.5.0
recently got committed to the OpenBSD ports tree as "biology/bioperl".
So if there is anyone out there doing bioinformatics on OpenBSD (I know
only of myself), this might be mildly interesting to investigate.

I haven't had the time to try the port out yet, and since I tend to go
with bioperl-live from CVS anyway it might take some time before I do.

Something that might possibly be interesting to others is that the port
apparently patches the code to use Text::ParseWords in place of
Text::Shellwords. Text::ParseWords is part of the standard Perl 5.8.6
installation on OpenBSD systems, so that kinda makes sense, and it gets
rid of a dependency.

OpenBSD users following the CURRENT development branch know where to go
if they are intrigued...

Cheers,
Andreas

ps: I don't have anything to do with this, really.

-- 
Andreas Kähäri EMBL-EBI/ensembl 1024D/C2E163CB

From jason.stajich at duke.edu Fri Mar 4 11:53:39 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Mar 4 15:59:49 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] Parser
In-Reply-To: <42289EC0.1020505@ime.usp.br>
References: <42289EC0.1020505@ime.usp.br>
Message-ID: <0803522c2b529e44030542e66b046c07@duke.edu>

Bio::SearchIO::psl will pretty much do this for you. There is a
search2table script which may work out of the box or you may have to
tweak a little to get the right fields to the right place. It is in
scripts/utilities/search2gff.pl.

There had been some off-by-one errors a while ago with the SearchIO psl
parser; I *think* that is fixed now.

I don't know that anyone has contributed a RepeatMasker parser to
Bioperl.
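
As background to the off-by-one remark: PSL records use 0-based, half-open coordinates, while GFF uses 1-based, inclusive ones, so only the start needs shifting. The sketch below illustrates that conversion in plain Perl for a single tab-separated PSL line. It is not the code of Bio::SearchIO::psl or of the search2gff script; the function name psl_line_to_gff, the source label "blat", the feature type "match" and the "Target" attribute are all choices made for illustration, and copying the PSL strand column straight into the GFF strand field is a simplification.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one PSL alignment line into one GFF-style
# line describing the aligned span on the target sequence.
sub psl_line_to_gff {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    return unless @f >= 21;    # a PSL record has 21 columns

    # Columns (0-based): 0 matches, 8 strand, 9 qName,
    # 13 tName, 15 tStart, 16 tEnd.
    my ($matches, $strand, $qname, $tname, $tstart, $tend)
        = @f[0, 8, 9, 13, 15, 16];

    # Translated searches store two strand characters; keep the first.
    $strand = substr($strand, 0, 1);

    return join "\t",
        $tname, 'blat', 'match',
        $tstart + 1,    # 0-based PSL start becomes a 1-based GFF start
        $tend,          # exclusive PSL end equals the inclusive GFF end
        $matches, $strand, '.',
        "Target $qname";
}

# Made-up record, used only to exercise the function.
my $psl = join "\t", 59, 9, 0, 0, 1, 823, 1, 96, '+', 'query1', 1250,
    0, 891, 'chr1', 5000, 100, 264, 2, '30,38,', '0,853,', '100,226,';
print psl_line_to_gff($psl), "\n";
```

A real converter would also skip the PSL header lines and might emit one feature per alignment block rather than one per record.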
-jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 4, 2005, at 12:45 PM, Thiago Motta Venancio wrote: > Hi folks. > Anyone here knows where can i find parsers that build GFF? > I have one parser for RepeatMasker output and one for Blast output. > I need a parser that converts PSL (Blat) output to GFF. > Thanks in advance. > Thiago > > -- > Thiago Motta Venancio - PhD student in Bioinformatics > > > > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real > users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050304/4e50a2a5/PGP.bin From jason.stajich at duke.edu Fri Mar 4 16:06:35 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Mar 4 16:04:55 2005 Subject: [Bioperl-l] Re: [Gmod-gbrowse] Parser In-Reply-To: <4228A3ED.7020501@ime.usp.br> References: <42289EC0.1020505@ime.usp.br> <0803522c2b529e44030542e66b046c07@duke.edu> <4228A3ED.7020501@ime.usp.br> Message-ID: You'll have to read-up on the SearchIO system for it to make any sense. http://bioperl.org/HOWTOs/SearchIO/index.html The HSPs are the "features" which are written back out with the Bio::Tools::GFF module. Most of the work is already done in the search2gff.PLS script for you -- there is a lot of code in there to handle asking for query or hit strand (you can only output one or the other) and filtering. 
The code is in bioperl scripts/utilities directory or you can pull it down here: http://bioperl.org/SRC/bioperl-live/scripts/utilities/search2gff.PLS So you want to run it like this (argument order doesn't matter) perl search2gff -o myout.gff -i myinput.psl -f psl -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 4, 2005, at 1:07 PM, Thiago Motta Venancio wrote: > Dear Jason. > Thanks for replying me. > I saw the documentation of your package before writing to the list, > but i did not understand how to use it. > Sorry about my low knowledge in Bioperl. > Here is your code: > > use Bio::SearchIO; > my $parser = new Bio::SearchIO(-file => 'file.psl', > -format => 'psl'); > while( my $result = $parser->next_result ) { > } > > > The question is where to specify the GFF format... > Regards. > Thiago > > -- > Thiago Motta Venancio - PhD student in Bioinformatics > > -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050304/107cffd0/PGP.bin From gowribio2004 at yahoo.co.in Mon Mar 7 02:03:20 2005 From: gowribio2004 at yahoo.co.in (Gowri Karthik) Date: Mon Mar 7 01:58:06 2005 Subject: [Bioperl-l] regarding bioperl project Message-ID: <20050307070320.21704.qmail@web8510.mail.in.yahoo.com> sir/madam I am doing my PGDBI . I want to my project in bioperl . can anyone help me in suggesting topic which will be helpful for my career. thank you. regards, gowri --------------------------------- Celebrate Yahoo!'s 10th Birthday! Yahoo! 
Netrospective: 100 Moments of the Web From jaiswal at iitk.ac.in Mon Mar 7 02:00:51 2005 From: jaiswal at iitk.ac.in (jaiswal@iitk.ac.in) Date: Mon Mar 7 13:37:27 2005 Subject: [Bioperl-l] need some help about pqs Message-ID: <3481.172.28.124.134.1110178851.squirrel@nwebmail.iitk.ac.in> Dear sir, Im student of Bioinformatics ,sir im sending u ,a problem which im facing at this time , in PQS, it is attached with this mail, the problem is .. 1 . i have 6000 proteins which i selected for my research ( see attachment as list_id ), in first case i run it on pqs page of pdb id ,which give me out put in 2. for mate & then on going to 3ed step it will give me result in .mol file , which was i needed . It is all correct , but it is good for 100 or 200 proteins , it can be done manually , but for more than 6000 proteins it is ,tedious job , so , can u help me to do this job by any other method other than manually , or is their any script for downloading all these files. waiting for reply .. thanks.. ---------------------------------------------------- Ashish Kumar Jaiswal MScBioinformatics c/o Dr. 
Balaji Prakash Structural Biology Lab, Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur, UP-208016, INDIA Ph: +91-512-2594024 FAX: +91-512-2594010 Email: jaiswal@iitk.ac.in ---------------------------------------------------- -------------- next part -------------- 12AS 153L 16PK 16VP 1A04 1A05 1A0C 1A0I 1A0P 1A12 1A17 1A1X 1A26 1A2P 1A2V 1A2X 1A2Z 1A38 1A3A 1A3C 1A3W 1A41 1A44 1A49 1A4E 1A4I 1A4M 1A4S 1A4Y 1A53 1A58 1A59 1A5C 1A5T 1A5Z 1A62 1A6C 1A6D 1A6F 1A6J 1A6M 1A6Q 1A6Z 1A76 1A78 1A79 1A7J 1A7T 1A81 1A87 1A88 1A8B 1A8D 1A8H 1A8I 1A8L 1A8P 1A8Q 1A8R 1A8S 1A8Y 1A99 1A9X 1A9X 1AA6 1AA7 1AB4 1AC6 1ACC 1ACF 1AD1 1AD2 1AD3 1ADE 1ADO 1ADW 1AE1 1AE7 1AE9 1AEP 1AF5 1AF6 1AF7 1AFW 1AG9 1AGI 1AGJ 1AGQ 1AGR 1AGX 1AH7 1AHS 1AIH 1AJ2 1AJ8 1AJS 1AK2 1AKO 1AKY 1AKZ 1AL3 1ALU 1ALY 1AM2 1AM5 1AM7 1AMF 1AMK 1AMU 1AMW 1AMX 1AN7 1AN8 1ANF 1ANV 1AOA 1AOC 1AOE 1AOG 1AOH 1AOP 1AOR 1AQC 1AQE 1AQM 1AQU 1AQZ 1AR1 1ARB 1ARO 1AS4 1ASH 1ASS 1AT0 1AT3 1ATG 1ATI 1ATL 1ATZ 1AUA 1AUI 1AUN 1AUO 1AUT 1AUY 1AVA 1AVG 1AVO 1AVQ 1AVW 1AW1 1AW9 1AX4 1AX8 1AXC 1AXD 1AXI 1AXN 1AYA 1AYF 1AYL 1AYM 1AYX 1AYZ 1AZ9 1AZO 1AZS 1AZW 1AZZ 1AZZ 1B00 1B04 1B06 1B09 1B0A 1B0B 1B0N 1B0U 1B12 1B16 1B1U 1B1Y 1B24 1B25 1B2P 1B33 1B34 1B35 1B35 1B35 1B3U 1B43 1B48 1B4P 1B56 1B5E 1B5L 1B5P 1B5T 1B63 1B65 1B66 1B6A 1B6E 1B6G 1B6R 1B74 1B77 1B78 1B79 1B7B 1B7E 1B7G 1B7Y 1B7Y 1B80 1B8A 1B8D 1B8D 1B8M 1B8M 1B8O 1B8P 1B93 1B9B 1B9H 1B9L 1B9O 1B9V 1BAM 1BAW 1BB9 1BBT 1BBT 1BBT 1BCF 1BCO 1BD0 1BD2 1BD2 1BD2 1BD2 1BD3 1BD8 1BDB 1BDF 1BDG 1BDM 1BDY 1BE9 1BEA 1BEB 1BEC 1BED 1BEF 1BEV 1BEV 1BEV 1BF2 1BF6 1BFG 1BFT 1BG2 1BG6 1BG7 1BGC 1BGF 1BGP 1BGV 1BGX 1BGX 1BGX 1BHD 1BHE 1BHT 1BI0 1BI5 1BI7 1BI7 1BIA 1BIF 1BIH 1BIO 1BJ7 1BJF 1BJJ 1BJN 1BJT 1BKB 1BKC 1BKC 1BKC 1BKF 1BKJ 1BKP 1BKR 1BKZ 1BLE 1BLX 1BLX 1BM9 1BMP 1BN7 1BN8 1BND 1BO1 1BO4 1BOB 1BOL 1BOO 1BOU 1BOU 1BOW 1BPB 1BPO 1BQB 1BQC 1BQK 1BQU 1BQY 1BR2 1BR9 1BRT 1BRU 1BRW 1BS0 1BS2 1BSG 1BSM 1BT3 1BTN 1BU2 1BU6 1BUC 1BUD 1BUE 1BUN 1BUO 
1BUP 1BUV 1BUV 1BVP 1BVS 1BVY 1BVY 1BW0 1BWD 1BWV 1BWV 1BWW 1BX4 1BXB 1BXN 1BXN 1BXT 1BY5 1BYB 1BYF 1BYG 1BYI 1BYK 1BYR 1BYW 1BZY 1C02 1C0D 1C0P 1C1D 1C1K 1C1L 1C1Y 1C25 1C2P 1C2Y 1C3A 1C3A 1C3C 1C3G 1C3H 1C3M 1C3P 1C41 1C44 1C4K 1C4O 1C4R 1C4X 1C4Z 1C4Z 1C52 1C5C 1C5C 1C6V 1C7G 1C7K 1C7Q 1C8B 1C8N 1C8U 1C8Z 1C9K 1CBF 1CBG 1CBK 1CBS 1CBY 1CC1 1CC1 1CC3 1CCR 1CCW 1CCW 1CD1 1CD3 1CD3 1CD3 1CD3 1CD8 1CDD 1CDO 1CDP 1CDY 1CEO 1CER 1CEW 1CEX 1CF1 1CF2 1CFR 1CFY 1CFZ 1CG2 1CG5 1CGH 1CHD 1CHK 1CHM 1CHU 1CI0 1CI3 1CI9 1CID 1CII 1CIY 1CJA 1CJB 1CJC 1CJW 1CJX 1CKE 1CKI 1CKM 1CL1 1CLC 1CLI 1CLV 1CM0 1CM4 1CM8 1CMB 1CMX 1CN3 1CNT 1CNU 1CNV 1CNZ 1CO6 1COJ 1COL 1COT 1COV 1COV 1COV 1COZ 1CP2 1CP9 1CP9 1CPN 1CPT 1CPY 1CQ3 1CQQ 1CQX 1CR1 1CR5 1CRB 1CRU 1CRZ 1CS1 1CS6 1CSE 1CSH 1CSN 1CT5 1CT9 1CTQ 1CTT 1CUK 1CUO 1CV8 1CVL 1CVR 1CVS 1CVS 1CWN 1CWV 1CX4 1CXC 1CXQ 1CXZ 1CYD 1CYG 1CYX 1CZA 1CZT 1CZY 1D09 1D09 1D0C 1D0Q 1D1G 1D1J 1D1Q 1D2E 1D2N 1D2O 1D2S 1D2T 1D2Z 1D2Z 1D3G 1D3L 1D3V 1D3Y 1D4A 1D4D 1D4O 1D4T 1D4V 1D4V 1D5C 1D5N 1D5R 1D5T 1D6A 1D6M 1D6R 1D7M 1D7P 1D7U 1D7Y 1D8C 1D8I 1D8U 1D8W 1D9C 1DAB 1DAR 1DB3 1DBF 1DBQ 1DBW 1DBX 1DCE 1DCE 1DCF 1DCI 1DCO 1DCQ 1DCS 1DCU 1DD3 1DD5 1DD9 1DDG 1DDJ 1DDW 1DDZ 1DEE 1DEE 1DEK 1DEU 1DEV 1DF7 1DFC 1DFO 1DFX 1DG6 1DG9 1DGF 1DGJ 1DGS 1DHN 1DHR 1DHS 1DI0 1DI1 1DI6 1DIN 1DIQ 1DIV 1DJ0 1DJ2 1DJ7 1DJX 1DK0 1DK8 1DKG 1DKG 1DKI 1DKL 1DKU 1DKZ 1DL2 1DL5 1DLC 1DLF 1DLF 1DLJ 1DLP 1DLW 1DLY 1DM5 1DM9 1DMG 1DMH 1DML 1DN0 1DN0 1DN1 1DN1 1DNP 1DO5 1DOF 1DOI 1DOS 1DOW 1DOZ 1DPE 1DPG 1DPI 1DPS 1DPT 1DQ3 1DQA 1DQE 1DQG 1DQI 1DQN 1DQT 1DQU 1DQV 1DQZ 1DRW 1DRY 1DSS 1DSY 1DT0 1DT9 1DTD 1DTL 1DTO 1DU5 1DUG 1DUN 1DUS 1DUV 1DV1 1DV8 1DVK 1DVO 1DVP 1DW0 1DWK 1DWN 1DX5 1DX5 1DXE 1DXH 1DXJ 1DXL 1DXR 1DXR 1DXR 1DXR 1DXY 1DY2 1DY5 1DY9 1DYN 1DYO 1DYP 1DYQ 1DYR 1DYS 1DYT 1DZ3 1DZB 1DZB 1DZF 1DZI 1DZK 1DZL 1DZO 1DZR 1E0C 1E0F 1E0T 1E1H 1E1H 1E1O 1E2K 1E2T 1E2W 1E2Y 1E3D 1E3I 1E3J 1E3P 1E4C 1E4E 1E4F 1E4I 1E4Y 1E5D 1E5E 1E5K 1E5M 1E5P 1E5Q 1E5R 1E5X 1E6B 1E6C 1E6I 1E6U 1E6V 1E6V 
1E6V 1E6W 1E6Y 1E6Y 1E6Y 1E7L 1E7N 1E7W 1E8C 1E8G 1E8Y 1E9G 1E9I 1E9L 1E9M 1E9R 1E9Z 1E9Z 1EA0 1EA7 1EA9 1EAF 1EAG 1EAI 1EAJ 1EAQ 1EAR 1EAX 1EAZ 1EB6 1EB7 1EBA 1EBD 1EBF 1EC7 1ECA 1ECF 1ECM 1ECS 1ED1 1ED9 1EDG 1EDO 1EDQ 1EDT 1EDY 1EDZ 1EE0 1EE2 1EE6 1EE8 1EEJ 1EEM 1EEQ 1EER 1EER 1EEX 1EEX 1EEX 1EF1 1EF8 1EFD 1EFH 1EFN 1EFP 1EFP 1EFU 1EFU 1EFV 1EFV 1EG2 1EG3 1EG5 1EG7 1EGA 1EGI 1EGU 1EGZ 1EH1 1EH9 1EHI 1EHK 1EHK 1EHW 1EHY 1EI1 1EI5 1EI6 1EI7 1EIA 1EJ0 1EJ2 1EJ8 1EJA 1EJB 1EJD 1EJE 1EJF 1EJX 1EJX 1EJX 1EK0 1EK6 1EK9 1EKB 1EKE 1EKG 1EKJ 1EKQ 1EKR 1EL1 1EL5 1EL6 1ELK 1ELR 1ELT 1ELU 1ELW 1EM2 1EM8 1EM8 1EM9 1EMS 1ENF 1ENP 1ENY 1EO6 1EO9 1EO9 1EOK 1EOV 1EP0 1EP3 1EP3 1EP5 1EP7 1EPA 1EPF 1EPU 1EPW 1EPX 1EQ2 1EQ9 1EQC 1EQF 1EQR 1EQW 1ERJ 1ERV 1ES0 1ES0 1ES5 1ES6 1ES8 1ES9 1ESC 1ESL 1ESO 1ESW 1ET9 1ETE 1ETU 1EU1 1EU3 1EU8 1EUA 1EUD 1EUD 1EUH 1EUM 1EUP 1EUV 1EUW 1EV1 1EV1 1EV1 1EV2 1EV2 1EV7 1EVH 1EVS 1EVX 1EVY 1EW0 1EW2 1EW3 1EW4 1EW6 1EWF 1EWR 1EX0 1EX2 1EX9 1EXB 1EXB 1EXM 1EXQ 1EXR 1EXS 1EXT 1EXU 1EYB 1EYE 1EYH 1EYL 1EYQ 1EYS 1EYS 1EYS 1EYS 1EYV 1EZ0 1EZ3 1EZ4 1EZF 1EZI 1EZJ 1EZW 1F00 1F02 1F05 1F06 1F07 1F08 1F0I 1F0K 1F0L 1F0X 1F0Y 1F15 1F1C 1F1E 1F1G 1F1J 1F1M 1F1O 1F1S 1F1U 1F20 1F28 1F2D 1F2E 1F2N 1F2T 1F2T 1F2V 1F32 1F35 1F39 1F3G 1F3H 1F3L 1F3M 1F3U 1F3U 1F3V 1F3V 1F46 1F4P 1F52 1F58 1F58 1F5J 1F5M 1F5N 1F5Q 1F5Q 1F5V 1F60 1F6B 1F6D 1F6F 1F6F 1F6Y 1F74 1F75 1F76 1F7C 1F7D 1F7L 1F7S 1F80 1F83 1F86 1F89 1F8F 1F8M 1F97 1F9A 1F9M 1F9V 1F9Y 1F9Z 1FA2 1FAO 1FB6 1FC3 1FC4 1FC6 1FCD 1FCD 1FCH 1FCJ 1FCQ 1FCY 1FD9 1FDQ 1FDR 1FEC 1FEH 1FEP 1FF3 1FFG 1FFG 1FFT 1FFT 1FFT 1FFT 1FFV 1FFV 1FFV 1FGJ 1FGK 1FGU 1FGV 1FGV 1FGY 1FH0 1FH9 1FHF 1FHG 1FI2 1FI4 1FI8 1FIO 1FIT 1FJ2 1FJH 1FJJ 1FKM 1FKN 1FL0 1FL1 1FL2 1FLE 1FLG 1FLJ 1FLK 1FLL 1FLM 1FM0 1FM4 1FM9 1FM9 1FMB 1FMC 1FMD 1FMD 1FMD 1FMJ 1FMT 1FMU 1FN9 1FNH 1FNL 1FNN 1FNO 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNT 1FNU 1FNY 1FO0 1FO0 1FO0 1FO1 1FO3 1FO8 1FOB 1FOE 1FOE 1FON 1FOT 1FP1 1FP2 1FP3 1FP5 1FP6 
1FPO 1FPR 1FPZ 1FQI 1FQJ 1FQJ 1FQT 1FR2 1FRB 1FRF 1FRF 1FS0 1FS0 1FS1 1FS5 1FS7 1FSG 1FSL 1FT5 1FT9 1FTP 1FTR 1FTS 1FUE 1FUI 1FUK 1FUR 1FUS 1FUX 1FV1 1FV1 1FVG 1FVI 1FVK 1FVP 1FVR 1FVU 1FVU 1FW1 1FWX 1FX2 1FX3 1FX7 1FX8 1FXK 1FXK 1FXK 1FXO 1FXW 1FXW 1FXX 1FXY 1FXZ 1FY7 1FYE 1FYH 1FYH 1FYV 1FYX 1FZQ 1FZV 1FZY 1G0C 1G0D 1G0H 1G0O 1G0S 1G0Y 1G16 1G1K 1G1Q 1G24 1G29 1G2A 1G2I 1G2N 1G2O 1G2Q 1G2R 1G31 1G3K 1G3N 1G3N 1G3N 1G3P 1G3Q 1G40 1G41 1G43 1G4I 1G4M 1G4U 1G4U 1G4Y 1G4Y 1G55 1G57 1G5A 1G5B 1G5H 1G5Q 1G5T 1G5Z 1G60 1G61 1G62 1G66 1G6A 1G6G 1G6H 1G6O 1G6Q 1G6S 1G72 1G73 1G73 1G7N 1G7S 1G87 1G8A 1G8E 1G8I 1G8K 1G8K 1G8L 1G8M 1G8P 1G8S 1G99 1G9G 1G9K 1GA6 1GA8 1GAD 1GAK 1GC5 1GCA 1GCI 1GCV 1GCV 1GCY 1GD0 1GD1 1GD6 1GD7 1GD8 1GDE 1GDH 1GEE 1GEF 1GEG 1GEH 1GEN 1GEQ 1GES 1GFF 1GFF 1GG2 1GG2 1GG3 1GG4 1GG6 1GGL 1GGP 1GGP 1GGX 1GH2 1GH6 1GH6 1GHE 1GHP 1GHQ 1GHQ 1GHR 1GHS 1GIQ 1GJ7 1GJW 1GK8 1GK8 1GK9 1GK9 1GKA 1GKA 1GKD 1GKL 1GKM 1GKP 1GKR 1GKU 1GKZ 1GL0 1GL1 1GL4 1GM6 1GME 1GMI 1GMM 1GMU 1GMX 1GMY 1GMZ 1GNG 1GNK 1GNL 1GNT 1GNU 1GNW 1GNX 1GO3 1GO3 1GO4 1GO4 1GOI 1GOJ 1GOT 1GOT 1GOX 1GP0 1GP1 1GP6 1GPC 1GPH 1GPJ 1GPL 1GPM 1GPP 1GPQ 1GPQ 1GPR 1GQ6 1GQ8 1GQE 1GQN 1GQO 1GQP 1GQV 1GQZ 1GR0 1GR3 1GRH 1GRJ 1GS0 1GS5 1GS9 1GSA 1GSK 1GSM 1GSO 1GSU 1GT1 1GTE 1GTK 1GTM 1GTT 1GTZ 1GU2 1GU6 1GU7 1GUD 1GUL 1GUQ 1GUX 1GUX 1GUZ 1GV3 1GV9 1GVE 1GVH 1GVJ 1GVK 1GVN 1GVZ 1GW5 1GW5 1GW5 1GW5 1GWC 1GWE 1GWI 1GWJ 1GWK 1GWS 1GWU 1GWY 1GX1 1GX3 1GXC 1GXJ 1GXM 1GXQ 1GXR 1GXY 1GYG 1GYH 1GYO 1GYT 1GYV 1GZ2 1GZ6 1GZG 1GZQ 1GZQ 1GZS 1GZS 1H03 1H05 1H09 1H0B 1H0C 1H0H 1H0H 1H0P 1H0X 1H12 1H16 1H1A 1H1D 1H1N 1H1O 1H1Y 1H21 1H2B 1H2E 1H2I 1H2K 1H2S 1H30 1H32 1H32 1H3D 1H3F 1H3G 1H3N 1H3Q 1H41 1H4A 1H4G 1H4R 1H4V 1H4X 1H54 1H5B 1H5Q 1H5W 1H5Y 1H65 1H6D 1H6G 1H6H 1H6K 1H6L 1H6O 1H6P 1H6T 1H6U 1H6V 1H6W 1H6Z 1H70 1H72 1H7C 1H7E 1H7M 1H7S 1H7Z 1H80 1H8D 1H8E 1H8T 1H8T 1H8T 1H8U 1H97 1H99 1H9H 1H9M 1H9S 1HA1 1HBN 1HBN 1HBN 1HC1 1HC7 1HCB 1HCV 1HCZ 1HD2 1HD7 1HDC 1HDF 1HDG 1HDH 1HDI 1HDK 1HDM 1HDM 1HDO 1HE1 1HE1 
1HEK 1HEU 1HF2 1HF8 1HFC 1HFE 1HFE 1HFO 1HFU 1HFX 1HG3 1HG4 1HG8 1HGX 1HH1 1HH2 1HH8 1HHS 1HHU 1HHY 1HI9 1HIW 1HIX 1HJ8 1HJ9 1HJR 1HK8 1HKF 1HKG 1HKH 1HKK 1HKQ 1HKW 1HKX 1HL2 1HL9 1HLB 1HLC 1HLE 1HLM 1HLW 1HM6 1HM9 1HMC 1HMT 1HMY 1HN0 1HNE 1HNJ 1HNN 1HO8 1HPG 1HQ0 1HQ8 1HQS 1HQV 1HQZ 1HR6 1HR6 1HR8 1HR8 1HRK 1HRO 1HRU 1HS6 1HSB 1HSK 1HSS 1HT6 1HT8 1HTJ 1HTM 1HTP 1HTQ 1HTR 1HTW 1HU3 1HUF 1HUL 1HUP 1HUS 1HUW 1HUX 1HV5 1HV8 1HV9 1HVX 1HVY 1HW1 1HW5 1HW6 1HW7 1HWX 1HX0 1HX1 1HX1 1HX6 1HX8 1HXH 1HXI 1HXM 1HXM 1HXN 1HXR 1HXX 1HY5 1HY7 1HYE 1HYH 1HYN 1HYO 1HYQ 1HZ4 1HZD 1HZF 1HZI 1HZP 1HZT 1I0D 1I0R 1I0Z 1I12 1I19 1I1G 1I1I 1I1J 1I1N 1I1R 1I1R 1I1W 1I24 1I2A 1I2K 1I2M 1I2M 1I2S 1I31 1I36 1I39 1I3C 1I3U 1I3Z 1I40 1I4A 1I4D 1I4D 1I4J 1I4M 1I4N 1I4O 1I4O 1I4U 1I4W 1I52 1I58 1I5E 1I5G 1I5N 1I5P 1I60 1I6A 1I6P 1I6V 1I6V 1I6V 1I76 1I78 1I7G 1I7H 1I7K 1I7N 1I7Q 1I7Q 1I7W 1I7W 1I8A 1I8D 1I8J 1I8K 1I8K 1I8L 1I8L 1I8N 1I8O 1I8T 1I9G 1I9S 1I9W 1I9Z 1IA6 1IA8 1IA9 1IAE 1IAP 1IAR 1IAR 1IAT 1IAY 1IAZ 1IBJ 1IBY 1IC6 1ICP 1ICR 1ICX 1ID0 1ID1 1ID2 1IDK 1IDP 1IDR 1IDS 1IE9 1IEJ 1IFC 1IFQ 1IFR 1IG0 1IG3 1IG8 1IGM 1IGM 1IGW 1IH7 1IHB 1IHG 1IHK 1IHM 1IHN 1IHO 1IHP 1IHS 1IHU 1II2 1II5 1II7 1IIB 1IIC 1IIR 1IJ5 1IJB 1IJQ 1IJT 1IJX 1IJY 1IK6 1IK9 1IKN 1IKN 1IKN 1IKP 1IKT 1ILR 1IM3 1IM3 1IM4 1IM5 1IM8 1IMJ 1IN0 1IN4 1INL 1INP 1IO0 1IO1 1IO2 1IO7 1IOD 1IOD 1IOF 1IOM 1IOW 1IQ0 1IQ4 1IQ5 1IQ6 1IQ8 1IQA 1IQC 1IQP 1IQR 1IQV 1IR6 1IRD 1IRD 1IRJ 1IRU 1IRU 1IRU 1IRU 1IRU 1IRU 1IRU 1IRU 1IRU 1IRU 1IRU 1IRU 1IRU 1IS1 1IS2 1IS3 1IS8 1IS9 1ISC 1ISE 1ISP 1ISS 1IST 1IT2 1ITB 1ITB 1ITH 1ITK 1ITV 1ITW 1ITX 1ITZ 1IU4 1IU8 1IUG 1IUH 1IUJ 1IUK 1IUQ 1IV3 1IV8 1IVH 1IW0 1IWD 1IWE 1IWH 1IWH 1IWL 1IWM 1IWP 1IWP 1IWP 1IX9 1IXC 1IXH 1IXK 1IXL 1IXM 1IXS 1IXV 1IXZ 1IY8 1IY9 1IYB 1IYE 1IYH 1IYK 1IYN 1IYS 1IYX 1IZ0 1IZ5 1IZ6 1IZC 1IZM 1IZN 1IZN 1IZO 1J05 1J05 1J09 1J0A 1J0H 1J0M 1J0P 1J0W 1J1B 1J1D 1J1D 1J1D 1J1I 1J1J 1J1L 1J1M 1J1T 1J1Y 1J20 1J24 1J27 1J2G 1J2J 1J2P 1J2Q 1J2Q 1J2R 1J2Y 1J2Z 1J30 1J31 1J32 1J33 1J34 1J34 1J36 
1J3A 1J3B 1J3K 1J3K 1J3L 1J3N 1J3U 1J3V 1J3W 1J48 1J4N 1J4T 1J54 1J58 1J5P 1J5S 1J5U 1J5V 1J5W 1J5X 1J5Y 1J6O 1J6R 1J6U 1J6W 1J6X 1J71 1J72 1J77 1J79 1J7D 1J7D 1J7G 1J7J 1J7N 1J7X 1J83 1J8B 1J8F 1J8M 1J8S 1J8U 1J93 1J97 1J98 1J9A 1J9B 1J9L 1JA1 1JA9 1JAD 1JAE 1JAG 1JAK 1JAL 1JAT 1JAT 1JAY 1JB0 1JB0 1JB0 1JB0 1JB0 1JB2 1JB3 1JB9 1JBE 1JBG 1JBK 1JBO 1JBO 1JBW 1JC4 1JC9 1JCF 1JD0 1JD1 1JD5 1JDH 1JDL 1JDR 1JDW 1JE5 1JE6 1JEB 1JEB 1JEH 1JEO 1JEQ 1JEQ 1JER 1JET 1JF8 1JFB 1JFL 1JFM 1JFR 1JFU 1JFX 1JFZ 1JG1 1JGC 1JGS 1JGT 1JH6 1JHD 1JHF 1JHG 1JHJ 1JHL 1JHL 1JHL 1JHN 1JHS 1JI0 1JI1 1JI2 1JI4 1JI5 1JI6 1JIA 1JIG 1JIH 1JIL 1JIW 1JIW 1JIX 1JJ7 1JJF 1JJI 1JJO 1JJT 1JJV 1JK0 1JK0 1JK3 1JK7 1JKE 1JKG 1JKG 1JKM 1JKM 1JKS 1JKX 1JL0 1JL1 1JL3 1JL5 1JLJ 1JLN 1JLT 1JLV 1JLW 1JLY 1JM1 1JM6 1JMK 1JMM 1JMS 1JMT 1JMU 1JMU 1JMV 1JMX 1JMX 1JNI 1JNP 1JNR 1JNR 1JNU 1JOC 1JOG 1JOP 1JOS 1JOT 1JOV 1JPA 1JPD 1JPM 1JPZ 1JQ5 1JQB 1JQE 1JQG 1JQI 1JQK 1JQL 1JQL 1JQN 1JQO 1JR0 1JR1 1JR2 1JR7 1JR8 1JR9 1JRL 1JRO 1JRO 1JRR 1JS1 1JS3 1JS9 1JSF 1JSG 1JSS 1JSU 1JSU 1JSW 1JSX 1JT6 1JTD 1JTD 1JTG 1JTG 1JTV 1JU3 1JUB 1JUG 1JUO 1JUQ 1JUV 1JV1 1JVB 1JVN 1JVQ 1JVQ 1JVW 1JW7 1JW9 1JWI 1JWI 1JWQ 1JX2 1JX2 1JX6 1JX7 1JXG 1JXH 1JXN 1JY1 1JY5 1JYA 1JYE 1JYH 1JYK 1JYO 1JYO 1JZ8 1JZN 1JZT 1K04 1K07 1K0D 1K0G 1K0M 1K0R 1K0W 1K0Z 1K12 1K1B 1K1D 1K1E 1K1X 1K20 1K24 1K28 1K28 1K2E 1K2F 1K2W 1K2X 1K2X 1K32 1K38 1K3E 1K3I 1K3P 1K3R 1K3S 1K3T 1K3V 1K3Y 1K44 1K47 1K4I 1K4M 1K4N 1K4Z 1K55 1K5D 1K5D 1K5D 1K5J 1K5N 1K5N 1K66 1K68 1K6K 1K75 1K77 1K7H 1K7I 1K7J 1K7K 1K87 1K8F 1K8K 1K8K 1K8K 1K8K 1K8K 1K8K 1K8K 1K8R 1K8R 1K8T 1K92 1K94 1K9V 1K9X 1KA1 1KA9 1KA9 1KAC 1KAC 1KAF 1KAG 1KAM 1KAO 1KAP 1KAS 1KB0 1KB5 1KB5 1KB5 1KB5 1KB9 1KB9 1KB9 1KB9 1KB9 1KB9 1KB9 1KB9 1KBL 1KBV 1KCF 1KCG 1KCG 1KCM 1KCQ 1KCV 1KCV 1KCX 1KCZ 1KDJ 1KEA 1KEK 1KEQ 1KEW 1KEX 1KEZ 1KF6 1KF6 1KF6 1KF6 1KFI 1KFW 1KG0 1KG0 1KG0 1KG2 1KGA 1KGC 1KGC 1KGD 1KGN 1KGS 1KHB 1KHC 1KHD 1KHI 1KHQ 1KHT 1KHV 1KHX 1KHY 1KI0 1KIC 1KID 1KIG 1KIJ 1KIY 1KJ1 1KJN 1KJQ 1KJW 1KJY 1KK1 
1KKC 1KKE 1KKH 1KKM 1KKM 1KKO 1KL1 1KL7 1KL9 1KLF 1KLF 1KLI 1KLL 1KLO 1KLT 1KLU 1KLU 1KLU 1KLX 1KM4 1KM8 1KMH 1KMH 1KMI 1KMI 1KMJ 1KMM 1KMO 1KMQ 1KMT 1KMV 1KN1 1KN1 1KNB 1KNC 1KNG 1KNV 1KNW 1KNX 1KNY 1KO3 1KO6 1KO7 1KO9 1KOA 1KOB 1KOE 1KOL 1KON 1KOP 1KP0 1KPF 1KPG 1KPI 1KPS 1KPS 1KPT 1KQ3 1KQ6 1KQF 1KQF 1KQF 1KQN 1KQP 1KQR 1KQW 1KR4 1KR7 1KRH 1KRQ 1KRR 1KS5 1KS8 1KS9 1KSH 1KSH 1KSK 1KSO 1KT1 1KT6 1KTE 1KTG 1KTK 1KTK 1KU0 1KU1 1KU9 1KUF 1KUT 1KV3 1KV5 1KV7 1KV9 1KVK 1KW3 1KWG 1KWH 1KWI 1KWM 1KWS 1KXG 1KXO 1KXP 1KXP 1KXU 1KXV 1KXV 1KY3 1KY9 1KYA 1KYF 1KYH 1KYQ 1KYZ 1KZ1 1KZ7 1KZ7 1KZH 1KZL 1KZQ 1KZY 1KZY 1L0B 1L0O 1L0O 1L0Q 1L0W 1L1D 1L1J 1L1L 1L1N 1L1O 1L1O 1L1O 1L1Q 1L1Y 1L2H 1L2L 1L2T 1L2W 1L3I 1L3P 1L4I 1L4U 1L5J 1L5O 1L5V 1L5X 1L6J 1L6M 1L6P 1L6R 1L6W 1L7A 1L7D 1L7L 1L7V 1L7V 1L8A 1L8D 1L8K 1L8N 1L8Q 1L8W 1L9K 1L9V 1L9X 1LA1 1LA6 1LA6 1LAM 1LAR 1LB3 1LB6 1LBA 1LBQ 1LBU 1LBV 1LC0 1LC5 1LCI 1LCT 1LCY 1LDJ 1LDM 1LDN 1LDT 1LE6 1LEH 1LF2 1LF6 1LF7 1LFD 1LFK 1LFO 1LFP 1LFW 1LG7 1LGP 1LGY 1LH0 1LHP 1LHT 1LI4 1LI5 1LII 1LIT 1LIU 1LJ2 1LJ5 1LJ8 1LJ9 1LK5 1LKF 1LKI 1LKK 1LKP 1LKT 1LL2 1LL7 1LLA 1LLC 1LLD 1LLN 1LLU 1LM4 1LM5 1LM6 1LM7 1LM8 1LM8 1LME 1LMI 1LML 1LMO 1LNL 1LNQ 1LNS 1LNW 1LNZ 1LO6 1LO7 1LOP 1LOX 1LP3 1LP9 1LP9 1LP9 1LP9 1LPB 1LPG 1LPG 1LPJ 1LQ9 1LQA 1LQL 1LQS 1LQS 1LQT 1LQY 1LRV 1LRW 1LRZ 1LS1 1LS6 1LSH 1LSH 1LSS 1LST 1LSU 1LSW 1LT7 1LTK 1LTL 1LTO 1LTZ 1LU1 1LU4 1LUA 1LUC 1LUC 1LUF 1LUG 1LUQ 1LUR 1LV7 1LVA 1LVF 1LVG 1LVL 1LVM 1LVO 1LVW 1LW3 1LW7 1LWB 1LWD 1LWJ 1LXA 1LXJ 1LY1 1LYC 1LYQ 1LYV 1LYW 1LZJ 1LZL 1M0D 1M0K 1M0S 1M0U 1M0W 1M0Z 1M15 1M1C 1M1E 1M1F 1M1L 1M1N 1M1N 1M1S 1M22 1M2D 1M2K 1M2O 1M2O 1M2R 1M2X 1M2Z 1M32 1M33 1M3K 1M3S 1M3U 1M3Y 1M40 1M45 1M46 1M48 1M4J 1M4L 1M4R 1M4V 1M4Y 1M4Z 1M53 1M55 1M56 1M56 1M56 1M5H 1M5I 1M5N 1M5Q 1M5S 1M5Y 1M61 1M65 1M6D 1M6E 1M6H 1M6I 1M6J 1M6K 1M6P 1M6S 1M6Y 1M70 1M72 1M7B 1M7G 1M7S 1M7V 1M7X 1M7Y 1M85 1M8N 1M8P 1M8T 1M8Z 1M93 1M98 1M9I 1M9U 1M9X 1M9X 1M9Z 1MA1 1MA3 1MAI 1MAS 1MB3 1MB4 1MBA 1MBM 1MBX 1MBX 1MC2 1MC3 1MCP 
1MCP 1MCT 1MD6 1MD8 1MDA 1MDA 1MDA 1MDB 1MDC 1MDW 1ME4 1MEM 1MEO 1MFM 1MFO 1MG2 1MG2 1MG2 1MG2 1MG4 1MG7 1MGP 1MGT 1MH1 1MHH 1MHH 1MHM 1MHQ 1MHY 1MHY 1MHY 1MI3 1MI8 1MIJ 1MIL 1MIO 1MIO 1MIU 1MIW 1MIX 1MJ0 1MJ3 1MJ5 1MJF 1MJH 1MJN 1MJT 1MJU 1MJU 1MK4 1MKA 1MKF 1MKH 1MKI 1MKM 1MKP 1MKY 1MKZ 1ML0 1ML4 1ML8 1ML9 1MLA 1MLD 1MLW 1MML 1MMQ 1MN4 1MN8 1MNA 1MNG 1MO0 1MO3 1MO9 1MOQ 1MOU 1MOZ 1MP8 1MP9 1MPG 1MPP 1MPX 1MPY 1MQ0 1MQ4 1MQB 1MQE 1MQI 1MQK 1MQK 1MQS 1MR1 1MR7 1MRG 1MRJ 1MRZ 1MS6 1MS9 1MSC 1MSK 1MSL 1MSP 1MT0 1MT5 1MTP 1MTY 1MTY 1MTY 1MTZ 1MU2 1MU2 1MU5 1MUC 1MUG 1MUK 1MUW 1MV5 1MV8 1MVE 1MVF 1MVH 1MVL 1MVO 1MW5 1MW7 1MW9 1MWM 1MWQ 1MWV 1MWW 1MX3 1MX9 1MXE 1MXG 1MXH 1MXI 1MXR 1MXS 1MY6 1MY7 1MYT 1MZ4 1MZ8 1MZA 1MZB 1MZG 1MZH 1MZJ 1MZN 1MZR 1MZU 1MZW 1MZY 1N00 1N08 1N0E 1N0U 1N0W 1N0X 1N0X 1N11 1N12 1N13 1N1B 1N1C 1N1L 1N1Q 1N28 1N2A 1N2E 1N2F 1N2S 1N2Z 1N3L 1N3Y 1N40 1N45 1N46 1N4K 1N4Q 1N4W 1N4X 1N4X 1N57 1N5D 1N5N 1N5U 1N62 1N62 1N62 1N67 1N6A 1N7H 1N7K 1N7V 1N7Z 1N81 1N82 1N83 1N8F 1N8I 1N8J 1N8P 1N8V 1N93 1N97 1N9B 1N9L 1N9P 1N9W 1NA5 1NA6 1NA8 1NAQ 1NAR 1NB2 1NB9 1NBA 1NBC 1NBF 1NBQ 1NBU 1NBW 1NBW 1NC5 1NC7 1NCI 1NCN 1NCQ 1NCQ 1NCQ 1NCW 1NCW 1NCX 1ND1 1ND2 1ND2 1ND2 1ND7 1NE2 1NE6 1NE7 1NE8 1NE9 1NEK 1NEK 1NEK 1NEK 1NEU 1NEX 1NEX 1NEY 1NF1 1NF2 1NF3 1NF3 1NF9 1NFG 1NFP 1NFV 1NG0 1NG2 1NG4 1NG6 1NGK 1NGN 1NGV 1NGV 1NH1 1NH8 1NHK 1NHP 1NHY 1NHZ 1NI3 1NI4 1NI4 1NI5 1NI9 1NIG 1NIJ 1NIR 1NIW 1NJ1 1NJ8 1NJF 1NJH 1NJK 1NJR 1NKG 1NKI 1NKO 1NKQ 1NKR 1NKS 1NKT 1NKV 1NLF 1NLN 1NLQ 1NLS 1NLT 1NLX 1NM2 1NM3 1NM8 1NME 1NMM 1NMM 1NMO 1NMU 1NMU 1NN4 1NN5 1NN7 1NNA 1NNF 1NNG 1NNH 1NNI 1NNL 1NNQ 1NNS 1NNW 1NNX 1NO1 1NO5 1NO7 1NOA 1NOF 1NOG 1NOS 1NOX 1NOZ 1NP3 1NP6 1NP7 1NPB 1NPE 1NPE 1NPL 1NPP 1NPU 1NPY 1NQ6 1NQ7 1NQE 1NQJ 1NQK 1NQN 1NQU 1NQZ 1NR0 1NR9 1NRF 1NRI 1NRJ 1NRJ 1NRK 1NRL 1NRR 1NRV 1NRW 1NRZ 1NS5 1NSJ 1NSL 1NST 1NSW 1NSZ 1NT2 1NT2 1NT4 1NTF 1NTG 1NTH 1NTM 1NTM 1NTM 1NTM 1NTM 1NTM 1NTV 1NTY 1NU0 1NU5 1NU7 1NU7 1NU7 1NUE 1NUK 1NUL 1NUN 1NUN 1NUU 1NUY 1NV8 1NVM 1NVM 1NVT 
1NVU 1NVU 1NW1 1NW3 1NW9 1NWA 1NWP 1NWW 1NWZ 1NX8 1NX9 1NXH 1NXJ 1NXK 1NXM 1NXP 1NXQ 1NXU 1NY1 1NY5 1NY7 1NY7 1NYC 1NYK 1NYL 1NYR 1NYT 1NZ0 1NZ6 1NZA 1NZE 1NZI 1NZJ 1NZN 1NZO 1NZY 1O04 1O08 1O0E 1O0I 1O0S 1O0U 1O0W 1O0X 1O0Y 1O12 1O13 1O14 1O17 1O1X 1O1Y 1O1Z 1O20 1O22 1O26 1O2D 1O3U 1O3Y 1O4R 1O4S 1O4T 1O4U 1O4V 1O4W 1O4Y 1O4Z 1O50 1O51 1O54 1O58 1O59 1O5H 1O5I 1O5K 1O5L 1O5O 1O5U 1O5X 1O5Z 1O60 1O63 1O65 1O66 1O69 1O6B 1O6C 1O6D 1O6E 1O6L 1O6O 1O6O 1O6S 1O6S 1O6Y 1O6Z 1O73 1O75 1O7E 1O7F 1O7I 1O7J 1O7N 1O7N 1O7Q 1O7X 1O88 1O89 1O8B 1O8V 1O8X 1O91 1O94 1O94 1O94 1O97 1O97 1O98 1O9D 1O9G 1O9I 1O9J 1O9R 1O9W 1OA4 1OA8 1OAA 1OAC 1OAF 1OAH 1OAL 1OAO 1OAO 1OAP 1OAQ 1OAQ 1OB1 1OB1 1OB3 1OB8 1OBB 1OBD 1OBF 1OBO 1OBR 1OC0 1OC2 1OCK 1OCS 1OCV 1OCX 1OCY 1OD3 1OD5 1OD9 1ODF 1ODK 1ODM 1ODO 1ODZ 1OE0 1OE1 1OE8 1OE9 1OE9 1OEJ 1OEP 1OEQ 1OEW 1OEY 1OF3 1OF5 1OF5 1OF8 1OFC 1OFD 1OFH 1OFH 1OFU 1OFU 1OFV 1OFW 1OFZ 1OGA 1OGA 1OGA 1OGA 1OGD 1OGI 1OGL 1OGO 1OGP 1OGQ 1OGY 1OGY 1OH0 1OH2 1OHE 1OHF 1OHG 1OHL 1OHT 1OHU 1OHV 1OHZ 1OI0 1OI1 1OI2 1OI4 1OI6 1OI7 1OIH 1OIS 1OIU 1OIU 1OIV 1OJ1 1OJ4 1OJ5 1OJ7 1OJQ 1OJR 1OJS 1OJT 1OJX 1OK7 1OKC 1OKG 1OKI 1OKJ 1OKK 1OKK 1OKQ 1OKR 1OKT 1OL0 1OL5 1OLL 1OLM 1OLP 1OLR 1OLT 1OLZ 1OM4 1OMI 1OMO 1OMR 1OMW 1OMW 1OMZ 1ON0 1ON2 1ON3 1ONC 1ONF 1ONL 1ONR 1ONW 1OO0 1OO0 1OO2 1OOE 1OOH 1OOP 1OOP 1OOP 1OOY 1OPC 1OPO 1OQ1 1OQ9 1OQC 1OQE 1OQF 1OQQ 1OQV 1OR0 1OR0 1OR4 1OR7 1OR8 1ORE 1ORF 1ORJ 1ORR 1ORS 1ORS 1ORS 1ORU 1ORY 1OS8 1OSC 1OSH 1OSM 1OSN 1OSP 1OSP 1OSP 1OSY 1OT8 1OTG 1OTH 1OTJ 1OTK 1OTS 1OTS 1OTS 1OTV 1OU0 1OU5 1OU8 1OUO 1OUT 1OUT 1OUV 1OUW 1OVL 1OVM 1OVN 1OW1 1OW4 1OWL 1OX0 1OX3 1OX8 1OXD 1OXJ 1OXK 1OXK 1OXW 1OXX 1OY0 1OY3 1OY3 1OY5 1OYC 1OYE 1OYG 1OYJ 1OYS 1OYW 1OYZ 1OZ2 1OZ6 1OZ7 1OZ7 1OZ9 1OZB 1OZH 1P0F 1P0H 1P0K 1P0Y 1P0Z 1P15 1P16 1P1J 1P1L 1P1M 1P1X 1P22 1P22 1P27 1P27 1P2F 1P2Z 1P32 1P35 1P3C 1P3D 1P3R 1P3W 1P3Y 1P42 1P4A 1P4C 1P4D 1P4K 1P4L 1P4L 1P4O 1P4P 1P4T 1P4U 1P4X 1P57 1P57 1P5D 1P5F 1P5J 1P5Q 1P5S 1P5T 1P5V 1P5V 1P5Z 1P6O 1P6P 1P6X 1P77 1P7G 
1P7K 1P7K 1P7O 1P80 1P8T 1P90 1P91 1P99 1P9B 1P9E 1P9H 1P9L 1P9O 1P9R 1P9S 1P9Y 1PA1 1PA2 1PA7 1PAM 1PAQ 1PAZ 1PB1 1PB6 1PB7 1PBE 1PBG 1PBJ 1PBK 1PBW 1PBY 1PBY 1PC6 1PCL 1PCQ 1PCX 1PCZ 1PDG 1PDK 1PDK 1PDO 1PDU 1PE1 1PE9 1PEA 1PEQ 1PEW 1PEX 1PF5 1PFF 1PFK 1PFO 1PFV 1PFZ 1PG4 1PG5 1PG5 1PG6 1PGI 1PGJ 1PGR 1PGR 1PGS 1PGT 1PGU 1PGV 1PGW 1PGW 1PHK 1PHO 1PHP 1PHS 1PI1 1PI4 1PIE 1PII 1PIN 1PIW 1PIX 1PJ3 1PJ5 1PJA 1PJC 1PJH 1PJM 1PJN 1PJQ 1PJR 1PJX 1PK5 1PK6 1PK6 1PK6 1PKH 1PKL 1PKO 1PKP 1PL4 1PL5 1PL8 1PLQ 1PM1 1PM4 1PMA 1PMA 1PME 1PMI 1PMJ 1PMN 1PMP 1PMT 1PMY 1PN0 1PN2 1PN3 1PNE 1PNO 1PO5 1POA 1POC 1POI 1POI 1POT 1POX 1PP0 1PP2 1PPJ 1PPJ 1PPJ 1PPJ 1PPJ 1PPJ 1PPO 1PPR 1PQ1 1PQ3 1PQ4 1PQ7 1PQH 1PQW 1PQZ 1PR9 1PRE 1PRT 1PRT 1PRT 1PRX 1PRZ 1PS1 1PS9 1PSD 1PSQ 1PSR 1PSU 1PSW 1PSZ 1PT6 1PTM 1PU5 1PU6 1PUC 1PUI 1PUJ 1PUO 1PV1 1PV5 1PV8 1PV9 1PVA 1PVC 1PVC 1PVC 1PVD 1PVG 1PVM 1PVN 1PVT 1PVV 1PW4 1PWA 1PWB 1PWG 1PWV 1PX0 1PXF 1PXV 1PXV 1PXW 1PXY 1PXZ 1PY5 1PYA 1PYB 1PYF 1PYK 1PYO 1PYO 1PYT 1PYT 1PYT 1PZ1 1PZ4 1PZ7 1PZG 1PZL 1PZM 1PZN 1PZS 1PZT 1PZV 1PZX 1Q06 1Q0B 1Q0P 1Q0Q 1Q0R 1Q0S 1Q0U 1Q12 1Q13 1Q15 1Q16 1Q16 1Q16 1Q1C 1Q1F 1Q1H 1Q1L 1Q1R 1Q1S 1Q1U 1Q20 1Q23 1Q2W 1Q2Y 1Q32 1Q33 1Q35 1Q3B 1Q3E 1Q3I 1Q3O 1Q3Q 1Q3X 1Q40 1Q40 1Q42 1Q44 1Q45 1Q46 1Q4M 1Q4R 1Q4U 1Q52 1Q5D 1Q5H 1Q5N 1Q5Q 1Q5Q 1Q5X 1Q5Z 1Q67 1Q6H 1Q6O 1Q6W 1Q6X 1Q6Z 1Q74 1Q77 1Q79 1Q7B 1Q7E 1Q7F 1Q7H 1Q7L 1Q7R 1Q7S 1Q88 1Q8A 1Q8B 1Q8C 1Q8D 1Q8F 1Q8I 1Q8R 1Q8U 1Q8Y 1Q90 1Q90 1Q90 1Q90 1Q92 1Q98 1Q9C 1Q9I 1Q9J 1Q9U 1QA7 1QA7 1QA9 1QAD 1QAH 1QAP 1QAU 1QAV 1QAZ 1QB0 1QB2 1QB3 1QB7 1QBA 1QBE 1QBK 1QBK 1QBZ 1QC6 1QC7 1QC9 1QCQ 1QCS 1QCX 1QCZ 1QD1 1QD6 1QD9 1QDL 1QDL 1QDM 1QE0 1QE3 1QE5 1QEZ 1QF8 1QF9 1QFH 1QFJ 1QFM 1QFT 1QFT 1QG3 1QGD 1QGH 1QGJ 1QGN 1QGO 1QGQ 1QGR 1QGT 1QGV 1QH4 1QH5 1QH8 1QH8 1QHD 1QHF 1QHL 1QHO 1QHQ 1QHT 1QHV 1QHX 1QI9 1QIB 1QID 1QIP 1QJ4 1QJ8 1QJB 1QJP 1QJV 1QK1 1QKI 1QKK 1QKM 1QKR 1QKS 1QL0 1QLA 1QLA 1QLA 1QLE 1QLE 1QLE 1QLE 1QLE 1QLM 1QLP 1QLW 1QM4 1QME 1QMG 1QMJ 1QMO 1QMO 1QMV 1QMY 1QN2 1QNG 1QNI 1QNT 
1QNX 1QO0 1QO0 1QO3 1QO3 1QO3 1QO5 1QO7 1QO8 1QOI 1QOP 1QOP 1QOR 1QOU 1QOX 1QOY 1QP8 1QPC 1QPG 1QPO 1QPX 1QQ5 1QQ9 1QQE 1QQF 1QQG 1QQK 1QQL 1QQQ 1QQR 1QR2 1QR4 1QRE 1QS0 1QS0 1QSA 1QSD 1QSM 1QST 1QTF 1QTJ 1QTN 1QTO 1QTW 1QTX 1QU1 1QU7 1QU9 1QUA 1QUQ 1QUQ 1QUS 1QUU 1QV0 1QV9 1QVB 1QVC 1QVE 1QW2 1QW9 1QWD 1QWG 1QWI 1QWJ 1QWK 1QWL 1QWR 1QWT 1QWY 1QWZ 1QX1 1QXH 1QXM 1QXO 1QXY 1QY1 1QY5 1QY6 1QY7 1QY9 1QYC 1QYD 1QYI 1QYN 1QYR 1QYS 1QZ5 1QZ7 1QZ9 1QZF 1QZN 1QZT 1QZU 1QZZ 1R03 1R0D 1R0K 1R0M 1R0P 1R0R 1R0T 1R0U 1R0V 1R0W 1R12 1R13 1R17 1R18 1R1K 1R1K 1R1M 1R1Q 1R26 1R29 1R2F 1R2J 1R2Q 1R2R 1R30 1R31 1R3C 1R3D 1R3F 1R3H 1R3J 1R3J 1R3J 1R3N 1R3S 1R3U 1R44 1R45 1R4C 1R4P 1R4Q 1R4U 1R4V 1R4W 1R4X 1R53 1R59 1R5A 1R5B 1R5I 1R5I 1R5I 1R5J 1R5L 1R5P 1R5Q 1R5R 1R5T 1R5Y 1R61 1R62 1R6D 1R6F 1R6L 1R6N 1R6V 1R6W 1R6X 1R75 1R76 1R7A 1R7L 1R85 1R88 1R89 1R8G 1R8J 1R8N 1R8S 1R8S 1R9C 1R9D 1R9G 1R9H 1R9J 1R9L 1R9O 1R9W 1RA0 1RA4 1RA6 1RA9 1RBD 1RBL 1RBL 1RC2 1RC6 1RC9 1RCD 1RCQ 1RCU 1RCW 1RD5 1RDO 1RDS 1RDT 1RDT 1RE5 1RE9 1REG 1REQ 1REQ 1REW 1REW 1RF3 1RF6 1RFE 1RFM 1RFN 1RFS 1RFY 1RFZ 1RG8 1RG9 1RGX 1RGZ 1RH1 1RH2 1RHC 1RHF 1RHS 1RHY 1RI5 1RI6 1RI7 1RIE 1RIF 1RII 1RIL 1RIQ 1RJ1 1RJ8 1RJB 1RJD 1RJO 1RJW 1RK6 1RK8 1RK8 1RKB 1RKD 1RKQ 1RKT 1RKU 1RKX 1RL0 1RL2 1RL4 1RL6 1RLH 1RLI 1RLJ 1RLK 1RLM 1RLR 1RLW 1RM4 1RM6 1RM6 1RM6 1RM8 1RMD 1RMW 1RNF 1RO2 1RO5 1RO7 1ROC 1ROW 1RP3 1RP4 1RPM 1RPN 1RPX 1RPY 1RQ0 1RQ2 1RQB 1RQJ 1RQP 1RQW 1RR7 1RRE 1RRM 1RRO 1RRP 1RRP 1RSS 1RSY 1RT8 1RTF 1RTQ 1RTR 1RTT 1RTU 1RTV 1RTW 1RTY 1RU0 1RU4 1RU7 1RU7 1RU8 1RUR 1RUR 1RUT 1RV3 1RV9 1RVE 1RVG 1RVK 1RVV 1RW0 1RW1 1RW6 1RW7 1RWH 1RWI 1RWR 1RWT 1RWY 1RWZ 1RX0 1RXD 1RXQ 1RXX 1RXY 1RY2 1RY6 1RY9 1RYA 1RYB 1RYL 1RYO 1RYP 1RYP 1RYP 1RYP 1RYP 1RYP 1RYP 1RYP 1RYP 1RYP 1RYP 1RZ1 1RZ2 1RZ3 1RZ4 1RZ6 1RZF 1RZF 1RZH 1RZH 1RZH 1RZM 1RZN 1RZO 1RZU 1S0A 1S0P 1S0U 1S14 1S16 1S1C 1S1D 1S1F 1S1M 1S1P 1S1Q 1S21 1S28 1S2E 1S2K 1S2W 1S2X 1S35 1S3E 1S3G 1S3I 1S3J 1S3M 1S3S 1S3S 1S48 1S4B 1S4D 1S4E 1S4K 1S4Q 1S4V 1S4Y 1S57 1S58 1S5A 1S5D 
1S5D 1S5J 1S5L 1S5L 1S5L 1S5L 1S5L 1S5L 1S5L 1S5P 1S5T 1S5U 1S68 1S69 1S6C 1S6Y 1S70 1S70 1S7I 1S7J 1S7M 1S7O 1S7Z 1S8N 1S95 1S96 1S98 1S99 1S9A 1S9J 1S9P 1S9R 1S9U 1S9V 1S9V 1SA0 1SA0 1SA0 1SAC 1SAT 1SAW 1SB8 1SBF 1SBP 1SBQ 1SBW 1SBX 1SBZ 1SC3 1SCF 1SCJ 1SCT 1SCT 1SCZ 1SD4 1SDI 1SDM 1SDO 1SDW 1SE0 1SE8 1SEB 1SEB 1SEB 1SED 1SEF 1SEI 1SEK 1SEN 1SES 1SEZ 1SF8 1SF9 1SFD 1SFE 1SFF 1SFL 1SFN 1SFP 1SFR 1SFX 1SG1 1SG1 1SG6 1SGH 1SGJ 1SGL 1SGM 1SGP 1SGW 1SGX 1SH0 1SH5 1SH8 1SHE 1SHS 1SHU 1SHY 1SHY 1SI5 1SI8 1SIG 1SIQ 1SIX 1SJ2 1SJD 1SJW 1SJY 1SK4 1SK7 1SKQ 1SKY 1SKY 1SKZ 1SL8 1SLQ 1SLU 1SLU 1SM2 1SMB 1SML 1SMO 1SMP 1SMP 1SMR 1SMT 1SMV 1SNC 1SNN 1SNR 1SNY 1SNZ 1SO2 1SO7 1SOT 1SOX 1SP3 1SP8 1SPG 1SPG 1SPP 1SPP 1SPV 1SPX 1SQ1 1SQ2 1SQ2 1SQ4 1SQ5 1SQ9 1SQE 1SQH 1SQI 1SQJ 1SQK 1SQL 1SQS 1SQU 1SR4 1SR4 1SR4 1SR7 1SR8 1SRA 1SRD 1SRQ 1SRR 1SRV 1SS4 1SSQ 1SSX 1ST9 1STF 1STM 1STZ 1SU0 1SU1 1SU8 1SUM 1SUR 1SUU 1SUW 1SV6 1SV8 1SVA 1SVI 1SVM 1SVP 1SVS 1SVV 1SVY 1SW5 1SW6 1SWV 1SWX 1SX7 1SXG 1SXJ 1SXJ 1SXJ 1SXJ 1SXJ 1SXJ 1SXR 1SY7 1SYR 1SYY 1SZ2 1SZ6 1SZ6 1SZ9 1SZB 1SZH 1SZI 1SZO 1SZP 1SZQ 1SZW 1T01 1T06 1T0A 1T0B 1T0F 1T0H 1T0H 1T0I 1T0J 1T0J 1T0L 1T0N 1T0Q 1T0Q 1T0T 1T10 1T11 1T15 1T16 1T1D 1T1G 1T1J 1T2A 1T2D 1T2L 1T2W 1T33 1T35 1T3B 1T3C 1T3D 1T3E 1T3I 1T3Q 1T3Q 1T3Q 1T3T 1T3U 1T3W 1T43 1T46 1T47 1T4B 1T4G 1T4H 1T4O 1T4W 1T56 1T57 1T5B 1T5H 1T5I 1T5J 1T5L 1T5O 1T5R 1T5Y 1T61 1T61 1T62 1T64 1T6A 1T6C 1T6E 1T6G 1T6G 1T6J 1T6L 1T6S 1T6T 1T6U 1T70 1T71 1T72 1T73 1T77 1T79 1T7F 1T7M 1T7R 1T7V 1T8P 1T8Q 1T8T 1T92 1T94 1T95 1T9B 1T9F 1T9H 1T9K 1TA0 1TA8 1TAZ 1TBF 1TBM 1TBR 1TBR 1TC1 1TC5 1TD4 1TD5 1TD6 1TDH 1TDJ 1TDQ 1TDQ 1TE2 1TE5 1TE6 1TED 1TEL 1TEV 1TEX 1TF0 1TF1 1TF4 1TF5 1TF7 1TFE 1TFF 1TFR 1TFU 1TFX 1TFZ 1TG8 1TGS 1TGZ 1TH0 1TH1 1TH1 1TH8 1TH8 1THF 1THM 1THQ 1THT 1THX 1TIA 1TIB 1TID 1TID 1TIE 1TII 1TIQ 1TIS 1TJ7 1TJC 1TJG 1TJG 1TJL 1TJN 1TJO 1TJV 1TJY 1TK4 1TK9 1TKE 1TKI 1TKS 1TL2 1TL9 1TLJ 1TLQ 1TLT 1TLU 1TLY 1TM0 1TM8 1TME 1TME 1TME 1TMK 1TML 1TMO 1TMY 1TN3 1TN6 1TN6 1TNR 1TNR 1TO0 
1TO3 1TO6 1TO9 1TOA 1TOC 1TOC 1TOI 1TOL 1TON 1TP6 1TQ4 1TQ8 1TQG 1TQH 1TQI 1TQN 1TQX 1TQY 1TQY 1TR0 1TR9 1TRB 1TRE 1TRK 1TS9 1TSJ 1TT5 1TT5 1TT7 1TT8 1TU1 1TU3 1TU7 1TU9 1TUA 1TUE 1TUE 1TUH 1TUL 1TUV 1TUW 1TVD 1TVF 1TVG 1TVL 1TW3 1TW4 1TW6 1TW9 1TWD 1TWF 1TWF 1TWF 1TWF 1TWF 1TWF 1TWF 1TWF 1TWI 1TWL 1TWU 1TWY 1TX2 1TX4 1TX4 1TXD 1TXG 1TXJ 1TXK 1TXN 1TXO 1TXU 1TY0 1TY4 1TY9 1TYF 1TYG 1TYV 1TYY 1TZ0 1TZ9 1TZA 1TZF 1TZJ 1TZL 1TZP 1TZY 1TZY 1TZY 1TZY 1TZZ 1U00 1U02 1U04 1U08 1U0J 1U0K 1U0M 1U0S 1U0V 1U11 1U14 1U1I 1U1J 1U1Z 1U24 1U2C 1U2K 1U2M 1U2X 1U2Z 1U3D 1U3Y 1U4G 1U4J 1U4N 1U59 1U5H 1U5K 1U5P 1U5R 1U5U 1U60 1U61 1U69 1U6D 1U6G 1U6G 1U6G 1U6L 1U6M 1U79 1U7B 1U7G 1U7I 1U7K 1U7L 1U7N 1U7P 1U83 1U8S 1U8V 1U8W 1U8X 1U8Z 1U94 1U9A 1U9C 1U9D 1U9J 1U9K 1UA2 1UAC 1UAC 1UAC 1UAD 1UAI 1UAL 1UAN 1UAR 1UAS 1UAX 1UAY 1UAZ 1UB0 1UB2 1UB3 1UB4 1UB7 1UB9 1UBK 1UBK 1UBY 1UC2 1UC8 1UCT 1UCY 1UCY 1UCY 1UD0 1UD2 1UD9 1UDC 1UDD 1UDH 1UDS 1UDX 1UDZ 1UE5 1UE8 1UEA 1UEA 1UEB 1UED 1UEH 1UES 1UF2 1UF2 1UF2 1UF3 1UF5 1UF9 1UFA 1UFB 1UFK 1UFO 1UFR 1UFY 1UG6 1UGM 1UGN 1UGP 1UGP 1UGX 1UH5 1UHN 1UHV 1UI0 1UI5 1UIK 1UIR 1UIS 1UIU 1UIY 1UIZ 1UJ2 1UJ6 1UJK 1UJM 1UJN 1UJP 1UK8 1UKF 1UKG 1UKJ 1UKK 1UKL 1UKU 1UKV 1UKV 1UKW 1UKZ 1UL9 1ULH 1ULI 1ULI 1ULK 1ULQ 1ULS 1ULU 1ULV 1ULY 1ULZ 1UM0 1UM2 1UM5 1UM5 1UM8 1UMD 1UMD 1UMG 1UMH 1UMK 1UMM 1UMN 1UMR 1UMR 1UMV 1UMW 1UN0 1UN2 1UN3 1UN7 1UN8 1UNA 1UNF 1UNL 1UNL 1UNN 1UNN 1UNQ 1UOC 1UOH 1UOK 1UOL 1UOU 1UOW 1UOZ 1UP7 1UP8 1UP9 1UPB 1UPI 1UPK 1UPQ 1UPS 1UPT 1UPV 1UQR 1UQT 1UQW 1UQX 1UR3 1UR4 1UR5 1URH 1URJ 1URR 1URS 1URU 1URV 1URZ 1US0 1US3 1US5 1US7 1US7 1USC 1USG 1USJ 1USP 1USU 1USU 1USX 1USY 1USY 1UT1 1UT7 1UT9 1UTH 1UTN 1UTY 1UU1 1UU3 1UUF 1UUH 1UUL 1UUQ 1UUR 1UUY 1UUZ 1UUZ 1UV0 1UV7 1UW4 1UW6 1UW7 1UWH 1UWK 1UWS 1UWV 1UWZ 1UX5 1UX6 1UX8 1UXA 1UXO 1UXT 1UXY 1UXZ 1UY2 1UYJ 1UYL 1UYN 1UYP 1UYR 1UZ1 1UZ5 1UZB 1UZE 1UZP 1UZV 1UZX 1V00 1V02 1V04 1V0D 1V0E 1V0L 1V10 1V1A 1V1O 1V1Q 1V25 1V29 1V29 1V2A 1V2D 1V2X 1V2Z 1V30 1V33 1V37 1V3V 1V3W 1V3Y 1V43 1V47 1V4A 1V4P 
1V4S 1V4V 1V4X 1V4X 1V54 1V54 1V54 1V54 1V54 1V58 1V5D 1V5V 1V5X 1V6C 1V6I 1V6S 1V6T 1V6Z 1V70 1V74 1V7C 1V7L 1V7R 1V7W 1V7Z 1V84 1V8A 1V8B 1V8C 1V8D 1V8F 1V8G 1V8P 1V8Y 1V93 1V97 1V9C 1V9D 1V9L 1V9M 1V9T 1V9Y 1VA4 1VA6 1VAK 1VAP 1VAV 1VB5 1VBF 1VC1 1VC4 1VCA 1VCL 1VCP 1VD5 1VDC 1VDH 1VDK 1VDR 1VDU 1VDW 1VE6 1VE9 1VEA 1VEC 1VEI 1VES 1VET 1VET 1VF5 1VF5 1VF5 1VF5 1VF7 1VFJ 1VFR 1VFS 1VFV 1VG0 1VG0 1VG8 1VGG 1VGQ 1VGS 1VGW 1VGY 1VH0 1VH1 1VH4 1VH5 1VH6 1VH9 1VHC 1VHE 1VHH 1VHI 1VHK 1VHM 1VHN 1VHO 1VHQ 1VHR 1VHS 1VHU 1VHV 1VHW 1VHX 1VHY 1VHZ 1VI0 1VI1 1VI2 1VI4 1VI6 1VI7 1VI9 1VIA 1VIC 1VIM 1VIO 1VIP 1VIU 1VIZ 1VJ1 1VJ2 1VJ7 1VJE 1VJF 1VJG 1VJH 1VJL 1VJN 1VJO 1VJP 1VJR 1VJT 1VJU 1VJV 1VJX 1VJZ 1VK0 1VK1 1VK2 1VK3 1VK4 1VK6 1VKC 1VKD 1VKE 1VKF 1VKH 1VKI 1VKJ 1VKK 1VKM 1VKN 1VKO 1VKP 1VKU 1VKV 1VKW 1VKY 1VKZ 1VL0 1VL1 1VL2 1VL4 1VL5 1VL6 1VL7 1VL8 1VLA 1VLB 1VLC 1VLF 1VLF 1VLG 1VLH 1VLI 1VLJ 1VLM 1VLO 1VLP 1VLQ 1VLR 1VLS 1VLU 1VLV 1VLW 1VM6 1VM7 1VM9 1VMA 1VMB 1VMD 1VME 1VMF 1VMH 1VMI 1VMJ 1VMK 1VMO 1VNS 1VOK 1VOM 1VP2 1VP4 1VP5 1VP6 1VP7 1VP8 1VPA 1VPB 1VPD 1VPE 1VPH 1VPJ 1VPK 1VPL 1VPM 1VPN 1VPP 1VPQ 1VPX 1VPY 1VQ0 1VQ2 1VQQ 1VQR 1VQS 1VQV 1VQW 1VR2 1VRT 1VRT 1VSR 1VYB 1VYD 1VYF 1VYI 1VYR 1VYU 1VZ0 1VZ6 1VZE 1VZO 1VZV 1VZY 1W07 1W0D 1W0I 1W0M 1W0N 1W0P 1W15 1W1H 1W1W 1W1W 1W1Z 1W23 1W25 1W27 1W2F 1W2W 1W2W 1W2Y 1W30 1W32 1W3B 1W3I 1W3O 1W3U 1W44 1W4X 1W5B 1W5F 1W5T 1W6K 1W6N 1W6S 1W6U 1W74 1W7B 1W7L 1W7W 1W85 1W85 1W8A 1W8I 1W8M 1W8O 1W96 1W97 1W9A 1W9C 1W9P 1W9S 1WA5 1WA5 1WA5 1WAD 1WB1 1WBA 1WC3 1WC9 1WCH 1WD5 1WD6 1WD7 1WDA 1WDC 1WDD 1WDD 1WDE 1WDI 1WDJ 1WDK 1WDK 1WDN 1WDU 1WDV 1WDY 1WE1 1WEH 1WEK 1WER 1WF3 1WF4 1WF4 1WFX 1WG8 1WGB 1WHI 1WIW 1WJ9 1WJG 1WJX 1WK2 1WK4 1WKC 1WKQ 1WKR 1WLF 1WLG 1WLJ 1WM1 1WMD 1WMG 1WMS 1WMU 1WMU 1WMX 1WMZ 1WND 1WOH 1WOQ 1WOS 1WOU 1WP1 1WP5 1WP6 1WPB 1WPG 1WPH 1WPN 1WPO 1WPW 1WQ8 1WR8 1WS8 1WSA 1WTD 1WTL 1WTY 1WU2 1WU3 1WUB 1WUE 1WUF 1WUU 1WV2 1WVI 1WWB 1WWC 1WWW 1WWW 1X6M 1X6O 1X72 1X79 1X7D 1X7F 1X7G 1X7Y 1X7Y 1X82 1X87 1X8H 1X8M 1X8Q 
1X8V 1X91 1X92 1X94 1X99 1X9F 1X9F 1X9F 1X9F 1X9G 1X9I 1X9Z 1XA0 1XA1 1XA3 1XA6 1XAA 1XAR 1XAU 1XB2 1XB2 1XB3 1XB7 1XBB 1XBN 1XBT 1XBW 1XC2 1XC3 1XCB 1XCC 1XCG 1XCG 1XCL 1XCO 1XD3 1XD5 1XD7 1XDI 1XDN 1XDT 1XDY 1XDZ 1XE0 1XE1 1XE3 1XE7 1XEA 1XEB 1XED 1XER 1XEW 1XEW 1XEY 1XFH 1XFI 1XFJ 1XFK 1XFO 1XFP 1XFP 1XFS 1XG0 1XG5 1XG7 1XG8 1XG9 1XGK 1XGS 1XHC 1XHD 1XHK 1XHL 1XHO 1XHX 1XI3 1XI6 1XI9 1XIA 1XIM 1XIO 1XIP 1XIQ 1XIW 1XIW 1XIW 1XIZ 1XJ5 1XJC 1XJD 1XJU 1XKI 1XKK 1XKL 1XKN 1XKQ 1XKR 1XKS 1XKT 1XL4 1XLY 1XM3 1XM5 1XM7 1XM8 1XMA 1XMB 1XMC 1XMP 1XMR 1XMT 1XMX 1XNB 1XNF 1XNI 1XNV 1XNZ 1XO5 1XO7 1XOR 1XOU 1XP4 1XP8 1XPC 1XPJ 1XPM 1XPP 1XQ1 1XQ4 1XQ5 1XQ5 1XQ6 1XQ9 1XQA 1XQB 1XQG 1XQM 1XQO 1XQU 1XR4 1XR5 1XR7 1XRG 1XRH 1XRI 1XRK 1XS1 1XS5 1XSJ 1XSM 1XSO 1XSQ 1XSV 1XSZ 1XT9 1XTC 1XTC 1XTE 1XTG 1XTO 1XTP 1XU1 1XU2 1XU9 1XUB 1XUU 1XUV 1XV1 1XV2 1XVA 1XVH 1XVI 1XVP 1XVP 1XVQ 1XVS 1XVX 1XW6 1XW8 1XWA 1XWL 1XWM 1XWS 1XWV 1XX4 1XX6 1XX7 1XXF 1XXF 1XXL 1XY7 1XYG 1XYN 1XYP 1XYZ 1XZP 1XZP 1Y01 1Y01 1Y02 1Y08 1Y0E 1Y0G 1Y0H 1Y0Z 1Y13 1Y14 1Y14 1Y1L 1Y1O 1Y1X 1Y23 1Y2I 1Y2T 1Y4T 1Y60 1Y63 1Y6H 1Y6J 1Y6L 1Y7E 1Y88 1Y8C 1Y9I 1Y9Q 1Y9U 1YAA 1YAC 1YAL 1YAT 1YAV 1YB1 1YB5 1YBE 1YBF 1YCC 1YCN 1YCQ 1YCR 1YCS 1YCS 1YDG 1YDH 1YDW 1YEM 1YEY 1YFM 1YFO 1YFQ 1YG2 1YGE 1YGH 1YGP 1YGS 1YNA 1YPR 1YRG 1YTT 1ZBD 1ZBD 1ZFJ 1ZIN 1ZNC 1ZPD 1ZRN 1ZYM 256B 2A0B 2A2U 2AAA 2AAK 2ABK 2ACT 2ADM 2AE2 2AHJ 2AHJ 2AK3 2APS 2ARC 2ASR 2AT2 2AY1 2AYH 2AYQ 2AZA 2BAA 2BB2 2BBK 2BBK 2BC2 2BCE 2BEM 2BES 2BTM 2BTV 2BTV 2CAS 2CAU 2CB5 2CBL 2CCY 2CEV 2CHR 2CKB 2CKB 2CKB 2CMD 2CND 2CUA 2CY3 2CYH 2DLD 2DPM 2DRI 2E2A 2E2C 2EBN 2EIF 2END 2ENG 2FCB 2FCR 2FHE 2FOK 2FRV 2FRV 2GDM 2GMF 2GPR 2GSA 2GSQ 2HBG 2HFT 2HGS 2HHM 2HLC 2HMZ 2HRV 2HVM 2ILA 2ILK 2LDX 2LHB 2LIS 2LJR 2LTN 2MAD 2MAD 2MCM 2MEV 2MEV 2MEV 2MHR 2MNR 2MYS 2MYS 2MYS 2NAC 2NAP 2NCD 2OAT 2PF2 2PGD 2PGK 2PIA 2PII 2PKA 2PLC 2POR 2PRD 2PSP 2PTD 2PTH 2PVA 2PVB 2RHE 2RIG 2RMC 2RSL 2RSP 2SAK 2SAS 2SCP 2SCU 2SCU 2SGA 2SLI 2SPC 2SQC 2STV 2TBV 2TCT 2TGI 2TMG 2TNF 2TOH 2TPS 2TPT 2TRC 
2TRC 2TRX 2UCZ 2VHB 2VSG 2XAT 2ZNC 3ADK 3APP 3C2C 3CAO 3CBH 3CLA 3CMS 3COX 3CSU 3CTS 3DFR 3EUG 3EZM 3FAP 3FIB 3GCB 3GRS 3KVT 3LAD 3LYN 3LZT 3MAG 3MDD 3NUL 3PCC 3PGA 3PMG 3PRN 3PRO 3PSG 3RAB 3RP2 3SDH 3SEB 3SIL 3SXL 3TAT 3TDT 3TGL 3THI 3TSS 3ULL 3VUB 4BCL 4CAT 4CPA 4FIV 4HB1 4MDH 4PFK 4PGA 4SBV 4SGB 4TS1 4UAG 4UBP 4XIA 5CSM 5CYT 5EAU 5NUL 5PAL 5RUB 5TMP 7A3H 7AAT 7AHL 7FD1 7LYZ 7MDH 7ODC 7TAA 830C 8ABP 8ACN 8TLN 9WGA PDB ID
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.doc
Type: application/octet-stream
Size: 296960 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/0ed1e79d/1-0001.obj

From s0460205 at sms.ed.ac.uk Mon Mar 7 09:24:11 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Mon Mar 7 13:38:25 2005
Subject: [Bioperl-l] (no subject)
Message-ID: <1110205451.422c640b5a5ba@sms.ed.ac.uk>

Hi,

I am writing a perl program that will extract data from a UniProt
flatfile so that I can automatically put data into my PostgreSQL
database. I am taking out name, protein ID number, references etc from
the file.

Does anyone know if there is a script available to do this already?

Many thanks,

Stephen

From s0460205 at sms.ed.ac.uk Mon Mar 7 09:24:57 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Mon Mar 7 13:38:26 2005
Subject: [Bioperl-l] Extraction from UniProt flatfile
Message-ID: <1110205497.422c64392756e@sms.ed.ac.uk>

Hi,

I am writing a perl program that will extract data from a UniProt
flatfile so that I can automatically put data into my PostgreSQL
database. I am taking out name, protein ID number, references etc from
the file.

Does anyone know if there is a script available to do this already?
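
One possible starting point for the question above, as a minimal sketch rather than a finished script: Bio::SeqIO can read Swiss-Prot/UniProt flat files when given -format => 'swiss', and the references are available through the sequence's annotation collection. The helper name summarize_entry and the file name in the usage comment are made up for illustration, and loading the result into PostgreSQL is not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Collect the fields mentioned in the post (name, accession, references)
# from one sequence object into a plain hash.
# summarize_entry is a hypothetical helper, not part of BioPerl.
sub summarize_entry {
    my ($seq) = @_;
    my @refs = $seq->annotation->get_Annotations('reference');
    return {
        name       => $seq->display_id,
        accession  => $seq->accession_number,
        desc       => $seq->desc,
        references => [ map { $_->title } @refs ],
    };
}

# Hypothetical usage: ./uniprot_summary.pl uniprot_sprot.dat
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'swiss');
    while (my $seq = $in->next_seq) {
        my $entry = summarize_entry($seq);
        # Print accession, name and the number of references per entry.
        my @cols = map { defined $_ ? $_ : '' }
                   ($entry->{accession}, $entry->{name});
        print join("\t", @cols, scalar @{ $entry->{references} }), "\n";
    }
}
```

Each hash returned by the helper could then be handed to whatever database layer is in use; keeping the parsing separate from the loading makes it easier to check the extracted fields first.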
Many thanks,

Stephen

From lstein at cshl.edu Mon Mar 7 12:56:34 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Mar 7 13:38:27 2005
Subject: [Bioperl-l] bump in GD::SVG
In-Reply-To: <1108754778.4216415a6dea9@webmail.njit.edu>
References: <1108754778.4216415a6dea9@webmail.njit.edu>
Message-ID: <200503071256.34583.lstein@cshl.edu>

Sorry for the delay. I have forwarded this bug report to Todd Harris,
who maintains GD::SVG. Offhand I don't see a good explanation for this
behavior, as GD::SVG is at a level below the code that does the
bumping. It would help to send a script that elicits the behavior.

Lincoln

On Friday 18 February 2005 02:26 pm, hz5@njit.edu wrote:
> Hi everybody,
> My bump setting in GD::SVG for generic glyph doesn't work. Has this
> happen to anyone or it is just me?
> (setting bump to 0 doesn't make exons align in one line)
> Thanks!
> haibo
> =========================================================
> Haibo Zhang, PhD student
> Computational Biology, NJIT & Rutgers University
> Center for Applied Genomics, PHRI
> http://afs13.njit.edu/~hz5
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen on all emails regarding scheduling
and other time-critical topics.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/f21bb6c5/attachment-0001.bin

From lstein at cshl.edu Mon Mar 7 13:02:58 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Mar 7 13:38:28 2005
Subject: [Bioperl-l] two colors on the same track, still not work
In-Reply-To: <1108946376.42192dc8f1bb2@webmail.njit.edu>
References: <1108946376.42192dc8f1bb2@webmail.njit.edu>
Message-ID: <200503071302.59272.lstein@cshl.edu>

Here's one guess. You have to make sure that the %colors hash is
visible from the callback under Perl lexical scoping rules. If you use
"use strict" and the -w switch then Perl will warn about this type of
common error.

If this doesn't work please send a complete test case script that
shows the problem. Make the script as short as possible and remove
dependencies on other aspects of your system. Also send the version
numbers for Perl and BioPerl.

Lincoln

On Sunday 20 February 2005 07:39 pm, hz5@njit.edu wrote:
> Hi everybody,
> I am still struggling with two colors on the same track, if anybody
> can help me, I would appreciate it a lot!
>
> I want to have utr blue, coding seq brown, so I have utr
> splitlocations in one feature, and coding seq splitlocations in
> another:
> ######################################################
> my $f1 = Bio::SeqFeature::Generic->new(
>     -primary_tag => $geneid,
>     -seq_id => $nm,
>     -source_tag => $UTR_str,
>     -location => $splitlocation_utr,
> );
> my $f = Bio::SeqFeature::Generic->new(
>     -primary_tag => $geneid,
>     -source_tag => $coding_str,
>     -seq_id => $nm,
>     -location => $splitlocation,
> );
> push @allft, $f1;
> push @allft, $f;
>
> then I try to render @allft on one track, but color utr and coding
> sequence differently use a subroutine for bgcolor:
> #################################################
> my $track_nm = $panel ->add_track(\@allft,
>     -glyph => 'generic',
>     -font2color => 'blue',
>     -connector => 'solid',
>     -bump => $bump,
>     -description => sub{
>         my $f_tmp = shift;
>         if($f_tmp->source_tag eq $dHSP_str){
>             return '';
>         }else{
>             return $f_tmp->seq_id;
>         }
>     },
>     -bgcolor => sub{
>         my $f_tmp = shift;
>         print "**".$colors{$f_tmp->source_tag}."\n";
>         return $colors{$f_tmp->source_tag};
>     },
> );
>
> I have %colors keyed by the source_tag I put in features. But it
> doesn't work.
>
> Anybody knows how to fulfill this kind of functions?
> Thanks!!!!
> haibo
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen on all emails regarding scheduling
and other time-critical topics.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/65dc5bce/attachment-0001.bin

From Ned.Young at tufts.edu Mon Mar 7 13:27:05 2005
From: Ned.Young at tufts.edu (Ned Young)
Date: Mon Mar 7 13:38:29 2005
Subject: [Bioperl-l] Need help using AlignIO
Message-ID: <9d0af755cfda0c309264e58c73a9f248@tufts.edu>

Hi,

I must not be using AlignIO right, for when I try to read in an
alignment and then output it to a file, I get an empty file.

I'm trying to write a script for the design of multiplex SNP primers,
and, after looking at several modules, thought that AlignIO would be
good.

Can someone give me a pointer? Here's a trimmed down version of my
script, to show the problem, as well as the input file I've been
using. I run the script by typing:
./test3.pl test.fasta

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test3.pl
Type: application/text
Size: 535 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/30f90540/test3-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.fasta
Type: application/text
Size: 73730 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/30f90540/test-0001.bin
-------------- next part --------------

Any other modules I should look at?

Yours truly,

Ned Young
Department of Biomedical Sciences
Division of Infectious Diseases
Tufts University School of Veterinary Medicine
200 Westboro Rd.
N.
Grafton, MA 01536
508-887-4540

From jason.stajich at duke.edu Mon Mar 7 13:48:49 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Mar 7 13:47:40 2005
Subject: [Bioperl-l] gap/ambiguous character only sequences: Bio::PrimarySeq
In-Reply-To: <1109863560.20641.154.camel@tick.compbio.dundee.ac.uk>
References: <1109863560.20641.154.camel@tick.compbio.dundee.ac.uk>
Message-ID:

I think you are talking about _guess_alphabet?

You can always override the _guess_alphabet method - I posted a soln
to this last month.
http://portal.open-bio.org/pipermail/bioperl-l/2005-February/018253.html

Does that work for you? It warns instead of throws when it is all
gapped. You can make it even quieter if you like of course.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 3, 2005, at 10:26 AM, Jon manning wrote:

> Hi All,
>
> For a lot of the stuff I'm doing at the moment I'm chopping up
> alignments and playing with the bits etc. I've had to nobble
> Bio::PrimarySeq to allow the resulting gap-only sequences in
> Bio::LocatableSeq- I understand the rationale behind this check, and
> it's a useful default, but could we perhaps have an option to allow
> tolerance instead? If such exists, I'd be grateful if someone could
> point me in the right direction!
>
> Thanks,
>
> Jon
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/a5842925/PGP.bin From jason.stajich at duke.edu Mon Mar 7 13:51:09 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Mar 7 13:51:11 2005 Subject: [Bioperl-l] GFF question In-Reply-To: <42270CFF.1080502@iq.usp.br> References: <42270CFF.1080502@iq.usp.br> Message-ID: <4dd8e4a7f8f5c914952ff9ab3a2ac063@duke.edu> All of the Parsers produce Bio::SeqFeatureI objects (well nearly all of them). SeqFeatureI objects can be written out to GFF with Bio::Tools::GFF (and presumably Bio::FeatureIO). Some of the genefeature parsers try and build Gene objects so you may have to untangle them some to get at the underlying exons and write each of those out to GFF as well. There isn't a Repeatmasker parser in Bioperl that I know of although Ensembl has one which could be ported some day. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 3, 2005, at 8:11 AM, Thiago Motta Venancio wrote: > Hi folks. > I would like to get a more detailed explanation about how to construct > GFF files with the outputs of several programs, like genescan, > repeatmasker... > thanks in advance. > Thiago > > -- > Thiago Motta Venancio - PhD student in Bioinformatics > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/80368bda/PGP.bin

From jason.stajich at duke.edu Mon Mar 7 13:56:01 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Mar 7 13:53:28 2005
Subject: [Bioperl-l] Need help using AlignIO
In-Reply-To: <9d0af755cfda0c309264e58c73a9f248@tufts.edu>
References: <9d0af755cfda0c309264e58c73a9f248@tufts.edu>
Message-ID:

You are not providing the input file so no alignments are being read in

  my $in = Bio::AlignIO->new();

Should be

  # or whatever format you have it stored in.
  my $in = Bio::AlignIO->new(-file => 'filename.aln', -format => 'fasta');

Or if you want it to be the cmdline you need to specify it

  my $in = Bio::AlignIO->new(-file => shift @ARGV, -format => 'fasta');

If you wanted to revert to the old behavior (> Bioperl 1.2) where either
cmdline or STDIN would be re-directed as input you need the special ARGV
handle.

  my $in = Bio::AlignIO->new(-fh => \*ARGV, -format => 'fasta');
  # or whatever format you have it in, can't mix formats...

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 7, 2005, at 1:27 PM, Ned Young wrote:

> Hi,
>
> I must not be using AlignIO right, for when I try to read in an
> alignment and then output it to a file, I get an empty file.
>
> I'm trying to write a script for the design of multiplex SNP primers,
> and, after looking at several modules, thought that AlignIO would be
> good.
>
> Can someone give me a pointer? Here's a trimmed down version of my
> script, to show the problem, as well as the input file I've been
> using. I run the script by typing:
> ./test3.pl test.fasta
>
>
>
> Any other modules I should look at?
>
> Yours truly,
>
> Ned Young
> Department of Biomedical Sciences
> Division of Infectious Diseases
> Tufts University School of Veterinary Medicine
> 200 Westboro Rd.
> N.
Grafton, MA 01536 > 508-887-4540 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/14136e2b/PGP.bin From barry.moore at genetics.utah.edu Mon Mar 7 18:03:29 2005 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Mon Mar 7 17:58:18 2005 Subject: [Bioperl-l] need some help about pqs In-Reply-To: <3481.172.28.124.134.1110178851.squirrel@nwebmail.iitk.ac.in> References: <3481.172.28.124.134.1110178851.squirrel@nwebmail.iitk.ac.in> Message-ID: <422CDDC1.9090701@genetics.utah.edu> Ashish- I'll take a stab at this. If I understand correctly you want to get about pdb files for 6,000 proteins. There is a perl script on the PDB website that will download PDB, mmCIF, structure factors, NMR restraints files from the database. You can find that here: ftp://ftp.rcsb.org/pub/pdb/software/getPdbStructures.pl Barry ----------------------------- Dear sir, Im student of Bioinformatics ,sir im sending u ,a problem which im facing at this time , in PQS, it is attached with this mail, the problem is .. 1 . i have 6000 proteins which i selected for my research ( see attachment as list_id ), in first case i run it on pqs page of pdb id ,which give me out put in 2. for mate & then on going to 3ed step it will give me result in .mol file , which was i needed . It is all correct , but it is good for 100 or 200 proteins , it can be done manually , but for more than 6000 proteins it is ,tedious job , so , can u help me to do this job by any other method other than manually , or is their any script for downloading all these files. waiting for reply .. thanks.. 
---------------------------------------------------- Ashish Kumar Jaiswal MScBioinformatics c/o Dr. Balaji Prakash Structural Biology Lab, Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur, UP-208016, INDIA Ph: +91-512-2594024 FAX: +91-512-2594010 Email: jaiswal@iitk.ac.in ---------------------------------------------------- From heikki at ebi.ac.uk Tue Mar 8 04:11:19 2005 From: heikki at ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue Mar 8 04:08:26 2005 Subject: [Bioperl-l] Extraction from UniProt flatfile In-Reply-To: <1110205497.422c64392756e@sms.ed.ac.uk> References: <1110205497.422c64392756e@sms.ed.ac.uk> Message-ID: <200503080911.20036.heikki@ebi.ac.uk> Take a look at the BioSQL project. There is a cvs repository called bioperl-db. It contains the script load_seqdatabase.pl, that does what you need. The database schema is in a repository biosql-schema as it is shared among several language projects. -Heikki > I am writing a perl program that will extract data from a UniProt > flatfile so that I can automatically put data into > my PostgreSQL database. I am taking out name, protein ID number, > references etc from the file. > > Does anyone know if there is a script available to do this already? 
> > Many thanks, > > Stephen > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambridge, CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From sdavis2 at mail.nih.gov Tue Mar 8 05:57:38 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue Mar 8 05:52:21 2005 Subject: [Bioperl-l] Extraction from UniProt flatfile In-Reply-To: <200503080911.20036.heikki@ebi.ac.uk> References: <1110205497.422c64392756e@sms.ed.ac.uk> <200503080911.20036.heikki@ebi.ac.uk> Message-ID: Stephen, There is another alternative that may meet your needs. The folks at the UCSC genome browser maintain a relationalized version of uniprot (i.e., a MySQL database) here: http://hgdownload.cse.ucsc.edu/goldenPath/uniProt/database/ that is available for download. You can connect directly to it (for SQL queries) via their genome mysql server (open to the public). Connection information is: host: genome-mysql.cse.ucsc.edu User: genome password: There isn't one (leave it blank) Hope this helps. Sean On Mar 8, 2005, at 4:11 AM, Heikki Lehvaslaiho wrote: > Take a look at the BioSQL project. There is a cvs repository called > bioperl-db. It contains the script load_seqdatabase.pl, that does what > you > need. The database schema is in a repository biosql-schema as it is > shared > among several language projects. > > -Heikki > > > >> I am writing a perl program that will extract data from a UniProt >> flatfile so that I can automatically put data into >> my PostgreSQL database. 
I am taking out name, protein ID number,
>> references etc from the file.
>>
>> Does anyone know if there is a script available to do this already?
>>
>> Many thanks,
>>
>> Stephen
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/ http://www.ebi.ac.uk/mutations/
> _/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
> _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
> _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
> _/ _/ _/ Cambridge, CB10 1SD, United Kingdom
> _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From mail2doreen at gmx.de Tue Mar 8 06:15:59 2005
From: mail2doreen at gmx.de (mail2doreen@gmx.de)
Date: Tue Mar 8 06:14:00 2005
Subject: [Bioperl-l] (no subject)
Message-ID: <3768.1110280559@www63.gmx.net>

Hello all,

I need to train a gene prediction program, and for that I have to convert
a multi gff file into genbank format!
Does anyone know a program to use for that?

Many thanks

--
SMS for important e-mails, and your thoughts are free ...
All information on SMS notification: http://www.gmx.net/de/go/sms

From jason.stajich at duke.edu Tue Mar 8 08:21:07 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Mar 8 08:16:49 2005
Subject: [Bioperl-l] (no subject)
In-Reply-To: <3768.1110280559@www63.gmx.net>
References: <3768.1110280559@www63.gmx.net>
Message-ID:

- get the features with Bio::Tools::GFF
- read in the sequence with Bio::SeqIO
- call $seq->add_SeqFeature with all the features you got from Bio::Tools::GFF
- write out the sequence with its new features with Bio::SeqIO using the 'genbank' format.
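
A minimal sketch of the four steps in this reply, for anyone who wants
something to start from. It is not Jason's code: the sub name
gff_to_genbank, the file names in the usage line, the FASTA input format
and the GFF2 default are all assumptions made for illustration. Features
are matched to sequences by comparing the GFF seq_id with each sequence's
display_id, which may need adjusting if your identifiers differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;

# Hypothetical helper (not part of Bioperl): attach the features from a
# multi-sequence GFF file to the matching sequences and write GenBank.
# Returns the number of sequences written.
sub gff_to_genbank {
    my ($gff_file, $seq_file, $out_file, $gff_version) = @_;
    $gff_version ||= 2;    # assumption: GFF2 unless told otherwise

    # Step 1: read every feature, grouped by the sequence it refers to.
    my %features_for;
    my $gffio = Bio::Tools::GFF->new(-file        => $gff_file,
                                     -gff_version => $gff_version);
    while (my $feature = $gffio->next_feature) {
        push @{ $features_for{ $feature->seq_id } }, $feature;
    }
    $gffio->close;

    # Step 2: read the sequences (assumed here to be FASTA).
    my $in  = Bio::SeqIO->new(-file => $seq_file,    -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'genbank');

    my $written = 0;
    while (my $seq = $in->next_seq) {
        # Step 3: add the features whose seq_id matches this sequence.
        my $features = $features_for{ $seq->display_id } || [];
        $seq->add_SeqFeature(@$features);

        # Step 4: write the annotated sequence in GenBank format.
        $out->write_seq($seq);
        $written++;
    }
    $out->close;
    return $written;
}

# Usage: perl gff2genbank.pl features.gff sequences.fasta out.gbk
# Nothing runs unless exactly three file names are given.
if (@ARGV == 3) {
    my $n = gff_to_genbank(@ARGV);
    print "Wrote $n sequence(s)\n";
}
```

Grouping the features in a hash first means the GFF file is read only
once, however many sequences the FASTA file holds. A sequence with no
features in the GFF is still written, just without a feature table.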
-jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 8, 2005, at 6:15 AM, mail2doreen@gmx.de wrote: > Hello all, > > i need to train a gene prediction programm and for that i have to > convert > a multi gff file into genbank format! > Does anyone know a programm to use for that? > > Many thanks > > -- > SMS bei wichtigen e-mails und Ihre Gedanken sind frei ... > Alle Infos zur SMS-Benachrichtigung: http://www.gmx.net/de/go/sms > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050308/5094cb13/PGP.bin From akarger at CGR.Harvard.edu Tue Mar 8 12:12:26 2005 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue Mar 8 12:06:22 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists d o simple formatting and analysis Message-ID: <339D68B133EAD311971E009027DC4797022FD7BD@montecarlo.cgr.harvard.edu> Hi. I've gotten the impression - in my short time in bioinformatics - that biologists get very frustrated with data formatting and analysis tasks. Which is too bad, because many of these tasks are trivial for someone with a bit of Perl knowledge. Then again, we can't force them to learn Perl, even if it would be For Their Own Good. I was thinking it would be useful to have a toolkit of outrageously simple Perl one-liners. Here's one: # Merge two lists, removing duplicates (logical OR) perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile A biologist (call her Sue) would look through a website containing a bunch of (searchable, categorized, etc.) 
scripts, cut & paste the Perl into Unix (from a website), then backspace
over the filenames and type in their own filenames, and end up with
something like this on the command line:

myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 > all_genes

The biologist hits return & voilà! Instant data munging!

Of course, I'm not the first one to identify this problem or try to solve
it. But I think I'm working on a slightly different problem than previous
solutions, and my (complete lack of) interface is different too. Here's the
"prior art" I've seen in this area, compared and contrasted with my idea.
- EMBOSS et al.: solving harder bioinformatics problems; Interface is Unix
  executables
- Bioperl's bioscripts: harder problems; Perl executables
- Taverna / myGrid: fancy GUI interface (but I do think of my scripts as
  "shims")

I'm really aiming for the lowest of low-hanging fruit here. I don't want
scripts that run Blast or do fancy analysis. Rather, we'll have scripts like
the above to merge lists, or get the standard deviation of column 7 of
tabular data, or get the GenBank IDs of the top 10 hits from a BLAST output,
or whatever. These are all tasks that're trivial in (Bio)Perl - and some you
can even do in Excel - but most biologists won't know either Perl or fancy
Excel. Think of it as pipelining software for your vterm100.

Why one-liners?
- really, really fast development of new tools (especially compared with GUI
  tools)
- no installation necessary, no dependencies (except Perl)
- no download necessary; just cut and paste a tool from the web page
- biologist doesn't need to learn an interface
- if a biologist learns just a bit of Perl, they can tweak the one-liners:
  much easier than writing from scratch, but makes tools much more flexible
- take advantage of existing tools' APIs: perl -MBio::Perl -e '...'
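
To show how little code sits behind the merge one-liner quoted earlier in
this message, here is the same idea written out as a short script that
takes its file names from the command line. This is only a sketch under a
few assumptions: the sub name merge_unique and the script name in the
usage line are made up, and, unlike the one-liner, it keeps entries in
first-seen order and strips line endings before comparing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is made up for this sketch): return the
# distinct entries of a list of lines, in the order first seen.
sub merge_unique {
    my %seen;
    my @unique;
    for my $line (@_) {
        (my $item = $line) =~ s/\r?\n\z//;    # drop the line ending, if any
        next if $item eq '';                  # ignore blank lines
        push @unique, $item unless $seen{$item}++;
    }
    return @unique;
}

# Usage: perl merge_lists.pl genes1 genes2 > all_genes
# Nothing is read unless file names are given on the command line.
if (@ARGV) {
    print "$_\n" for merge_unique(<>);
}
```

Two details differ from the one-liner on purpose. Printing the keys of
%seen gives the entries in no particular order, so the sketch collects
them in an array instead. Stripping the line ending means a file whose
last line lacks a newline, or a file saved with Windows line endings,
still merges cleanly.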
Potential problems: - psychological barrier to using command line (I figure I'll aim first at the Unix-aware subset of biologists first, and leave complete World Domination to Phase 2.) - we can't fit error-handling into one-liners. Caveat scriptor So my questions for you bioperlers (finally!): - Are there other projects that have tried to solve this niche of problems i.e., allowing biologists to do simple formatting & analysis of biological or tabular data? - Are there at least discussions of this issue that I could read somewhere for ideas (e.g., bioperl-l archive)? - Does anyone have any free advice (positive or negative or both) to offer for this project? - Are there any other lists I should post these questions to? The working name for my toolbox of bio scripts is "Scriptome". If it ever gets off the ground (and anyone cares), I'll post more info about it, along with a request for more advice, I'm sure. Thanks, -Amir Karger akarger@cgr.harvard.edu From skirov at utk.edu Tue Mar 8 13:20:04 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Mar 8 13:16:03 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists d o simple formatting and analysis In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7BD@montecarlo.cgr.harvard.edu> References: <339D68B133EAD311971E009027DC4797022FD7BD@montecarlo.cgr.harvard.edu> Message-ID: <422DECD4.5080406@utk.edu> I like a lot this idea. First my answer to your first 2 questions: no, no. But I bet may biologists would scream in pain just hearing the word console (as you mentioned). So I offer 0 step (bait to learn a little UNIX). Imagine a simple web form that is hooked to the perl interpreter (might be tricky from a security point, still it could be restricted in several ways) and does (amazingly) what the biologist types in. This would have to include file uploads/downloads as well. 
Of course the capabilities will be quite restricted, but the appetite comes with eating as some say and suddenly the console might be not a bad idea (thus Mac shares would go up :-) ). Amir Karger wrote: >Hi. > >I've gotten the impression - in my short time in bioinformatics - that >biologists get very frustrated with data formatting and analysis tasks. >Which is too bad, because many of these tasks are trivial for someone with a >bit of Perl knowledge. Then again, we can't force them to learn Perl, even >if it would be For Their Own Good. > >I was thinking it would be useful to have a toolkit of outrageously simple >Perl one-liners. Here's one: > > # Merge two lists, removing duplicates (logical OR) > perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile > >A biologist (call her Sue) would look through a website containing a bunch >of (searchable, categorized, etc.) scripts, cut & paste the Perl into Unix >(from a website), then backspace over the filenames and type in their own >filenames, and end up with something like this on the command line: > >myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 > >all_genes > >The biologist hits return & voil?! Instant data munging! > >Of course, I'm not the first one to identify this problem or try to solve >it. But I think I'm working on a slightly different problem than previous >solutions, and my (complete lack of) interface is different too. Here's the >"prior art" I've seen in this area, compared and contrasted with my idea. >- EMBOSS et al.: solving harder bioinformatics problems; Interface is Unix >executables >- Bioperl's bioscripts: harder problems; Perl executables >- Taverna / myGrid: fancy GUI interface (but I do think of my scripts as >"shims") > >I'm really aiming for the lowest of low-hanging fruit here. I don't want >scripts that run Blast or do fancy analysis. 
Rather, we'll have scripts like >the above to merge lists, or get the standard deviation of column 7 of >tabular data, or get the GenBank IDs of the top 10 hits from a BLAST output, >or whatever. These are all tasks that're trivial in (Bio)Perl - and some you >can even do in Excel - but most biologists won't know either Perl or fancy >Excel. Think of it as pipelining software for your vterm100. > >Why one-liners? >- really, really fast development of new tools (especially compared with GUI >tools) >- no installation necessary, no dependencies (except Perl) >- no download necessary; just cut and paste a tool from the web page >- biologist doesn't need to learn an interface >- if a biologist learns just a bit of Perl, they can tweak the one-liners: >much easier than writing from scratch, but makes tools much more flexible >- take advantage of existing tools' APIs: perl -MBio::Perl -e '...' > >Potential problems: >- psychological barrier to using command line (I figure I'll aim first at >the Unix-aware subset of biologists first, and leave complete World >Domination to Phase 2.) >- we can't fit error-handling into one-liners. Caveat scriptor > >So my questions for you bioperlers (finally!): >- Are there other projects that have tried to solve this niche of problems >i.e., allowing biologists to do simple formatting & analysis of biological >or tabular data? >- Are there at least discussions of this issue that I could read somewhere >for ideas (e.g., bioperl-l archive)? >- Does anyone have any free advice (positive or negative or both) to offer >for this project? >- Are there any other lists I should post these questions to? > >The working name for my toolbox of bio scripts is "Scriptome". If it ever >gets off the ground (and anyone cares), I'll post more info about it, along >with a request for more advice, I'm sure. 
> >Thanks, >-Amir Karger >akarger@cgr.harvard.edu > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From akarger at CGR.Harvard.edu Tue Mar 8 14:05:26 2005 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Tue Mar 8 13:58:42 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to h elp biologists d o simple formatting and analysis Message-ID: <339D68B133EAD311971E009027DC4797022FD7BF@montecarlo.cgr.harvard.edu> > From: Stefan Kirov [mailto:skirov@utk.edu] > > I like a lot this idea. > First my answer to your first 2 questions: no, no. > But I bet may biologists would scream in pain just hearing the word > console (as you mentioned). You make an excellent point. There are a number of avenues we've thought of for making this tool more accessible to non-UNIX folks. However, each one of them requires some extra work in planning issues like security, command paths, accessing input files, shell variables, etc. Because we've already got a bunch of users here who need to use UNIX to use our computing cluster, I figured I could have a prototype that only works from the UNIX command line. If it works well on those folks, we can think about extending things later (with the caveat that I want to be careful to keep the interface VERY lightweight, because I don't trust myself to build a portable, "intuitive" GUI.) > So I offer 0 step (bait to learn > a little UNIX). 
> Imagine a simple web form that is hooked to the perl > interpreter (might > be tricky from a security point, still it could be restricted > in several > ways) and does (amazingly) what the biologist types in. This > would have > to include file uploads/downloads as well. Of course the capabilities > will be quite restricted, but the appetite comes with eating > as some say > and suddenly the console might be not a bad idea (thus Mac > shares would > go up :-) ). -Amir > Amir Karger wrote: > > > > >I was thinking it would be useful to have a toolkit of > outrageously simple > >Perl one-liners. > > > From harris at cshl.edu Tue Mar 8 13:29:17 2005 From: harris at cshl.edu (Todd Harris) Date: Tue Mar 8 17:34:45 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists d o simple formatting and analysis In-Reply-To: <422DECD4.5080406@utk.edu> Message-ID: Hi Amir - I like this idea. You could also have the scripts process @ARGV so no hand-editing would be necessary. You might even just make the scripts executable droplets which would be even easier to use. Todd > On 3/8/05 11:20 AM, Stefan Kirov wrote: > I like a lot this idea. > First my answer to your first 2 questions: no, no. > But I bet may biologists would scream in pain just hearing the word > console (as you mentioned). So I offer 0 step (bait to learn a little UNIX). > Imagine a simple web form that is hooked to the perl interpreter (might > be tricky from a security point, still it could be restricted in several > ways) and does (amazingly) what the biologist types in. This would have > to include file uploads/downloads as well. Of course the capabilities > will be quite restricted, but the appetite comes with eating as some say > and suddenly the console might be not a bad idea (thus Mac shares would > go up :-) ). > > Amir Karger wrote: > >> Hi. 
>> >> I've gotten the impression - in my short time in bioinformatics - that >> biologists get very frustrated with data formatting and analysis tasks. >> Which is too bad, because many of these tasks are trivial for someone with a >> bit of Perl knowledge. Then again, we can't force them to learn Perl, even >> if it would be For Their Own Good. >> >> I was thinking it would be useful to have a toolkit of outrageously simple >> Perl one-liners. Here's one: >> >> # Merge two lists, removing duplicates (logical OR) >> perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile >> >> A biologist (call her Sue) would look through a website containing a bunch >> of (searchable, categorized, etc.) scripts, cut & paste the Perl into Unix >> (from a website), then backspace over the filenames and type in their own >> filenames, and end up with something like this on the command line: >> >> myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 > >> all_genes >> >> The biologist hits return & voil?! Instant data munging! >> >> Of course, I'm not the first one to identify this problem or try to solve >> it. But I think I'm working on a slightly different problem than previous >> solutions, and my (complete lack of) interface is different too. Here's the >> "prior art" I've seen in this area, compared and contrasted with my idea From nelsonrt at iastate.edu Tue Mar 8 18:36:57 2005 From: nelsonrt at iastate.edu (Rex Nelson) Date: Tue Mar 8 18:33:53 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists d o simple formatting and analysis In-Reply-To: References: Message-ID: Todd and Amir: If you are running OS X there is a program Platypus which makes applications which would be quite suitable for simple perl/shell scripts. The output and input options are a little limited but to do defined little jobs it would work. It allows you to put scripts inside an OS X application with or without drag and drop ability. 
I don't know about "does (amazingly) what the biologist types in" but it would do defined jobs by clicking or drag-n-drop. Rex >Hi Amir - > >I like this idea. You could also have the scripts process @ARGV so no >hand-editing would be necessary. You might even just make the scripts >executable droplets which would be even easier to use. > >Todd > >> On 3/8/05 11:20 AM, Stefan Kirov wrote: > >> I like a lot this idea. >> First my answer to your first 2 questions: no, no. >> But I bet may biologists would scream in pain just hearing the word >> console (as you mentioned). So I offer 0 step (bait to learn a little UNIX). >> Imagine a simple web form that is hooked to the perl interpreter (might >> be tricky from a security point, still it could be restricted in several >> ways) and does (amazingly) what the biologist types in. This would have >> to include file uploads/downloads as well. Of course the capabilities >> will be quite restricted, but the appetite comes with eating as some say >> and suddenly the console might be not a bad idea (thus Mac shares would >> go up :-) ). >> >> Amir Karger wrote: >> >>> Hi. >>> >>> I've gotten the impression - in my short time in bioinformatics - that >>> biologists get very frustrated with data formatting and analysis tasks. >>> Which is too bad, because many of these tasks >>>are trivial for someone with a >>> bit of Perl knowledge. Then again, we can't force them to learn Perl, even >>> if it would be For Their Own Good. >>> >>> I was thinking it would be useful to have a toolkit of outrageously simple >>> Perl one-liners. Here's one: >>> >>> # Merge two lists, removing duplicates (logical OR) >>> perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile >>> >>> A biologist (call her Sue) would look through a website containing a bunch >>> of (searchable, categorized, etc.) 
scripts, cut & paste the Perl into Unix
>>> (from a website), then backspace over the filenames and type in their own
>>> filenames, and end up with something like this on the command line:
>>>
>>> myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 > all_genes
>>>
>>> The biologist hits return & voilà! Instant data munging!
>>>
>>> Of course, I'm not the first one to identify this problem or try to solve
>>> it. But I think I'm working on a slightly different problem than previous
>>> solutions, and my (complete lack of) interface is different too. Here's the
>>> "prior art" I've seen in this area, compared and contrasted with my idea

--
Rex Nelson Ph.D.
Postdoctoral Scientist
nelsonrt@iastate.edu
(515) 294-1297
~~~_/) ~~~

From daniel.lang at biologie.uni-freiburg.de Wed Mar 9 05:20:13 2005
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed Mar 9 05:16:26 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
Message-ID: <422ECDDD.40404@biologie.uni-freiburg.de>

Hi,

I'm retrieving seq objects from a local biosql db (using the latest cvs
version of bioperl-db) and e.g. writing them with SeqIO. After changing
from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the
following error:

Operation `ne': no method found,
    left argument in overloaded package Bio::Annotation::Reference,
    right argument has no overloaded magic at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm line 534, line 1.

The module PersistentObject.pm hasn't changed and in Reference.pm there is
only this change:

diff bioperl-live-Dec04/Bio/Annotation/Reference.pm bioperl-live/Bio/Annotation/Reference.pm
1c1
< # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
---
> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
56c56,57
< # use overload '""' => \&as_text;
---
> use overload '""' => sub { $_[0]->title || ''};
> use overload 'eq' => sub { "$_[0]" eq "$_[1]" };

I've reversed this, but no positive result - the error remains...

Any hints?

Thanks in advance,
Daniel

From chad at dieselwurks.com Tue Mar 8 22:15:07 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Wed Mar 9 13:18:58 2005
Subject: [Bioperl-l] Aggressive aggregation?
Message-ID:

Subject: Aggressive Aggregators

Greetings all,

I'm looking for help in presenting Blast hits in GBrowse.

I blasted Brassica EST sequences against the Arabidopsis
pseudochromosome assemblies in order to store them in a Bio::DB::GFF
database. I used a tool based on bp_search2gff.pl to `convert' blast
reports into gff. A sample of that gff is below[1].

My problem is partly based on a peculiarity of Blast and partly based on
the behavior of the aggregators in GBrowse and I'm wondering if someone
else has seen this.

Arabidopsis has five chromosomes. In order to get the coordinates
necessary to place ESTs on the chromosomes I created a blast database
containing 5 query sequences - chr1, chr2, chr3, chr4, chr5.

My problem presents itself when an EST hits at more than one place on a
Chromosome. Let us say that on chr1 there is a cluster of HSPs for the
est chad1 at position 1000, a second cluster at position 10,000 and a
third cluster at 50,000. Blast will indicate a SINGLE hit on chr1.

SO, I manually find clusters of HSPs and create GFF that resembles that
below[1]. Yes I know that wublast has an option to prevent that
behavior.

The problem is that the `match' aggregator joins all of the `matches'
together.
I understand that it's because all of the matches have the
same Target - that's necessary to have the proper sequence appear while
viewing base-base alignments.

HSPs:        <-->  <-->  <-->                 <-->  <-->  <-->
matches:     <-------------->                 <-------------->

What I get : <-->--<-->--<-->-----------------<-->--<-->--<-->
What I want: <-->--<-->--<-->                 <-->--<-->--<-->

How do I get what I want? In my gbrowse.conf I tried the standard
`match' aggregator and a custom aggregator: csmmatch{csmhsp/csmmatch}


Chad Matsalla


[1]
chr1  aafcest  HSP    1    75   .  +  .  Target "Sequence:chad1" 1 75
chr1  aafcest  HSP    100  150  .  +  .  Target "Sequence:chad1" 100 150
chr1  aafcest  match  1    150  .  +  .  Target "Sequence:chad1" 1 150

chr1  aafcest  HSP    200  275  .  -  .  Target "Sequence:chad1" 200 275
chr1  aafcest  HSP    300  450  .  -  .  Target "Sequence:chad1" 300 450
chr1  aafcest  match  200  450  .  -  .  Target "Sequence:chad1" 200 450

From s0460205 at sms.ed.ac.uk Wed Mar 9 09:24:05 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Wed Mar 9 13:19:07 2005
Subject: [Bioperl-l] uniprot flatfile extraction
Message-ID: <1110378245.422f07057f49c@sms.ed.ac.uk>

Hi,

sorry if this is basic but I've read the documentation and am still
confused!!

I wish to extract uniprot flatfile data into my database. I want to get
the following variables: Protein ID, length, description, molecular
weight, sequence, comments, cross references, disulphide bonds, species,
entered date, last modified, last annotated, protein synonyms.

I know that I can get some of these (e.g. protein ID, length) using
Bioperl but can I get all of the data also or am I better writing my own
from scratch?

Thanks

From jason.stajich at duke.edu Wed Mar 9 13:42:47 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Mar 9 13:37:47 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To:
References:
Message-ID: <9fed82865c1265db7eedb183112cd228@duke.edu>

So personally, I wouldn't use default BLASTN.
I'd use WU-BLAST with the -links option (this has worked well for mapping Brassica ESTs to Arabidopsis in my experience). Then you can parse the BLAST output (writing your own slightly customized version of search2gff which looks at the $hsp->link option to group things). I just lectured on this today in fact:

http://people.genome.duke.edu/~jes12/BGT203.2005/projects/find_duplicates/scripts/draw_hits.pl
http://people.genome.duke.edu/~jes12/BGT203.2005/projects/find_duplicates/scripts/draw_hits_perlink.pl

Or, if you are willing to have a little more overhead, use exonerate (http://www.ebi.ac.uk/~guy/exonerate/) with the est2genome model, which will try to splice the EST onto the genome for you as well. You can dump out GFF directly, which needs to be massaged a little before loading into Bio::DB::GFF.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 8, 2005, at 10:15 PM, Chad Matsalla wrote:

>
>
> Subject: Aggressive Aggregators
>
> Greetings all,
>
> I'm looking for help in presenting Blast hits in GBrowse.
>
> I blasted Brassica EST sequences against the Arabidopsis
> pseudochromosome assemblies in order to store them in a Bio::DB::GFF
> database. I used a tool based bp_search2gff.pl to `convert' blast
> reports into gff. A sample of that gff is below[1].
>
> My problem is partly based on a peculiarity of Blast and partly based on
> the behavior of the aggregators in GBrowse and I'm wondering if someone
> else has seen this.
>
> Arabidopsis has five chromosomes. In order to get the coordinates
> necessary to place ESTs on the chromosomes I created a blast database
> containing 5 query sequences - chr1, chr2, chr3, chr4, chr5.
>
> My problem presents itself when an EST hits at more than once place on a
> Chromosome. Let us say that on chr1 there is a cluster of HSPs for the
> est chad1 at position 1000, a second cluster at position 10,000 and a
> third cluster at 50,000. Blast will indicate a SINGLE hit on chr1.
> > SO, I manually find clusters of HSPs and create GFF that resembles that > below[1]. Yes I know that wublast has an option to prevent that > behavior. > > The problem is that the `match' aggregator joins all of the `matches' > together. I understand that it's because all of the matches have the > same Target - that's necessary to have the proper sequence appear while > viewing base-base alignments. > > HSPs: <--> <--> <--> <--> <--> <--> > matches: <--------------> <--------------> > > What I get : <-->--<-->--<-->-----------------<-->--<-->--<--> > What I want: <-->--<-->--<--> <-->--<-->--<--> > > How do I get what I want? In my gbrowse.conf I tried the standard > `match' aggregator and a custom aggregator: csmmatch{csmhsp/csmmatch} > > > Chad Matsalla > > > [1] > chr1 aafcest HSP 1 75 . + . Target > "Sequence:chad1" 1 75 > chr1 aafcest HSP 100 150 . + . Target > "Sequence:chad1" 100 150 > chr1 aafcest match 1 150 . + . Target > "Sequence:chad1" 1 150 > > chr1 aafcest HSP 200 275 . - . Target > "Sequence:chad1" 200 275 > chr1 aafcest HSP 300 450 . - . Target > "Sequence:chad1" 300 450 > chr1 aafcest match 200 450 . - . Target > "Sequence:chad1" 200 450 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050309/e805c467/PGP.bin

From akarger at CGR.Harvard.edu Wed Mar 9 13:46:17 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed Mar 9 13:39:37 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists do simple formatting and analysis
Message-ID: <339D68B133EAD311971E009027DC4797022FD7CB@montecarlo.cgr.harvard.edu>

In a private mail, Richard Copley wrote:
>Amir Karger wrote:
>> I was thinking it would be useful to have a toolkit of outrageously simple
>> Perl one-liners. Here's one:
>>
>> # Merge two lists, removing duplicates (logical OR)
>> perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
>
>sort -u file1 file2

I know that many of the tasks proposed for the Scriptome can be done with grep, sed, cut, Word, or Excel. I'm planning on implementing head, sort, join, and lots of others. But how many experimental biologists are familiar with Unix cut? How many bother to learn even the least fancy Excel functions? I think not many, because they have other things to worry about.

One reason so many people have created integrated toolboxes is so that biologists only need to learn how to use one tool, rather than learning 30 or whatever Unix commands. The goal of Scriptome is that they only need to learn one tool AND that the learning curve for that tool is very small. And we make the learning curve small by using an extremely lightweight interface (most of solving a problem involves searching on a website) rather than by trying to create an intuitive GUI. After all, how many folks other than Apple have created GUIs that are intuitive for more than a small subset of people?

-Amir

From amackey at pcbi.upenn.edu Wed Mar 9 14:00:40 2005
From: amackey at pcbi.upenn.edu (Aaron J.
Mackey)
Date: Wed Mar 9 13:55:20 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To:
References:
Message-ID: <3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu>

> My problem is partly based on a peculiarity of Blast and partly based on
> the behavior of the aggregators in GBrowse and I'm wondering if someone
> else has seen this.

Welcome to the party ;)

> My problem presents itself when an EST hits at more than once place on a
> Chromosome.

Besides Jason's recommendation to use a splicing-aware tool (exonerate is one, but Spidey is also a good one, and is based on BLASTN already), you have another issue, which is that your GFF Targets need to be uniquely named. This is a well-known drawback of GFF prior to GFF3, and a continuing issue with GBrowse when using the current Bio::DB::GFF (which is not yet GFF3-savvy).

> chr1  aafcest  HSP    1    75   .  +  .  Target "Sequence:chad1" 1 75
> chr1  aafcest  HSP    100  150  .  +  .  Target "Sequence:chad1" 100 150
> chr1  aafcest  match  1    150  .  +  .  Target "Sequence:chad1" 1 150
>
> chr1  aafcest  HSP    200  275  .  -  .  Target "Sequence:chad1" 200 275
> chr1  aafcest  HSP    300  450  .  -  .  Target "Sequence:chad1" 300 450
> chr1  aafcest  match  200  450  .  -  .  Target "Sequence:chad1" 200 450

These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2" or some such. This also means that if you're saving the ESTs in the database (for sequence alignment display), you'll have to save them redundantly under chad1-1, chad1-2, etc. The same problem arises with BLASTX searches against protein databases.

Now, you could write a custom aggregator that de-aggregated multiple chad1 "match" features, assigning the contained HSPs to each, but there is no such "default" behavior. Let me know if there's general interest in this ...

Anxiously awaiting GFF3 support in Bio::DB::GFF,

-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S.
University Avenue                office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From amackey at pcbi.upenn.edu Wed Mar 9 14:04:12 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Mar 9 13:58:49 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <9fed82865c1265db7eedb183112cd228@duke.edu>
References: <9fed82865c1265db7eedb183112cd228@duke.edu>
Message-ID:

I also recommend -span1 to ensure consistent HSP ordering and orientation ...

FYI, DPS (part of the AAT package) is another good option for fast intron-savvy alignments.

-Aaron

On Mar 9, 2005, at 1:42 PM, Jason Stajich wrote:

> I'd use WU-BLAST with the -links option

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From echuong at gmail.com Wed Mar 9 17:23:56 2005
From: echuong at gmail.com (Edward Chuong)
Date: Wed Mar 9 17:18:56 2005
Subject: [Bioperl-l] PAML nssites model result object
Message-ID: <244d2e0e050309142370997ce4@mail.gmail.com>

Hi all,

I'm trying to parse PAML results, and running into some trouble. I'm using the branch-specific omega model, and I want to get the branch-specific ka/ks values out.

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup
says that $node->param('omega') should work, but Data::Dumper shows that this value isn't stored in the node (only branch lengths and seq IDs appear to be stored).

I'm assuming that I can get these values out of the get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but I'm not sure how to call it. The current synopsis uses "get_model_params" but it seems to be out of date because it's not in the current source.
The docs at http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup say to use my @results = @{$self->get_NSSite_results}; --that looks like a mistake, and I've tried @result = $result->get_NSSite_results but that doesn't work either (just get undefined objs). Am I doing something wrong, or is this functionality still being worked on? I've tried using both 1.4 and the LIVE versions. Any help is appreciated, thanks! -Ed From lstein at cshl.edu Wed Mar 9 15:47:29 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Wed Mar 9 17:36:20 2005 Subject: [Bioperl-l] Aggressive aggregation? In-Reply-To: References: Message-ID: <200503091547.29838.lstein@cshl.edu> Each of the multiple hits should have its own unique target name. You can do this by appending a .01, .02, etc to the end of the Target name. Lincoln On Tuesday 08 March 2005 10:15 pm, Chad Matsalla wrote: > Subject: Aggressive Aggregators > > Greetings all, > > I'm looking for help in presenting Blast hits in GBrowse. > > I blasted Brassica EST sequences against the Arabidopsis > pseudochromosome assemblies in order to store them in a > Bio::DB::GFF database. I used a tool based bp_search2gff.pl to > `convert' blast reports into gff. A sample of that gff is below[1]. > > My problem is partly based on a peculiarity of Blast and partly > based on the behavior of the aggregators in GBrowse and I'm > wondering if someone else has seen this. > > Arabidopsis has five chromosomes. In order to get the coordinates > necessary to place ESTs on the chromosomes I created a blast > database containing 5 query sequences - chr1, chr2, chr3, chr4, > chr5. > > My problem presents itself when an EST hits at more than once place > on a Chromosome. Let us say that on chr1 there is a cluster of > HSPs for the est chad1 at position 1000, a second cluster at > position 10,000 and a third cluster at 50,000. 
Blast will indicate > a SINGLE hit on chr1. > > SO, I manually find clusters of HSPs and create GFF that resembles > that below[1]. Yes I know that wublast has an option to prevent > that behavior. > > The problem is that the `match' aggregator joins all of the > `matches' together. I understand that it's because all of the > matches have the same Target - that's necessary to have the proper > sequence appear while viewing base-base alignments. > > HSPs: <--> <--> <--> <--> <--> <--> > matches: <--------------> <--------------> > > What I get : <-->--<-->--<-->-----------------<-->--<-->--<--> > What I want: <-->--<-->--<--> <-->--<-->--<--> > > How do I get what I want? In my gbrowse.conf I tried the standard > `match' aggregator and a custom aggregator: > csmmatch{csmhsp/csmmatch} > > > Chad Matsalla > > > [1] > chr1 aafcest HSP 1 75 . + . Target > "Sequence:chad1" 1 75 chr1 aafcest HSP 100 150 . + > . Target "Sequence:chad1" 100 150 chr1 aafcest match 1 > 150 . + . Target "Sequence:chad1" 1 150 > > chr1 aafcest HSP 200 275 . - . Target > "Sequence:chad1" 200 275 chr1 aafcest HSP 300 450 . - > . Target "Sequence:chad1" 300 450 chr1 aafcest match > 200 450 . - . Target "Sequence:chad1" 200 450 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050309/5b4ea539/attachment.bin

From jason.stajich at duke.edu Wed Mar 9 17:41:24 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Mar 9 17:36:26 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <244d2e0e050309142370997ce4@mail.gmail.com>
References: <244d2e0e050309142370997ce4@mail.gmail.com>
Message-ID: <896034a8342912841a4a0d0a0686353e@duke.edu>

Skipped content of type multipart/mixed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050309/f71e1689/PGP.bin

From jason.stajich at duke.edu Wed Mar 9 18:01:34 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Mar 9 17:56:11 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <896034a8342912841a4a0d0a0686353e@duke.edu>
References: <244d2e0e050309142370997ce4@mail.gmail.com> <896034a8342912841a4a0d0a0686353e@duke.edu>
Message-ID:

Resend with code pasted....

#!/usr/bin/perl -w
use strict;
use Bio::Tools::Phylo::PAML;

my $outcodeml = shift(@ARGV);

my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
                                              -dir  => "./");
my $result = $paml_parser->next_result();
my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
my @otus = $result->get_seqs;
if( $#{$MLmatrix} < 0 ) {
    for my $tree ($result->next_tree ) {
        for my $node ( $tree->get_nodes ) {
            my $id;
            if( $node->is_Leaf() ) {
                $id = $node->id;
            } else {
                $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
                               $node->get_all_Descendents) .")";
            }
            if( ! $node->ancestor || ! $node->has_tag('t') ) {
                # skip when no values have been associated with this node
                # (like the root node)
                next;
            }
            # I know this looks complicated
            # but we use the get_tag_values method to pull out the annotations
            # for each branch
            # The ()[0] around the call is because get_tag_values returns a list
            # if we want to just get the 1st item in the list we have
            # to tell Perl we are treating it like an array.
            # in the future get_tag_values needs to be smart and just
            # return the 1st item in the array if called in scalar
            # context
            printf "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
                $id,
                map { ($node->get_tag_values($_))[0] }
                qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
        }
    }
} else {
    my $i =0;
    my @seqs = $result->get_seqs;
    for my $row ( @$MLmatrix ) {
        print $seqs[$i++]->display_id, join("\t",@$row), "\n";
    }
}

On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote:

> I just updated things last week so this is brand-spanking-new. I
> don't know if I connected everything up for NSsites stuff quite yet
> as that is handled in - the branch-specific parsing should work now.
> I don't know if the synopsis code is really up to snuff either. When
> I get around to it I will try and see what still needs to be connected
> in NSsites parsing.
>
> I don't think $node->param() is going to work -
> $node->get_tag_values() is the way I've implemented it.
>
> <00parse_codeml.pl>
>
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
> On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote:
>
>> Hi all,
>>
>> I'm trying to parse PAML results, and running into some trouble. I'm
>> using branch specific omega model, and I want to get the branch
>> specific ka/ks values out.
>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ >> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/ >> vnd.viewcvs-markup >> says that $node->param('omega') should work, but Data::Dumper shows >> that this value isn't stored in the node (only branch lengths and seq >> IDs appear to be stored). >> >> I'm assuming that I can get these values out of the >> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but >> I'm not sure how to call it. The current synopsis uses >> "get_model_params" but it seems to be out of date because it's not in >> the current souce. The docs at >> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ >> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content- >> type=text/vnd.viewcvs-markup >> say to use my >> @results = @{$self->get_NSSite_results}; >> --that looks like a mistake, and I've tried >> @result = $result->get_NSSite_results but that doesn't work either >> (just get undefined objs). >> >> Am I doing something wrong, or is this functionality still being >> worked on? I've tried using both 1.4 and the LIVE versions. Any help >> is appreciated, thanks! >> >> -Ed >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> From echuong at gmail.com Wed Mar 9 18:37:21 2005 From: echuong at gmail.com (Edward Chuong) Date: Wed Mar 9 18:32:24 2005 Subject: [Bioperl-l] PAML nssites model result object In-Reply-To: References: <244d2e0e050309142370997ce4@mail.gmail.com> <896034a8342912841a4a0d0a0686353e@duke.edu> Message-ID: <244d2e0e0503091537d5f283d@mail.gmail.com> Hi Jason, Thanks for the help. The code seems to get stuck at if( ! $node->ancestor || ! 
$node->has_tag('t') ) {

(this condition turns out true for every node, not just the root, so it always hits "next")

I used Data::Dumper to check on the node and I've pasted the results - it seems like those tags aren't being sent in?

Thanks!
-Ed

'_root_cleanup_methods' => [ sub { "DUMMY" } ],
'_creation_id' => 0,
'_branch_length' => '0.613722',
'_desc' => {},
'_id' => 'NP_033437.2_mus',
'_ancestor' => bless( {
    '_root_cleanup_methods' => [ $VAR1->{'_root_cleanup_methods'}[0] ],
    '_creation_id' => 3,
    '_desc' => {
        '2' => bless( {
            '_root_cleanup_methods' => [ $VAR1->{'_root_cleanup_methods'}[0] ],
            '_creation_id' => 2,
            '_branch_length' => '0.768322',
            '_desc' => {},
            '_id' => 'PM_BWp0001H02f',
            '_ancestor' => $VAR1->{'_ancestor'},
            '_root_verbose' => 0
        }, 'Bio::Tree::Node' ),
        '0' => $VAR1,
        '1' => bless( {
            '_root_cleanup_methods' => [ $VAR1->{'_root_cleanup_methods'}[0] ],
            '_creation_id' => 1,
            '_branch_length' => '0.366319',
            '_desc' => {},
            '_id' => 'NP_742070.1_rat',
            '_ancestor' => $VAR1->{'_ancestor'},
            '_root_verbose' => 0
        }, 'Bio::Tree::Node' )
    },
    '_id' => '',
    '_height' => undef,
    '_root_verbose' => 0
}, 'Bio::Tree::Node' ),
'_root_verbose' => 0
}, 'Bio::Tree::Node' );

On Wed, 9 Mar 2005 18:01:34 -0500, Jason Stajich wrote:
> Resend with code pasted....
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::Tools::Phylo::PAML;
>
> my $outcodeml = shift(@ARGV);
>
> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
> -dir => "./");
> my $result = $paml_parser->next_result();
> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
> my @otus = $result->get_seqs;
> if( $#{$MLmatrix} < 0 ) {
> for my $tree ($result->next_tree ) {
> for my $node ( $tree->get_nodes ) {
> my $id;
> if( $node->is_Leaf() ) {
> $id = $node->id;
> } else {
> $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
> $node->get_all_Descendents) .")";
> }
> if( ! $node->ancestor || !
$node->has_tag('t') ) { > # skip when no values have been associated with this node > # (like the root node) > next; > } > # I know this looks complicated > # but we use the get_tag_values method to pull out the annotations > # for each branch > # The ()[0] around the call is because get_tag_values returns a > list > # if we want to just get the 1st item in the list we have > # to tell Perl we are treating it like an array. > # in the future get_tag_values needs to be smart and just > # return the 1st item in the array if called in scalar > # context > > printf > "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/ > dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n", > $id, > map { ($node->get_tag_values($_))[0] } > qw(t S N dN/dS dN dS), 'S*dS', 'N*dN'; > } > } > } else { > my $i =0; > my @seqs = $result->get_seqs; > for my $row ( @$MLmatrix ) { > print $seqs[$i++]->display_id, join("\t",@$row), "\n"; > } > } > > On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote: > > > I just updated things last week so this is brand-spanking-new. I > > don't know if I connected everything up for NSsites stuff quite yet > > as that is handled in - the branch-specific parsing should work now. > > I don't know if the synopsis code is really up to snuff either. When > > I get around to it I will try and see what still needs to be connected > > in NSsites parsing. > > > > I don't think $node->param() is going to work - > > $node->get_tag_values() is the way I've implemented it. > > > > <00parse_codeml.pl> > > > > -jason > > -- > > Jason Stajich > > jason.stajich at duke.edu > > http://www.duke.edu/~jes12/ > > > > On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote: > > > >> Hi all, > >> > >> I'm trying to parse PAML results, and running into some trouble. I'm > >> using branch specific omega model, and I want to get the branch > >> specific ka/ks values out. 
> >> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ > >> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/ > >> vnd.viewcvs-markup > >> says that $node->param('omega') should work, but Data::Dumper shows > >> that this value isn't stored in the node (only branch lengths and seq > >> IDs appear to be stored). > >> > >> I'm assuming that I can get these values out of the > >> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but > >> I'm not sure how to call it. The current synopsis uses > >> "get_model_params" but it seems to be out of date because it's not in > >> the current souce. The docs at > >> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ > >> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content- > >> type=text/vnd.viewcvs-markup > >> say to use my > >> @results = @{$self->get_NSSite_results}; > >> --that looks like a mistake, and I've tried > >> @result = $result->get_NSSite_results but that doesn't work either > >> (just get undefined objs). > >> > >> Am I doing something wrong, or is this functionality still being > >> worked on? I've tried using both 1.4 and the LIVE versions. Any help > >> is appreciated, thanks! > >> > >> -Ed > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > > -- Edward Chuong (949) 939-2732 AIM: edawad85 From davila at ioc.fiocruz.br Wed Mar 9 21:42:55 2005 From: davila at ioc.fiocruz.br (davila) Date: Wed Mar 9 21:42:51 2005 Subject: [Bioperl-l] Mysql columns and Blast evalues Message-ID: <8D44604203DAF9438BF9123B4A08C779575FC1@alpha.ioc.fiocruz.br> Hi All, Not sure you already discussed this but I was not able to find anything by using google... I am trying to store parsed Blast e-values (parsed with SearchIO) into mysql tables (MyISAM), the column in question is double(11,2)... 
would it be OK for really small e-values (e.g. 1e-197)? I am using MySQL 4.1.10 and only see "0" (zero) in the tables... when I set the column to double(11,3) I can then see smaller e-values (like the one mentioned above) ...

Another problem is printing those e-values to the screen; we are using CGI and sprintf like this:

$e_value = sprintf ( "%0.1e", $blast_hits->[$i][2]/1e200 );

But I can only see values such as "0.0e+00" on the screen...

Any tips would be greatly appreciated.

Kindest regards, Alberto

From skirov at utk.edu Wed Mar 9 22:25:46 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Mar 9 22:21:15 2005
Subject: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FC1@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779575FC1@alpha.ioc.fiocruz.br>
Message-ID: <422FBE3A.3080006@utk.edu>

Try to store it as a varchar; it will not make much difference.
Your sprintf is OK, I guess; I suspect either $blast_hits or $i is actually undef, and undef/anything (except undef or 0) is 0. You can use CGI::Debug (for example: use CGI::Debug (report=>'everything', on=>'anything');) to trace the vars in a CGI script. There are other ways to debug a CGI script, including the command line -d switch (try searching google for debug CGI).
Hope this helps.

davila wrote:

>Hi All,
>
>Not sure you already discussed this but I was not able to find anything by using google...
>
>I am trying to store parsed Blast e-values (parsed with SearchIO) into mysql tables (MyISAM), the column in question is double(11,2)... would it be ok for really small e-values (eg: 1e-197) ? I am using MySQL 4.1.10 and only see "0" (zero) in the tables... when I set the column to double(11,3) then can see smaller evalues (like the above mentioned) ...
>
>Another problem is to print in the screen those evalues, actually we are using CGI and sprintf like this:
>
>$e_value = sprintf ( "%0.1e", $blast_hits->[$i][2]/1e200 );
>
>But can only see values as "0.0e+00" in the screen...
> >Any tips, would be greatly appreciated. > >Kindest regards, Alberto > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From ak at ebi.ac.uk Thu Mar 10 04:43:29 2005 From: ak at ebi.ac.uk (Andreas Kahari) Date: Thu Mar 10 04:38:28 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists d o simple formatting and analysis In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7CB@montecarlo.cgr.harvard.edu> References: <339D68B133EAD311971E009027DC4797022FD7CB@montecarlo.cgr.harvard.edu> Message-ID: <20050310094329.GB29547@ebi.ac.uk> I'm not quite sure what this has to do with bioperl... On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote: > In a private mail, Richard Copley wrote: Forwarding private emails to mailing lists are we? > >Amir Karger wrote: > >> I was thinking it would be useful to have a toolkit of outrageously > simple > >> Perl one-liners. Here's one: http://www.oreilly.com/catalog/cookbook/ > >> > >> # Merge two lists, removing duplicates (logical OR) > >> perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile > > > >sort -u file1 file2 > > I know that many of the tasks proposed for the Scriptome can be done with > grep, sed, cut, Word, or Excel. I'm planning on implementing head, sort, > join, and lots of others. But how many experimental biologists are familiar > with Unix cut? How many bother to learn even the least fancy Excel > functions? I think not many, because they have other things to worry about. Hmmm, comparing 'cut' and 'sed' with Word and Excel? Oh well. The philosophy of Unix utilities is to do only one thing, but to do it very well. 
In the case of the 'sort' utility, for example, it will most likely use an out-of-core sorting algorithm to cope with files larger than the available memory of the machine, and will probably be a fair bit quicker and more flexible than your own implementation.

> One reason so many people have created integrated toolboxes is so that
> biologists only need to learn how to use one tool, rather than learning 30
> or whatever Unix commands. The goal of Scriptome is that they only need to
> learn one tool AND that the learning curve for that tool is very small. And
> we make the learning curve small by using an extremely lightweight interface
> (most of solving a problem involves searching on a website) rather than by
> trying to create an intuitive GUI. After all, how many folks other than
> Apple have created GUIs that are intuitive for more than a small subset of
> people?

The reason why so many people are creating integrated toolboxes (really, are they?) is probably just because so many people before them didn't do it right. Mind you, doing it "right" is not possible.

I do understand that there is a need for integrated utilities with easy-to-press buttons, and I won't try to put you off working on those kinds of projects, but...

What would an experimental biologist, who is not familiar with 'sort', 'cut' or 'join', do with a Perl script that implemented those functionalities?

Wouldn't it be better to provide a high-level interface to common tasks, like parsing the output from various programs and providing simple ways of accessing and manipulating sequence features etc.?

If you find ways to expand the application area of BioPerl, or if you rationalize and improve existing BioPerl code, then I'm sure the BioPerl maintainers would be happy to consider committing your code to the project.
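
The merge-and-deduplicate one-liner quoted earlier in this thread can be sketched as a small Perl function. This is only an illustration, not code from any of the posters: the name merge_unique and the sample gene names are made up for the example. It differs from the quoted one-liner in one respect: it keeps items in first-seen order, whereas "print keys %seen" returns them in an unspecified hash order and "sort -u" returns them sorted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the union of several lists, keeping only the first occurrence
# of each item and preserving the order in which items were first seen.
# (merge_unique is a hypothetical helper name used for this sketch.)
sub merge_unique {
    my @lists = @_;    # each argument is an array reference
    my %seen;
    return grep { !$seen{$_}++ } map { @$_ } @lists;
}

# Two small example lists standing in for the lines of file1 and file2.
my @merged = merge_unique( [qw(geneA geneB geneC)], [qw(geneB geneD geneA)] );
print join( "\n", @merged ), "\n";
```

For files too large to hold in memory, the out-of-core behaviour of sort(1) described above is the reason to prefer "sort -u" over a hash-based approach like this one.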
Regards,
Andreas

--
Andreas Kähäri EMBL-EBI/ensembl 1024D/C2E163CB

From davila at ioc.fiocruz.br Thu Mar 10 04:42:19 2005
From: davila at ioc.fiocruz.br (davila)
Date: Thu Mar 10 04:42:29 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
Message-ID: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>

Hi Stefan,

Thanks for the tips!

I guess the problem with using VARCHAR is that it limits numeric comparison of the real E-values. If I want to filter on, or only show, E-values greater or smaller than 1e-50, would it work OK?

I would like to know which other (MySQL) column types colleagues are using to store their BLAST E-values (any further details would be appreciated).

Thanks.

-----Original message-----
From: Stefan Kirov [mailto:skirov@utk.edu]
Sent: Thu 10/3/2005 00:25
To: davila
Cc: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] Mysql columns and Blast evalues

From avilella at gmail.com Thu Mar 10 08:08:57 2005
From: avilella at gmail.com (Albert Vilella)
Date: Thu Mar 10 08:10:18 2005
Subject: [Bioperl-l] stockholm AlignIO write_aln method
Message-ID: <1110460137.8027.6.camel@magneto>

Hi,

I would like to use the write_aln method for the stockholm format, which is currently unimplemented.

As it isn't implemented, I would like to ask where I could find the documentation for the format and the minimal set of rules the write_aln method should obey.

Anyone?

Bests,

Albert.

From skirov at utk.edu Thu Mar 10 08:21:28 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Mar 10 08:16:34 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
Message-ID: <423049D8.3070407@utk.edu>

I think it should work OK. And I think the values you want to store are too small for float (53 is the smallest: you should use float(53,53), which defines both storage and display precision).
I am personally using Oracle, so it is a different game. One thing you can do is store the exponent, even just as an integer:
my $exp=int(log($blast_eval));
or
my $exp=log($blast_eval);
and store it as float.
When you want to work with the number again:
my $blast_eval=exp($exp);
Stefan

davila wrote:

>Hi Stefan,
>
>Thanks for the tips!
>
>I guess the problem with using VARCHAR is that it limits numeric comparison of the real E-values. If I want to filter on, or only show, E-values greater or smaller than 1e-50, would it work OK?
>
>I would like to know which other (MySQL) column types colleagues are using to store their BLAST E-values (any further details would be appreciated).
>
>Thanks.
>
>
>-----Original message-----
>From: Stefan Kirov [mailto:skirov@utk.edu]
>Sent: Thu 10/3/2005 00:25
>To: davila
>Cc: bioperl-l@portal.open-bio.org
>Subject: Re: [Bioperl-l] Mysql columns and Blast evalues
>
>
>
>
>

From avilella at ub.edu Thu Mar 10 08:26:31 2005
From: avilella at ub.edu (Albert Vilella)
Date: Thu Mar 10 08:20:58 2005
Subject: [Bioperl-l] stockholm AlignIO write_aln method
Message-ID: <1110461192.8193.0.camel@magneto>

Hi,

I would like to use the write_aln method for the stockholm format, which is currently unimplemented.

As it isn't implemented, I would like to ask where I could find the documentation for the format and the minimal set of rules the write_aln method should obey.

Anyone?

Bests,

Albert.

From avilella at ub.edu Thu Mar 10 08:31:16 2005
From: avilella at ub.edu (Albert Vilella)
Date: Thu Mar 10 08:25:45 2005
Subject: [Bioperl-l] hapmap.pm startingcol now 11?
Message-ID: <1110461476.8193.6.camel@magneto>

Hi all,

AFAICS, Hapmap dump files have (since Dec 2004?) an extra field before the starting column for the first genotype, so the $startingcol in hapmap.pm should change from 10 to 11 (see end of message).

Can anyone confirm?
I'm getting a MSG:

-------------------- WARNING ---------------------
MSG: cannot add NA06993 as a genotype skipping
--------------------------------------------------

and I'm not sure whether it is related to this or not.

Bests,

Albert.

hapmap.pm
---------------------------
sub _pivot {
    my ($self) = @_;

    my (@cols,@rows,@idheader);
    while ($_ = $self->_readline){
        chomp($_);
        next if( /^\s*\#/ || /^\s+$/ || ! length($_) );
        if( /^rs\#\s+alleles\s+chrom\s+pos\s+strand/ ) {
            @idheader = split $self->flag('field_delimiter');
        } else {
            push @cols, [split $self->flag('field_delimiter')];
        }
    }

    #Post Dec 2004. Previously was 10
    my $startingcol = 11;

    $self->{'_header'} = [ map { $_->[0] } @cols];
    for my $n ($startingcol.. $#{ $cols[ 0 ]}) {
        my $column = [ $idheader[$n], map{ $_->[ $n ] } @cols ];
        push (@rows, $column);
    }
    $self->{'_pivot'} = [@rows];
    $self->{'_i'} = 0;
}
---------------------------

From jason.stajich at duke.edu Thu Mar 10 08:30:32 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 10 08:30:21 2005
Subject: [Bioperl-l] stockholm AlignIO write_aln method
In-Reply-To: <1110460137.8027.6.camel@magneto>
References: <1110460137.8027.6.camel@magneto>
Message-ID: <98e7cc2b3a0c45ccb1f8b674181850d3@duke.edu>

Albert,

write_aln should take a list of Bio::Align::AlignI objects (more concretely, Bio::SimpleAlign objects) and write them out using $self->_print. AlignIO::clustalw is a good example.

Stockholm format is documented here:
http://www.cgb.ki.se/cgb/groups/sonnhammer/Stockholm.html

We don't really support the notion of #=GF, #=GC, #=GS, #=GR lines in Bioperl at this point, although that would be nice so we can store and manipulate data like secondary structure strings within Bioperl. I know some people talked about this a long time ago, but I don't think anything was done...
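As a rough illustration of the output a Stockholm write_aln would need to produce, here is a minimal sketch in plain Perl. It deliberately does not use the AlignIO machinery: format_stockholm and its [name, sequence] input are hypothetical stand-ins for what a real writer would pull from each Bio::SimpleAlign object and hand to $self->_print. It emits only the header, the sequence lines and the terminator, with none of the #=GF, #=GC, #=GS or #=GR annotation lines.

```perl
use strict;
use warnings;

# Format one alignment as minimal Stockholm text.
# $rows is an array ref of [name, aligned_sequence] pairs; this input
# shape is an assumption made for the sketch, not a BioPerl interface.
sub format_stockholm {
    my ($rows) = @_;

    # Pad names to the longest one so the sequences line up in a column.
    my $width = 0;
    for my $row (@$rows) {
        my $len = length $row->[0];
        $width = $len if $len > $width;
    }

    my $out = "# STOCKHOLM 1.0\n";          # mandatory header line
    for my $row (@$rows) {
        my ($name, $seq) = @$row;
        $out .= sprintf("%-*s %s\n", $width, $name, $seq);
    }
    $out .= "//\n";                          # end-of-alignment marker
    return $out;
}

print format_stockholm([
    [ 'seq1/1-7', 'ACGT-ACG' ],
    [ 'seq2/1-6', 'ACG--ACG' ],
]);
```

In an actual AlignIO module the same string would presumably be built per alignment inside write_aln and passed to $self->_print rather than printed directly.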
-jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 10, 2005, at 8:08 AM, Albert Vilella wrote: > Hi, > > I'm willing to use the unimplemented write_aln method in stockholm > format. > > As it isn't implemented, I would like to ask where could I find the doc > files for the format and the minimal the set of rules write_aln method > should obey, > > Anyone? > > Bests, > > Albert. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050310/4925255d/PGP.bin From fernan at iib.unsam.edu.ar Thu Mar 10 08:48:14 2005 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Thu Mar 10 08:43:21 2005 Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file In-Reply-To: <4f12c65ac919697fd8a7e9220db182fd@tigem.it> References: <200502141205.52256.jswanson@iastate.edu> <200502281005.06990.jswanson@iastate.edu> <4f12c65ac919697fd8a7e9220db182fd@tigem.it> Message-ID: <20050310134814.GE27364@iib.unsam.edu.ar> +----[ Elia Stupka (01.Mar.2005 10:17): | | Hi Jordan, | | I have been doing some work on Contig::Assembly myself recently, and | have also been in touch with the author (Robson) about it. Perhaps the | best thing would be for the three of us to have a chat about this | object, try to revamp it a little with our improvements, and then | Robson or I can check it in? | | regards, | | Elia | +----] Hi! We have just got a need to produce .ace files and noticed that this functionality was lacking. I also saw the recent thread about this topic on the list. Question: has this moved forward since the last message was posted (March 1st)? 
If so, are the proposed changes in a form that can be applied and tested by others (a recursive diff, perhaps against a recent CVS checkout or against the 1.5-release) Thanks in advance, Fernan From skirov at utk.edu Thu Mar 10 09:14:40 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Mar 10 09:09:42 2005 Subject: [Bioperl-l] Entrez Gene ASN Message-ID: <42305650.30403@utk.edu> Hi guys! I have done some (mostly) serious thinking about ASN Entrez Gene parsing and I propose we do my favorite thing- postpone everything we cannot deal with right now. If you want it to sound better: take a gradual approach where we store the data we can deal with in the existing Bioperl objects and skipping the rest for now. In details: ASN gene record can be correctly represented as a tree. I have written a simple parser for my own purposes which is storing the following: node_id---| --parent --level --tag --values What I do then is get specific levels and tags and build different objects. So level 2 with parent EntrezGene (which is the root level and has no information) is gene description and has tags such as gene, name, etc; at level 3, 5 and 6 you can get the complete specie definition by looking for orgname and org as tags and records with parent mod (which is a value for orgname, descend down the branch). I am using this approach to store most of the data in a relational database without going through Bioperl. What I ultimately want to do is use standard Bioperl modules. However, I don't think we have an object that can efficiently represent the structure (correct me if I am wrong). I think it may be a good idea to have a container object, possibly Bio::Gene that may contain multiple Bio::Seq objects (with or without real sequence). I believe we can borrow some structure and code from EnsEMBL gene representation (way to contain multiple transcripts, etc., not the database interactions certainly). Please let me know what you think. 
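The node table described above could be sketched roughly as follows. This is only an illustration under assumptions: the field names follow the list in the message (node_id, parent, level, tag, values), but the sample records, the use of node_id as the parent reference, and the helper names find_nodes and children_of are all hypothetical and not part of any existing parser.

```perl
use strict;
use warnings;

# One hash per parsed node, using the fields described above.
# All values here are invented sample data.
my @nodes = (
    { node_id => 1, parent => undef, level => 1, tag => 'EntrezGene', values => [] },
    { node_id => 2, parent => 1,     level => 2, tag => 'gene',       values => ['example-gene'] },
    { node_id => 3, parent => 1,     level => 2, tag => 'source',     values => [] },
    { node_id => 4, parent => 3,     level => 3, tag => 'org',        values => ['example-organism'] },
);

# Return every node at a given level carrying a given tag.
sub find_nodes {
    my ($nodes, $level, $tag) = @_;
    return grep { $_->{level} == $level && $_->{tag} eq $tag } @$nodes;
}

# Return the direct children of a node, identified by its node_id.
sub children_of {
    my ($nodes, $parent_id) = @_;
    return grep { defined $_->{parent} && $_->{parent} == $parent_id } @$nodes;
}

for my $node (find_nodes(\@nodes, 2, 'gene')) {
    print "gene: @{ $node->{values} }\n";
}
for my $child (children_of(\@nodes, 3)) {
    print "child of node 3: $child->{tag} = @{ $child->{values} }\n";
}
```

Selecting by level and tag, then walking down via the parent field, is the kind of lookup that would feed whichever Bioperl objects end up holding the data.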
Stefan From skirov at utk.edu Thu Mar 10 09:20:49 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Mar 10 09:15:38 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists d o simple formatting and analysis In-Reply-To: <20050310094329.GB29547@ebi.ac.uk> References: <339D68B133EAD311971E009027DC4797022FD7CB@montecarlo.cgr.harvard.edu> <20050310094329.GB29547@ebi.ac.uk> Message-ID: <423057C1.1090700@utk.edu> Allow me disagree. My understanding is this project is more about making biologist "computational hungry" rather than creating effective applications from a computation point of view. So I think it is more of an outreach project (did I get this right Amir). Bioperl. The next logical thing for any biologist who is starting to use the computer as something more than a typewriter is to use something like Bioperl, because it is quite easy to understand and use (in many cases anyway). Stefan Andreas Kahari wrote: >I'm not quite sure what this has to do with bioperl... > >On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote: > > >>In a private mail, Richard Copley wrote: >> >> > >Forwarding private emails to mailing lists are we? > > > >>>Amir Karger wrote: >>> >>> >>>>I was thinking it would be useful to have a toolkit of outrageously >>>> >>>> >>simple >> >> >>>>Perl one-liners. Here's one: >>>> >>>> > >http://www.oreilly.com/catalog/cookbook/ > > > >>>> # Merge two lists, removing duplicates (logical OR) >>>> perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile >>>> >>>> >>>sort -u file1 file2 >>> >>> >>I know that many of the tasks proposed for the Scriptome can be done with >>grep, sed, cut, Word, or Excel. I'm planning on implementing head, sort, >>join, and lots of others. But how many experimental biologists are familiar >>with Unix cut? How many bother to learn even the least fancy Excel >>functions? I think not many, because they have other things to worry about. 
>> >> > >Hmmm, comparing 'cut' and 'sed' with Word and Excel? Oh well. > >The philosophy of Unix utilities is to do only one thing, >but to do it very well. In the case with the 'sort' utility >for example, it will most likely use an out-of-core sorting >algorithm to cope with files larger than the available memory >of the machine, and will probably be a fair bit quicker and >flexible than your own implementation. > > > >>One reason so many people have created integrated toolboxes is so that >>biologists only need to learn how to use one tool, rather than learning 30 >>or whatever Unix commands. The goal of Scriptome is that they only need to >>learn one tool AND that the learning curve for that tool is very small. And >>we make the learning curve small by using an extremely lightweight interface >>(most of solving a problem involves searching on a website) rather than by >>trying to create an intuitive GUI. After all, how many folks other than >>Apple have created GUIs that are intuitive for more than a small subset of >>people? >> >> > >The reason why so many people are creating integrating toolboxes >(really, are they?) is probably just because so many people >before them didn't do it right. Mind you, doing it "right" is >not possible. > >I do understand that there is a need for integrated utilities >with easy-to-press buttons, and I won't try to put you off >working on those kind of projects, but... > >What would an experimental biologists, who is not familiar with >'sort', 'cut' or 'join', do with a Perl script that implemented >those functionalities? Wouldn't it be better to provide a >high-level interface to common tasks, like parsing the output >from various programs and providing simple ways of accessing >and manipulating sequence features etc. 
>If you find ways to
>expand the application area of BioPerl, or if you rationalize
>and improve existing BioPerl code, then I'm sure the BioPerl
>maintainers would be happy to consider commiting your code to
>the project.
>
>
>
>Regards,
>Andreas
>
>
>

--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From amackey at pcbi.upenn.edu Thu Mar 10 09:34:47 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Mar 10 09:29:59 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
Message-ID:

Many databases store mantissa and exponent separately, e.g. 4.5e-100 gets stored as 4.5 (low-precision float) and -100 (signed "medium" integer).

That way you can continue to use native database filters:

SELECT *
FROM hit
WHERE hit.exponent <= -6
   OR (hit.exponent = -5 AND hit.mantissa = 1)

This will identify all hits with E values less than or equal to 1e-5 (if you don't care about those equal to exactly 1e-5, you can drop the OR clause).

This mechanism also allows you to format the mantissa for printing precision, without worrying about converting the entire thing to a less-precise double:

my $evalue = sprintf("%0.1fe%d", $mantissa, $exponent);

-Aaron

On Mar 10, 2005, at 4:42 AM, davila wrote:

> I wonder to know what other (mysql) column types (any further details
> would be appreciated) colleagues are using to store their Blast
> evalues ?
>

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania    email:  amackey@pcbi.upenn.edu
415 S. University Avenue      office: 215-898-1205
Philadelphia, PA 19104-6017   fax:    215-746-6697

From jason.stajich at duke.edu Thu Mar 10 09:48:02 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 10 09:42:40 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <244d2e0e0503091537d5f283d@mail.gmail.com>
References: <244d2e0e050309142370997ce4@mail.gmail.com> <896034a8342912841a4a0d0a0686353e@duke.edu> <244d2e0e0503091537d5f283d@mail.gmail.com>
Message-ID: <4ad236e4a716973b61ce63f1aa251a31@duke.edu>

The script needs to be adjusted for NSsites because there are trees associated with each model result, so you need one more loop over the get_NSSite_results. I added some code to the script to print out the positively selected sites as well.

#!/usr/bin/perl -w
use strict;
use Bio::Tools::Phylo::PAML;

my $outcodeml = shift(@ARGV);

my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
                                              -dir  => "./");
my $result = $paml_parser->next_result();
my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
my @otus = $result->get_seqs;

# process the NSsites results
for my $ns_result ( $result->get_NSSite_results ) {
    print "model ", $ns_result->model_num, " ",
          $ns_result->model_description, "\n";
    while ( my $tree = $ns_result->next_tree ) {
        for my $node ( $tree->get_nodes ) {
            my $id;
            if( $node->is_Leaf() ) {
                $id = $node->id;
            } else {
                $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
                               $node->get_all_Descendents) .")";
            }
            if( ! $node->ancestor || ! $node->has_tag('t') ) {
                # skip when no values have been associated with this node
                # (like the root node)
                next;
            }
            printf "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
                   $id,
                   map { ($node->get_tag_values($_))[0] }
                   qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
        }
    }
    print "positively selected sites:\n";
    # get the positively selected sites
    for my $site ( $ns_result->get_pos_selected_sites ) {
        print join(" ", @$site, "\n");
    }
    print "\n";
}

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 9, 2005, at 6:37 PM, Edward Chuong wrote:

> Hi Jason,
>
> Thanks for the help.
>
> The code seems to get stuck at
>
> if( ! $node->ancestor || ! $node->has_tag('t') ) {
> (this condition turns out true for every node, not just root, so it
> always hits "next")
>
> I used Data::Dumper to check on the node and I've pasted the
> results--it seems like those tags aren't being sent in?
>
>
> Thanks!
> -Ed
>
> '_root_cleanup_methods' => [
>   sub { "DUMMY" }
> ],
> '_creation_id' => 0,
> '_branch_length' => '0.613722',
> '_desc' => {},
> '_id' => 'NP_033437.2_mus',
> '_ancestor' => bless( {
>   '_root_cleanup_methods' => [
>     $VAR1->{'_root_cleanup_methods'}[0]
>   ],
>   '_creation_id' => 3,
>   '_desc' => {
>     '2' => bless( {
>       '_root_cleanup_methods' => [
>         $VAR1->{'_root_cleanup_methods'}[0]
>       ],
>       '_creation_id' => 2,
>       '_branch_length' => '0.768322',
>       '_desc' => {},
>       '_id' => 'PM_BWp0001H02f',
>       '_ancestor' => $VAR1->{'_ancestor'},
>       '_root_verbose' => 0
>     }, 'Bio::Tree::Node' ),
>     '0' => $VAR1,
>     '1' => bless( {
>       '_root_cleanup_methods' => [
>         $VAR1->{'_root_cleanup_methods'}[0]
>       ],
>       '_creation_id' => 1,
>       '_branch_length' => '0.366319',
>       '_desc' => {},
>       '_id' => 'NP_742070.1_rat',
>       '_ancestor' => $VAR1->{'_ancestor'},
>       '_root_verbose' => 0
>     }, 'Bio::Tree::Node' )
>   },
>   '_id' => '',
>   '_height' => undef,
>   '_root_verbose' => 0
> }, 'Bio::Tree::Node' ),
> '_root_verbose' => 0
> }, 'Bio::Tree::Node' );
>
>
> On Wed, 9 Mar 2005
18:01:34 -0500, Jason Stajich > wrote: >> Resend with code pasted.... >> >> #!/usr/bin/perl -w >> use strict; >> use Bio::Tools::Phylo::PAML; >> >> my $outcodeml = shift(@ARGV); >> >> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml", >> -dir => "./"); >> my $result = $paml_parser->next_result(); >> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix >> my @otus = $result->get_seqs; >> if( $#{$MLmatrix} < 0 ) { >> for my $tree ($result->next_tree ) { >> for my $node ( $tree->get_nodes ) { >> my $id; >> if( $node->is_Leaf() ) { >> $id = $node->id; >> } else { >> $id = "(".join(",", map { $_->id } grep { $_->is_Leaf >> } >> $node->get_all_Descendents) .")"; >> } >> if( ! $node->ancestor || ! $node->has_tag('t') ) { >> # skip when no values have been associated with this >> node >> # (like the root node) >> next; >> } >> # I know this looks complicated >> # but we use the get_tag_values method to pull out the >> annotations >> # for each branch >> # The ()[0] around the call is because get_tag_values >> returns a >> list >> # if we want to just get the 1st item in the list we have >> # to tell Perl we are treating it like an array. >> # in the future get_tag_values needs to be smart and just >> # return the 1st item in the array if called in scalar >> # context >> >> printf >> "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/ >> dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n", >> $id, >> map { ($node->get_tag_values($_))[0] } >> qw(t S N dN/dS dN dS), 'S*dS', 'N*dN'; >> } >> } >> } else { >> my $i =0; >> my @seqs = $result->get_seqs; >> for my $row ( @$MLmatrix ) { >> print $seqs[$i++]->display_id, join("\t",@$row), "\n"; >> } >> } >> >> On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote: >> >>> I just updated things last week so this is brand-spanking-new. I >>> don't know if I connected everything up for NSsites stuff quite yet >>> as that is handled in - the branch-specific parsing should work now. 
>>> I don't know if the synopsis code is really up to snuff either. When >>> I get around to it I will try and see what still needs to be >>> connected >>> in NSsites parsing. >>> >>> I don't think $node->param() is going to work - >>> $node->get_tag_values() is the way I've implemented it. >>> >>> <00parse_codeml.pl> >>> >>> -jason >>> -- >>> Jason Stajich >>> jason.stajich at duke.edu >>> http://www.duke.edu/~jes12/ >>> >>> On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote: >>> >>>> Hi all, >>>> >>>> I'm trying to parse PAML results, and running into some trouble. I'm >>>> using branch specific omega model, and I want to get the branch >>>> specific ka/ks values out. >>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ >>>> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/ >>>> vnd.viewcvs-markup >>>> says that $node->param('omega') should work, but Data::Dumper shows >>>> that this value isn't stored in the node (only branch lengths and >>>> seq >>>> IDs appear to be stored). >>>> >>>> I'm assuming that I can get these values out of the >>>> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but >>>> I'm not sure how to call it. The current synopsis uses >>>> "get_model_params" but it seems to be out of date because it's not >>>> in >>>> the current souce. The docs at >>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ >>>> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content- >>>> type=text/vnd.viewcvs-markup >>>> say to use my >>>> @results = @{$self->get_NSSite_results}; >>>> --that looks like a mistake, and I've tried >>>> @result = $result->get_NSSite_results but that doesn't work either >>>> (just get undefined objs). >>>> >>>> Am I doing something wrong, or is this functionality still being >>>> worked on? I've tried using both 1.4 and the LIVE versions. Any help >>>> is appreciated, thanks! 
>>>> >>>> -Ed >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >> >> > > > -- > Edward Chuong > (949) 939-2732 > AIM: edawad85 > From akarger at CGR.Harvard.edu Thu Mar 10 10:08:27 2005 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Thu Mar 10 10:02:08 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to h elp biologists d o simple formatting and analysis Message-ID: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu> [snipped throughout for "brevity"] > From: Andreas Kahari [mailto:ak@ebi.ac.uk] > > I'm not quite sure what this has to do with bioperl... 1. From http://www.bioperl.org: "The Bioperl server provides an online resource for modules, scripts, and web links for developers of Perl-based software for life science research." I assumed bioperl-l was for disucssions of doing Bio with Perl. 2. I asked in my original mail: "Are there any other lists I should post these questions to?" but no one has suggested any lists or newsgroups yet. 3. My original mail also said, "take advantage of existing tools' APIs: perl -MBio::Perl -e '...'" > On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote: > > > >Amir Karger wrote: > > >> I was thinking it would be useful to have a > > >> toolkit of outrageously simple > > >> Perl one-liners. Here's one: > > http://www.oreilly.com/catalog/cookbook/ How many biologists who don't use Perl will read the Perl cookbook? Or were you just making a suggestion of where I could take scripts from? Actually, looking through the table of contents, I see only a few recipes that would fit. In any case, writing the scripts is not the hard part; it's knowing which scripts will be useful and helping biologists find the right ones to solve their particular problems. 
> > I know that many of the tasks proposed for the Scriptome > > can be done with > > grep, sed, cut, Word, or Excel. But how many experimental > > biologists are familiar > > with Unix cut? I think not many, because they have other > things to worry about. > > Hmmm, comparing 'cut' and 'sed' with Word and Excel? Oh well. I'm not comparing the quality of sed vs. Find/Replace. Most biologists (at least here) prefer Windows. They already use Excel to look at their data. Excel has functions to do simple data analysis, but my impression is that few biologists use those functions. > The philosophy of Unix utilities is to do only one thing, > but to do it very well. In the case with the 'sort' utility > for example, it will most likely use an out-of-core sorting > algorithm to cope with files larger than the available memory > of the machine, and will probably be a fair bit quicker and > flexible than your own implementation. The Scriptome is not aiming at sorting gigabyte files; does a biologist want to sort an entire Genbank file? I think much more often they'll want to sort < 10 MB lists of genes or whatever. On small files, the sorting algorithm doesn't matter. If they do try to sort too big a file, the script will break, and they'll need to try a different tool. I'm not claiming that my solution will solve every conceivable task, just the easy ones. > I do understand that there is a need for integrated utilities > with easy-to-press buttons, and I won't try to put you off > working on those kind of projects, but... > > What would an experimental biologists, who is not familiar with > 'sort', 'cut' or 'join', do with a Perl script that implemented > those functionalities? sort, cut, or join files! I don't think I understand your question. An experimental biologist who knows just a little Unix can take a sorting script, paste it to the command line, and use it. 
We're talking about use cases where the biologist knows exactly what they want to do - sort a file, merge files together, pull out the 8th column from the data into a new file, etc. - but not how to implement a solution. Who knows? Maybe eventually we'll decide to put "sort -u file1 file2" as a "script". But we wouldn't want to use *only* Unix commands because that ignores all the stuff Unix can't (easily) do. > Wouldn't it be better to provide a > high-level interface to common tasks, like parsing the output > from various programs and providing simple ways of accessing > and manipulating sequence features etc. That's exactly what I want to do. My interface is searching for a tool on a website and pasting it onto the Unix command line. > If you find ways to > expand the application area of BioPerl, or if you rationalize > and improve existing BioPerl code, then I'm sure the BioPerl > maintainers would be happy to consider commiting your code to > the project. I believe my project is complementary to Bioperl's bioscripts, but it aims at a different set of tasks, namely, tasks that are so simple that Bioperlers haven't bothered to commit the scripts to CVS. If I want to count how many microarray hits have names and how many just have CG numbers, I'll do it in a Perl one-liner that takes 3 minutes to write and maybe 10 for debugging and formatting. Why bother committing that to CVS? Well, an experimental biologist in my group gave me that exact example, and told me she spent 20 minutes counting and double-checking. If she had had 1000 hits instead of 100, she would have needed hours to count. More likely, she would have just given up. To put it another way, I'm aiming to make hard things possible - specifically things that are hard for biologists who aren't programmers. Bioperl, on the other hand, is focusing on things that are hard (or hard to do right, or at least annoying) even for programmers. 
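The counting task mentioned above could look something like the sketch below. It rests on assumptions that are not in the original example: that the hit identifiers come one per element, that a bare "CG" followed by digits means the hit has no name, and that anything else counts as named. The function name count_hits is hypothetical.

```perl
use strict;
use warnings;

# Count identifiers that are bare CG numbers versus everything else.
# Blank entries are skipped; surrounding whitespace is ignored.
sub count_hits {
    my (@ids) = @_;
    my %count = (named => 0, cg_only => 0);
    for my $id (@ids) {
        next unless defined $id && $id =~ /\S/;
        $id =~ s/^\s+|\s+$//g;               # trim whitespace
        if ($id =~ /^CG\d+$/) {
            $count{cg_only}++;               # looks like an unnamed CG number
        } else {
            $count{named}++;                 # treated as a named gene
        }
    }
    return %count;
}

my %c = count_hits('CG1234', 'dpp', ' CG99 ', '', 'wg');
print "named: $c{named}, CG only: $c{cg_only}\n";
```

The point of the sketch is the size of the task: a dozen lines, most of them bookkeeping, which is roughly the scale of script the Scriptome is meant to collect.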
I am making at least a couple assumptions about the niche I'm aiming for: people who know how to use the command line but don't know Perl. 1. There are many such people (or enough to care about) 2. They will be able to put the "atomic" scripts together to solve real problems (first join two files with a script, sort with another script, remove duplicates with a third) I may be wrong about either of these. It may be that even with the Scriptome tools, you have to "think like a programmer" to do these sorts of tasks, and that many biologists' brains just don't work that way. But I think it's worth trying. -Amir From akarger at CGR.Harvard.edu Thu Mar 10 10:14:01 2005 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Thu Mar 10 10:08:06 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to h elp biologists d o simple formatting and analysis Message-ID: <339D68B133EAD311971E009027DC4797022FD7DB@montecarlo.cgr.harvard.edu> Stefan Kirov wrote [responding to Andreas]: > > Allow me disagree. My understanding is this project is more > about making biologist "computational hungry" rather than > creating effective applications from a computation point of view. > So I think it is more of > an outreach project (did I get this right Amir). Well, not quite. I absolutely want these tools to be useful, but I don't expect them to solve all problems. On the other hand, there is definitely the potential that the scripts will: 1. Provide very useful examples that biologists can "tweak". This is much easier than writing programs from scratch, so it can provide a much less threatening intro to programming. 2. Demonstrate to biologists that Perl is extremely useful. > Bioperl. The next > logical thing for any biologist who is starting to use the > computer as > something more than a typewriter is to use something like Bioperl, > because it is quite easy to understand and use (in many cases anyway). Absolutely. 
Don't get me wrong: if we could convince everyone on Earth to learn Perl, I'd be thrilled. And for every experimental biologist to learn Bioperl. I'm a big fan. It's just that a lot of people don't have the time or will to learn programming. -Amir From mbasu at mail.nih.gov Thu Mar 10 10:41:16 2005 From: mbasu at mail.nih.gov (Malay) Date: Thu Mar 10 10:35:31 2005 Subject: [Bioperl-l] Request for advice and pointers on a project to h elp biologists d o simple formatting and analysis In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu> References: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu> Message-ID: <42306A9C.5030908@mail.nih.gov> Hello Amir: Without going into any arguments, I'll put my two cents into it. The mentality to help out biologists is a fundamental mistake. Most of the biologists who come into this field already knows the tricks of the game, if not they hire someone who knows. But toolmakers in the fields believe they have to help biologists, that's why there are too many non-specialized tools in the field. Toolmakers should now concentrate on tools for specialists. There are where the main dearth is and it requires a great effort to actually satisfy experts in the field. Create tools for the experts if you can. -Malay Amir Karger wrote: > [snipped throughout for "brevity"] > > >>From: Andreas Kahari [mailto:ak@ebi.ac.uk] >> >>I'm not quite sure what this has to do with bioperl... > > > 1. From http://www.bioperl.org: "The Bioperl server provides an online > resource for modules, scripts, and web links for developers of Perl-based > software for life science research." I assumed bioperl-l was for disucssions > of doing Bio with Perl. > > 2. I asked in my original mail: "Are there any other lists I should post > these questions to?" but no one has suggested any lists or newsgroups yet. > > 3. 
My original mail also said, "take advantage of existing tools' APIs: perl > -MBio::Perl -e '...'" > > >>On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote: >> >> >>>>Amir Karger wrote: >>>> >>>>>I was thinking it would be useful to have a >>>>>toolkit of outrageously simple >>>>>Perl one-liners. Here's one: >> >>http://www.oreilly.com/catalog/cookbook/ > > > How many biologists who don't use Perl will read the Perl cookbook? Or were > you just making a suggestion of where I could take scripts from? > > Actually, looking through the table of contents, I see only a few recipes > that would fit. In any case, writing the scripts is not the hard part; it's > knowing which scripts will be useful and helping biologists find the right > ones to solve their particular problems. > > >>>I know that many of the tasks proposed for the Scriptome >>>can be done with >>>grep, sed, cut, Word, or Excel. But how many experimental >>>biologists are familiar >>>with Unix cut? I think not many, because they have other >> >>things to worry about. >> >>Hmmm, comparing 'cut' and 'sed' with Word and Excel? Oh well. > > > I'm not comparing the quality of sed vs. Find/Replace. Most biologists (at > least here) prefer Windows. They already use Excel to look at their data. > Excel has functions to do simple data analysis, but my impression is that > few biologists use those functions. > > >>The philosophy of Unix utilities is to do only one thing, >>but to do it very well. In the case with the 'sort' utility >>for example, it will most likely use an out-of-core sorting >>algorithm to cope with files larger than the available memory >>of the machine, and will probably be a fair bit quicker and >>flexible than your own implementation. > > > The Scriptome is not aiming at sorting gigabyte files; does a biologist want > to sort an entire Genbank file? I think much more often they'll want to sort > < 10 MB lists of genes or whatever. On small files, the sorting algorithm > doesn't matter. 
If they do try to sort too big a file, the script will > break, and they'll need to try a different tool. I'm not claiming that my > solution will solve every conceivable task, just the easy ones. > > >>I do understand that there is a need for integrated utilities >>with easy-to-press buttons, and I won't try to put you off >>working on those kind of projects, but... >> >>What would an experimental biologists, who is not familiar with >>'sort', 'cut' or 'join', do with a Perl script that implemented >>those functionalities? > > > sort, cut, or join files! I don't think I understand your question. > An experimental biologist who knows just a little Unix can take a sorting > script, paste it to the command line, and use it. We're talking about use > cases where the biologist knows exactly what they want to do - sort a file, > merge files together, pull out the 8th column from the data into a new file, > etc. - but not how to implement a solution. > > Who knows? Maybe eventually we'll decide to put "sort -u file1 file2" as a > "script". But we wouldn't want to use *only* Unix commands because that > ignores all the stuff Unix can't (easily) do. > > >> Wouldn't it be better to provide a >>high-level interface to common tasks, like parsing the output >>from various programs and providing simple ways of accessing >>and manipulating sequence features etc. > > > That's exactly what I want to do. My interface is searching for a tool on a > website and pasting it onto the Unix command line. > > >> If you find ways to >>expand the application area of BioPerl, or if you rationalize >>and improve existing BioPerl code, then I'm sure the BioPerl >>maintainers would be happy to consider commiting your code to >>the project. > > > I believe my project is complementary to Bioperl's bioscripts, but it aims > at a different set of tasks, namely, tasks that are so simple that > Bioperlers haven't bothered to commit the scripts to CVS. 
If I want to count > how many microarray hits have names and how many just have CG numbers, I'll > do it in a Perl one-liner that takes 3 minutes to write and maybe 10 for > debugging and formatting. Why bother committing that to CVS? Well, an > experimental biologist in my group gave me that exact example, and told me > she spent 20 minutes counting and double-checking. If she had had 1000 hits > instead of 100, she would have needed hours to count. More likely, she > would have just given up. > > To put it another way, I'm aiming to make hard things possible - > specifically things that are hard for biologists who aren't programmers. > Bioperl, on the other hand, is focusing on things that are hard (or hard to > do right, or at least annoying) even for programmers. > > I am making at least a couple assumptions about the niche I'm aiming for: > people who know how to use the command line but don't know Perl. > 1. There are many such people (or enough to care about) > 2. They will be able to put the "atomic" scripts together to solve real > problems (first join two files with a script, sort with another script, > remove duplicates with a third) > > I may be wrong about either of these. It may be that even with the > Scriptome tools, you have to "think like a programmer" to do these sorts of > tasks, and that many biologists' brains just don't work that way. But I > think it's worth trying. 
>
> -Amir
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From akarger at CGR.Harvard.edu Thu Mar 10 10:58:43 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu Mar 10 10:51:58 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists do simple formatting and analysis
Message-ID: <339D68B133EAD311971E009027DC4797022FD7E1@montecarlo.cgr.harvard.edu>

Malay wrote:
> Most of the
> biologists who come into this field already knows the tricks of the
> game,

Really? I haven't been in the field for very long, but I've already met (smart, talented) people who are counting lines in Excel by hand, or merging files and eliminating duplicates by hand. The worst part is, I suspect there are things they give up doing just because they think it will be too much work. (By the way, I'm talking about *experimental* biologists who need to do work on computers, not about computational biologists.)

> if not they hire someone who knows.

Yes, that's me. But I would much prefer to collaborate with someone on an interesting research project that pushes the boundaries of using computational biology to help experimental biology rather than write a Perl one-liner to merge two gene lists for the 800th time.

We have a lot of clients, and if we just give them the solution, then we'll need to do it again the next time they have a marginally different problem. (In some ways, you can think of the Scriptome as just a FAQ.)

-Amir

From hancy at gene.ucl.ac.be Thu Mar 10 11:58:19 2005
From: hancy at gene.ucl.ac.be (hancy)
Date: Thu Mar 10 11:53:21 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
Message-ID: <42307CAB.6080909@gene.ucl.ac.be>

I'm using DOUBLE UNSIGNED to store blast evalues in MySQL, which allows you to store 0 and positive values between ~2.2e-308 and 1.8e308.
Anything with an evalue between 0 and e-308 will be set to 0 automatically (it shouldn't be a great loss of information anyway).

Hope this can help,
fred.

From chad at dieselwurks.com Thu Mar 10 12:21:01 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Thu Mar 10 12:34:17 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu>
References: <3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu>
Message-ID:

On Wed, 9 Mar 2005, Aaron J. Mackey wrote:

> > chr1 aafcest HSP 200 275 . - . Target "Sequence:chad1" 200 275
> > chr1 aafcest HSP 300 450 . - . Target "Sequence:chad1" 300 450
> > chr1 aafcest match 200 450 . - . Target "Sequence:chad1" 200 450
>
> These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2" or
> some such. This also means that if you're saving the ESTs in the
> database (for sequence alignment display), you'll have to save them
> redundantly under chad1-1, chad1-2, etc.

This is horrible. I want to fix this.

> Now, you could write a custom aggregator that de-aggregated multiple
> chad1 "match" features, assigning the contained HSPs to each, but there
> is no such "default" behavior. Let me know if there's general interest
> for this ...

I think there is, and I volunteer to write it. I'm new to the Bio::DB subsystem but I'm eager to dive in. Can you help me by providing a general flowchart on what you'd do to create this?

What should the Aggregator be called? Hmm. Bio::DB::GFF::Aggregator::manymatch ?
Chad Matsalla

From palmeida at igc.gulbenkian.pt Thu Mar 10 12:45:47 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Thu Mar 10 12:40:21 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists do simple formatting and analysis
In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7E1@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC4797022FD7E1@montecarlo.cgr.harvard.edu>
Message-ID: <28072.192.168.50.3.1110476747.squirrel@webmail.igc.gulbenkian.pt>

> I've already met (smart, talented) people who are counting lines in Excel

You mean there are other ways of doing that?! Good God! All these years that I've been counting 1000s of lines by hand...

>> if not they hire someone who knows.
>
> We have a lot of clients, and if we just give them the solution, then
> we'll need to do it again the next time they have a marginally different
> problem.

Profit!

Seriously though, I work in a lab where everyone else is an experimental biologist and I think there may be demand for a project like yours. The only doubt I have is whether those people who count by hand, or give up, will make the effort to find the Scriptome project. I suppose that is a marketing issue, and the debate on demand is mostly based on different personal experiences. If you think it's worth it (if only to help the people you work with), and go ahead with it, I can help you with some simple scripts I have (to split files, get a certain column in a csv file, etc).
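
For illustration only, a column-extraction script of the kind mentioned above might look roughly like the sketch below. It is not one of the scripts offered in this thread. The function name get_column, the file name get_column.pl and the 1-based column numbering are all assumptions made for the example, and it assumes that fields never contain quoted or embedded commas (a real CSV parser would be needed for those).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the values found in one column of
# comma-separated text.
#   $lines  - array reference of input lines
#   $column - 1-based column number (a convention chosen for this sketch)
# Assumption: no quoted fields and no commas inside fields.
sub get_column {
    my ($lines, $column) = @_;
    my @values;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        next if $copy =~ /^\s*$/;            # skip blank lines
        my @fields = split /,/, $copy, -1;   # -1 keeps trailing empty fields
        # Rows that are too short contribute an empty string
        push @values,
            defined $fields[$column - 1] ? $fields[$column - 1] : '';
    }
    return @values;
}

# Command-line use (only when arguments are given):
#   perl get_column.pl 8 data.csv
if (@ARGV) {
    my $column = shift @ARGV;
    my @lines  = <>;                         # remaining arguments are files
    print "$_\n" for get_column(\@lines, $column);
}
```

Redirecting the output to a new file (perl get_column.pl 8 data.csv > column8.txt) would cover the "pull out the 8th column from the data into a new file" case discussed earlier in the thread.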
Paulo

From letondal at pasteur.fr Thu Mar 10 12:54:13 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Thu Mar 10 12:44:39 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help biologists do simple formatting and analysis
In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu>
Message-ID:

Hi,

On Mar 10, 2005, at 4:08 PM, Amir Karger wrote:
> 2. I asked in my original mail: "Are there any other lists I should
> post
> these questions to?" but no one has suggested any lists or newsgroups
> yet

You can discuss your excellent idea here: edu-sig@python.org (it's not really a python discussion list, they discuss general end-user issues)

--
Catherine Letondal -- Institut Pasteur

From cjfields at uiuc.edu Thu Mar 10 14:07:21 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu Mar 10 14:02:10 2005
Subject: [Bioperl-l] Re: Request for advice and pointers on a project to help biologists do simple formatting and analysis
In-Reply-To: <42306A9C.5030908@mail.nih.gov>
References: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu> <42306A9C.5030908@mail.nih.gov>
Message-ID: <6.1.1.1.2.20050310100611.01b44938@express.cites.uiuc.edu>

I completely disagree. I am a biologist first and a programmer second (shock!!!), though I believe that many full-time bioinformaticians would agree with my view. I could spend an enormous amount of time trying to accomplish a routine repetitive task in Perl, Java, or whatever your language of choice is. However, the Bioperl (and OpenBio) community has made my job much easier. ANY contribution, whether it is a module, package, or a script, is helpful, as long as someone can use it. Furthermore, to make the snap judgement that every biologist entering the field already comes equipped with the tools is a bit short-sighted and naive.
I do agree that there are many "non-specific" tools (i.e. multiple methods for phylogenetic analysis, multiple alignment, etc), but I think that any person worth their salt would find that to be a benefit and not a problem. I personally like having multiple methods available.

I could also make the argument that the "experts" in the field, if they live up to that title, can actually design the tools for their specific (specialist) needs. Who better knows their specific needs anyway? In other words, why hire a carpenter to do the plumbing? It doesn't make much sense to have somebody with little to no knowledge of RNA structure, for example, design an algorithm ad hoc for another RNA structure expert.

Anyway, I think we're getting a bit off topic here...

My two cents,

Chris

At 09:41 AM 3/10/2005, Malay wrote:
>Hello Amir:
>
>Without going into any arguments, I'll put my two cents into it. The
>mentality to help out biologists is a fundamental mistake. Most of the
>biologists who come into this field already knows the tricks of the game,
>if not they hire someone who knows. But toolmakers in the fields believe
>they have to help biologists, that's why there are too many
>non-specialized tools in the field.
>
>Toolmakers should now concentrate on tools for specialists. That is
>where the main dearth is and it requires a great effort to actually
>satisfy experts in the field. Create tools for the experts if you can.
>
>-Malay

__________________________________
Chris Fields - Postdoctoral Researcher
Lab of Dr. Robert Switzer
Address: University of Illinois at Urbana-Champaign
Dept. of Biochemistry - 323 RAL
600 S. Mathews Ave.
Urbana, IL 61801
Phone : (217) 333-7098
Fax : (217) 244-5858

From jswanson at iastate.edu Thu Mar 10 14:03:39 2005
From: jswanson at iastate.edu (Jordan Swanson)
Date: Thu Mar 10 14:04:12 2005
Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file
In-Reply-To: <20050310134814.GE27364@iib.unsam.edu.ar>
References: <200502141205.52256.jswanson@iastate.edu> <4f12c65ac919697fd8a7e9220db182fd@tigem.it> <20050310134814.GE27364@iib.unsam.edu.ar>
Message-ID: <200503101303.40339.jswanson@iastate.edu>

On Thursday 10 March 2005 07:48 am, Fernan Aguero wrote:
> +----[ Elia Stupka (01.Mar.2005 10:17):
> | Hi Jordan,
> |
> | I have been doing some work on Contig::Assembly myself recently, and
> | have also been in touch with the author (Robson) about it. Perhaps the
> | best thing would be for the three of us to have a chat about this
> | object, try to revamp it a little with our improvements, and then
> | Robson or I can check it in?
> |
> | regards,
> |
> | Elia
>
> +----]
>
> Hi!
>
> We have just got a need to produce .ace files and noticed
> that this functionality was lacking.
>
> I also saw the recent thread about this topic on the list.
> Question: has this moved forward since the last message was
> posted (March 1st)?
>
> If so, are the proposed changes in a form that can be
> applied and tested by others (a recursive diff, perhaps
> against a recent CVS checkout or against the 1.5-release)

I sent a copy of the code I have written to Elia and Robson. Robson mentioned that he was extremely busy this month, and that he would be willing to discuss it at a later time.

Later today, I can send you a zipped up file (my diff-skills are nonexistent, so I can't do it that way without poring through the manual) of the code we have been using, which is working well for the features that our lab uses. Of course, I would appreciate any feedback that you could offer, as well.
---
Jordan M Swanson
Department of Ecology, Evolution, and Organismal Biology
431 Bessey Hall
Iowa State University
Ames, IA 50011
Lab 515 294-7098
FAX: 515-294-1337

From echuong at gmail.com Thu Mar 10 15:47:17 2005
From: echuong at gmail.com (Edward Chuong)
Date: Thu Mar 10 15:42:05 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <4ad236e4a716973b61ce63f1aa251a31@duke.edu>
References: <244d2e0e050309142370997ce4@mail.gmail.com> <896034a8342912841a4a0d0a0686353e@duke.edu> <244d2e0e0503091537d5f283d@mail.gmail.com> <4ad236e4a716973b61ce63f1aa251a31@duke.edu>
Message-ID: <244d2e0e050310124735d62b56@mail.gmail.com>

Hi,

I think the problem is that the $result object, according to Dumper, doesn't store any ModelResult (NSSite_results), so the for loop condition in this code ($result->get_NSSite_results) is never true. Is this working on a mlc file that you have, and if so, can you send it so I can see if it's a problem on my side?

Thanks
-Ed

On Thu, 10 Mar 2005 09:48:02 -0500, Jason Stajich wrote:
> The script needs to be adjusted for NSsites because there are trees
> associated with each model result so you need one more loop on the
> get_NSSite_results. I added some code to the script to print out the
> positively selected sites as well.
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::Tools::Phylo::PAML;
>
> my $outcodeml = shift(@ARGV);
>
> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
>                                               -dir => "./");
> my $result = $paml_parser->next_result();
> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
> my @otus = $result->get_seqs;
> # process the NSsites results
> for my $ns_result ( $result->get_NSSite_results ) {
>     print "model ", $ns_result->model_num, " ",
>         $ns_result->model_description, "\n";
>     while ( my $tree = $ns_result->next_tree ) {
>         for my $node ( $tree->get_nodes ) {
>             my $id;
>             if( $node->is_Leaf() ) {
>                 $id = $node->id;
>             } else {
>                 $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
>                     $node->get_all_Descendents) .")";
>             }
>             if( ! $node->ancestor || ! $node->has_tag('t') ) {
>                 # skip when no values have been associated with this node
>                 # (like the root node)
>                 next;
>             }
>             printf
>                 "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
>                 $id,
>                 map { ($node->get_tag_values($_))[0] }
>                 qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
>         }
>     }
>     print "positively selected sites:\n";
>     # get the positively select sites
>     for my $site ( $ns_result->get_pos_selected_sites ) {
>         print join(" ", @$site, "\n");
>     }
>     print "\n";
> }
>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
> On Mar 9, 2005, at 6:37 PM, Edward Chuong wrote:
>
> > Hi Jason,
> >
> > Thanks for the help.
> >
> > The code seems to get stuck at
> >
> > if( ! $node->ancestor || ! $node->has_tag('t') ) {
> > (this condition turns out true for every node, not just root, so it
> > always hits "next")
> >
> > I used Data::Dumper to check on the node and I've pasted the
> > results--it seems like those tags aren't being sent in?
> >
> >
> > Thanks!
> > -Ed > > > > '_root_cleanup_methods' => [ > > sub { "DUMMY" } > > ], > > '_creation_id' => 0, > > '_branch_length' => '0.613722', > > '_desc' => {}, > > '_id' => 'NP_033437.2_mus', > > '_ancestor' => bless( { > > '_root_cleanup_methods' => [ > > $VAR1->{'_root_cleanup_methods'}[0] > > ], > > '_creation_id' => 3, > > '_desc' => { > > '2' => bless( { > > '_root_cleanup_methods' => [ > > $VAR1->{'_root_cleanup_methods'}[0] > > ], > > '_creation_id' => 2, > > '_branch_length' => '0.768322', > > '_desc' => {}, > > '_id' => 'PM_BWp0001H02f', > > '_ancestor' => $VAR1->{'_ancestor'}, > > '_root_verbose' => 0 > > }, 'Bio::Tree::Node' ), > > '0' => $VAR1, > > '1' => bless( { > > '_root_cleanup_methods' => [ > > $VAR1->{'_root_cleanup_methods'}[0] > > ], > > '_creation_id' => 1, > > '_branch_length' => '0.366319', > > '_desc' => {}, > > '_id' => 'NP_742070.1_rat', > > '_ancestor' => $VAR1->{'_ancestor'}, > > '_root_verbose' => 0 > > }, 'Bio::Tree::Node' ) > > }, > > '_id' => '', > > '_height' => undef, > > '_root_verbose' => 0 > > }, 'Bio::Tree::Node' ), > > '_root_verbose' => 0 > > }, 'Bio::Tree::Node' ); > > > > > > On Wed, 9 Mar 2005 18:01:34 -0500, Jason Stajich > > wrote: > >> Resend with code pasted.... > >> > >> #!/usr/bin/perl -w > >> use strict; > >> use Bio::Tools::Phylo::PAML; > >> > >> my $outcodeml = shift(@ARGV); > >> > >> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml", > >> -dir => "./"); > >> my $result = $paml_parser->next_result(); > >> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix > >> my @otus = $result->get_seqs; > >> if( $#{$MLmatrix} < 0 ) { > >> for my $tree ($result->next_tree ) { > >> for my $node ( $tree->get_nodes ) { > >> my $id; > >> if( $node->is_Leaf() ) { > >> $id = $node->id; > >> } else { > >> $id = "(".join(",", map { $_->id } grep { $_->is_Leaf > >> } > >> $node->get_all_Descendents) .")"; > >> } > >> if( ! $node->ancestor || ! 
$node->has_tag('t') ) { > >> # skip when no values have been associated with this > >> node > >> # (like the root node) > >> next; > >> } > >> # I know this looks complicated > >> # but we use the get_tag_values method to pull out the > >> annotations > >> # for each branch > >> # The ()[0] around the call is because get_tag_values > >> returns a > >> list > >> # if we want to just get the 1st item in the list we have > >> # to tell Perl we are treating it like an array. > >> # in the future get_tag_values needs to be smart and just > >> # return the 1st item in the array if called in scalar > >> # context > >> > >> printf > >> "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/ > >> dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n", > >> $id, > >> map { ($node->get_tag_values($_))[0] } > >> qw(t S N dN/dS dN dS), 'S*dS', 'N*dN'; > >> } > >> } > >> } else { > >> my $i =0; > >> my @seqs = $result->get_seqs; > >> for my $row ( @$MLmatrix ) { > >> print $seqs[$i++]->display_id, join("\t",@$row), "\n"; > >> } > >> } > >> > >> On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote: > >> > >>> I just updated things last week so this is brand-spanking-new. I > >>> don't know if I connected everything up for NSsites stuff quite yet > >>> as that is handled in - the branch-specific parsing should work now. > >>> I don't know if the synopsis code is really up to snuff either. When > >>> I get around to it I will try and see what still needs to be > >>> connected > >>> in NSsites parsing. > >>> > >>> I don't think $node->param() is going to work - > >>> $node->get_tag_values() is the way I've implemented it. > >>> > >>> <00parse_codeml.pl> > >>> > >>> -jason > >>> -- > >>> Jason Stajich > >>> jason.stajich at duke.edu > >>> http://www.duke.edu/~jes12/ > >>> > >>> On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote: > >>> > >>>> Hi all, > >>>> > >>>> I'm trying to parse PAML results, and running into some trouble. 
I'm > >>>> using branch specific omega model, and I want to get the branch > >>>> specific ka/ks values out. > >>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ > >>>> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/ > >>>> vnd.viewcvs-markup > >>>> says that $node->param('omega') should work, but Data::Dumper shows > >>>> that this value isn't stored in the node (only branch lengths and > >>>> seq > >>>> IDs appear to be stored). > >>>> > >>>> I'm assuming that I can get these values out of the > >>>> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but > >>>> I'm not sure how to call it. The current synopsis uses > >>>> "get_model_params" but it seems to be out of date because it's not > >>>> in > >>>> the current souce. The docs at > >>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ > >>>> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content- > >>>> type=text/vnd.viewcvs-markup > >>>> say to use my > >>>> @results = @{$self->get_NSSite_results}; > >>>> --that looks like a mistake, and I've tried > >>>> @result = $result->get_NSSite_results but that doesn't work either > >>>> (just get undefined objs). > >>>> > >>>> Am I doing something wrong, or is this functionality still being > >>>> worked on? I've tried using both 1.4 and the LIVE versions. Any help > >>>> is appreciated, thanks! 
> >>>>
> >>>> -Ed
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l@portal.open-bio.org
> >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>
> >>
> >
> >
> > --
> > Edward Chuong
> > (949) 939-2732
> > AIM: edawad85
> >
> >

--
Edward Chuong
(949) 939-2732
AIM: edawad85

From davila at ioc.fiocruz.br Thu Mar 10 16:36:15 2005
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Thu Mar 10 16:21:53 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <42307CAB.6080909@gene.ucl.ac.be>
References: <42307CAB.6080909@gene.ucl.ac.be>
Message-ID: <1110490575.7016.25.camel@kineto>

Thanks to all of you guys. Double unsigned worked very well for me ;-)

Cheers, Alberto

On Thu, 2005-03-10 at 17:58 +0100, hancy wrote:
> I'm using DOUBLE UNSIGNED to store blast evalues in MySQL, which allows
> you to store 0 and positive values between ~2.2e-308 to 1.8e308.
> Anything with an evalue between 0 and e-308 will be set to 0
> automatically (it shouldn't be a great loss of information anyway).
>
> Hope this can help,
> fred.

From Peter.Robinson at t-online.de Thu Mar 10 16:28:23 2005
From: Peter.Robinson at t-online.de (Peter.Robinson@t-online.de)
Date: Thu Mar 10 16:21:58 2005
Subject: [Bioperl-l] Entrez Gene ASN
In-Reply-To: <42305650.30403@utk.edu>
References: <42305650.30403@utk.edu>
Message-ID: <20050310212823.GB5392@anna>

On Thu, Mar 10, 2005 at 09:14:40AM -0500, Stefan Kirov wrote:
> Hi guys!
> I have done some (mostly) serious thinking about ASN Entrez Gene parsing
> and I propose we do my favorite thing- postpone everything we cannot
> deal with right now. If you want it to sound better: take a gradual
> approach where we store the data we can deal with in the existing
> Bioperl objects and skipping the rest for now.
> In details:
> ASN gene record can be correctly represented as a tree.
> I have written a
> simple parser for my own purposes which is storing the following:
> node_id---|
>           --parent
>           --level
>           --tag
>           --values
> What I do then is get specific levels and tags and build different
> objects. So level 2 with parent EntrezGene (which is the root level and
> has no information) is gene description and has tags such as gene, name,
> etc; at level 3, 5 and 6 you can get the complete species definition by
> looking for orgname and org as tags and records with parent mod (which
> is a value for orgname, descend down the branch).
> I am using this approach to store most of the data in a relational
> database without going through Bioperl. What I ultimately want to do is
> use standard Bioperl modules. However, I don't think we have an object
> that can efficiently represent the structure (correct me if I am wrong).
> I think it may be a good idea to have a container object, possibly
> Bio::Gene that may contain multiple Bio::Seq objects (with or without
> real sequence). I believe we can borrow some structure and code from
> EnsEMBL gene representation (way to contain multiple transcripts, etc.,
> not the database interactions certainly).
> Please let me know what you think.
> Stefan

Hi Stefan,

from the work I have done on this issue it would seem that your suggestion is quite promising. Let me know if you need some help on this. How is the performance that you are seeing to date?

best, Peter

> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From Ned.Young at tufts.edu Thu Mar 10 16:22:54 2005
From: Ned.Young at tufts.edu (Ned Young)
Date: Thu Mar 10 16:29:30 2005
Subject: [Bioperl-l] naive question about Bio::Tools::Primer3
Message-ID: <90B8234D-91AA-11D9-B249-000D93ADEE80@tufts.edu>

Dear All,

I was wondering if you could help me. I'm not very experienced using bioperl objects and therefore have a question.
How do I best get the individual result lines from the primer3 output file, using Bio::Tools::Primer3? I'll include the script I tried and the file it parsed. When I run it, I get "HASH(0xccfc)".

#!/usr/bin/perl -w
use lib "/Users/Ned/Documents/Perl/bioperl_source/bioperl-1.4";
use Bio::AlignIO;
use Bio::Tools::Primer3;

# read a primer3 output file
my $primer3=Bio::Tools::Primer3->new(-file=>"p3test1.out");

#put the left- and right-primer stuff into hashes.
my $primer=$primer3->next_primer;
print "The right primer in the stream is ",
    $primer->get_primer('-right_primer')->seq->seq, "\n";

# to return results
print $primer3->primer_results(0,'PRIMER_LEFT_INPUT');

primer3_core output file:

PRIMER_SEQUENCE_ID=test01
SEQUENCE=ACTTGATATAGCGTAAATCGATTTGCAGAGATCAACTTGCTATAACGTAACTCGATTGCAATGATGCTTAGCCATGCGTAGTCTGATCCTGATGCCGTGATGGCACTCATGGCGTACTCTATGAGAGTC
PRIMER_LEFT_INPUT=ACTTGATATAGCGTAAATCG
PRIMER_RIGHT_INPUT=GACTCTCATAGAGTACGCCA
TARGET=21,1
PRIMER_PAIR_MAX_MISPRIMING=12
PRIMER_PAIR_MAX_TEMPLATE_MISPRIMING=24
PRIMER_PRODUCT_SIZE_RANGE=70-129
PRIMER_OPT_SIZE=20
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=36
PRIMER_PICK_ANYWAY=1
PRIMER_FILE_FLAG=1
PRIMER_EXPLAIN_FLAG=1
PRIMER_ERROR=1
PRIMER_WARNING=Left primer is unacceptable: Tm too low/High end self complementarity; Right primer is unacceptable: Tm too low/High end self complementarity
PRIMER_PAIR_EXPLAIN=considered 1, ok 1
PRIMER_PAIR_PENALTY=17.0819
PRIMER_LEFT_PENALTY=10.012436
PRIMER_RIGHT_PENALTY=7.069468
PRIMER_LEFT_SEQUENCE=ACTTGATATAGCGTAAATCG
PRIMER_RIGHT_SEQUENCE=GACTCTCATAGAGTACGCCA
PRIMER_LEFT=0,20
PRIMER_RIGHT=128,20
PRIMER_LEFT_TM=49.988
PRIMER_RIGHT_TM=52.931
PRIMER_LEFT_GC_PERCENT=35.000
PRIMER_RIGHT_GC_PERCENT=50.000
PRIMER_LEFT_SELF_ANY=6.00
PRIMER_RIGHT_SELF_ANY=8.00
PRIMER_LEFT_SELF_END=4.00
PRIMER_RIGHT_SELF_END=4.00
PRIMER_LEFT_END_STABILITY=8.6000
PRIMER_RIGHT_END_STABILITY=11.7000
PRIMER_LEFT_TEMPLATE_MISPRIMING=14.0000
PRIMER_RIGHT_TEMPLATE_MISPRIMING=8.0000
PRIMER_PAIR_COMPL_ANY=5.00
PRIMER_PAIR_COMPL_END=3.00
PRIMER_PRODUCT_SIZE=129
PRIMER_PAIR_TEMPLATE_MISPRIMING=22.00
=

Yours truly,

Ned Young
Department of Biomedical Sciences
Division of Infectious Diseases
Tufts University School of Veterinary Medicine
200 Westboro Rd.
N. Grafton, MA 01536
508-887-4540

From pmiguel at purdue.edu Thu Mar 10 17:35:01 2005
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu Mar 10 17:30:07 2005
Subject: [Bioperl-l] Living on the edg with 1.5?
Message-ID: <4230CB95.2000701@purdue.edu>

Because GBROWSE wants bioperl v 1.5 I've moved to that on most of the platforms I use. But I've noticed two (presumably) unrelated glitches just in using it today. Should I really be using v 1.4? I'm not a developer, and don't really welcome additional headaches. Advice?

Here are the bugs:

perl -e 'use Bio::Perl; $seq_object = get_sequence("genbank","u11059"); write_sequence(">test","genbank",$seq_object);'

fetches U11059 from genbank and tries to print in genbank format but something is wrong in v 1.5.

Running Version 1.5:
...
FEATURES             Location/Qualifiers
     source          1..7313
                     /mol_type="Bio::Annotation::SimpleValue=HASH(0x9b3870)"
                     /tissue_type="Bio::Annotation::SimpleValue=HASH(0x9b38b8)"
                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x9b3828)"
                     /transposon="Bio::Annotation::SimpleValue=HASH(0x9b0968)"
                     /strain="Bio::Annotation::SimpleValue=HASH(0x9b3900)"
                     /chromosome="Bio::Annotation::SimpleValue=HASH(0x9b3948)"
                     /organism="Bio::Annotation::SimpleValue=HASH(0x9b3990)"
     LTR             1..649
                     /label=Bio::Annotation::SimpleValue=HASH(0x9aefd4)
     TATA_signal     304..310
                     /label=Bio::Annotation::SimpleValue=HASH(0x9b55fc)
     misc_feature    651..659
                     /label=Bio::Annotation::SimpleValue=HASH(0x9b5764)
...

Running Version 1.4 (no problem):
...
FEATURES             Location/Qualifiers
     source          1..7313
                     /transposon="retrotransposon"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:4577"
                     /tissue_type="leaf"
                     /strain="A188"
                     /chromosome="7"
                     /organism="Zea mays"
     LTR             1..649
                     /label=upstreamLTR
     TATA_signal     304..310
                     /label=upstream
     misc_feature    651..659
                     /label=PBSsite
...

The other 1.5 bug I found today: The following one-liner demonstrates it:

perl -e 'use Bio::SeqIO;use Bio::Seq::PrimaryQual; $qual_object = Bio::Seq::PrimaryQual->new(-qual=> "10 20 30 40 50 40 30 20 10", -id => "test", -format => 'qual'); $qual_out = Bio::SeqIO->new(-file => ">test", -format => 'qual');$qual_out->write_seq($qual_object);'

When I run it under Version 1.5 the correct output file is produced but I also get the following output sent to STDOUT:

'_root_verbose' => 0
'display_id' => 'test'
'qual' => ARRAY(0x5b07e8)
   0  10
   1  20
   2  30
   3  40
   4  50
   5  40
   6  30
   7  20
   8  10

Under Version 1.4 everything is fine. (No extraneous STDOUT is created.) This looks like someone uncommented a Data::Dumper print somewhere, but I wasn't able to find it.

--
Phillip SanMiguel
Purdue Genomics Core Facility

From sanges at biogem.it Thu Mar 10 17:49:01 2005
From: sanges at biogem.it (Remo Sanges)
Date: Thu Mar 10 17:43:42 2005
Subject: [Bioperl-l] Living on the edg with 1.5?
In-Reply-To: <4230CB95.2000701@purdue.edu>
References: <4230CB95.2000701@purdue.edu>
Message-ID:

On Mar 10, 2005, at 11:35 PM, Phillip San Miguel wrote:

>
> Because GBROWSE wants bioperl v 1.5 I'm moved to that on most of
> the platforms I use. But I've noticed two (presumably) unrelated
> glitches just in using it today. Should I really be using v 1.4? I'm
> not a developer, and don't really welcome additional headaches.
> Advice?

I'm in the same situation and basically I'm using bioperl v 1.5 with GBROWSE v 1.62 and bioperl v 1.4 for my production programs...

Any other advice or suggestions?
Thanks Remo From echuong at gmail.com Thu Mar 10 18:23:53 2005 From: echuong at gmail.com (Edward Chuong) Date: Thu Mar 10 18:19:58 2005 Subject: [Bioperl-l] PAML nssites model result object In-Reply-To: References: <244d2e0e050309142370997ce4@mail.gmail.com> <896034a8342912841a4a0d0a0686353e@duke.edu> <244d2e0e0503091537d5f283d@mail.gmail.com> <4ad236e4a716973b61ce63f1aa251a31@duke.edu> <244d2e0e050310124735d62b56@mail.gmail.com> Message-ID: <244d2e0e05031015235d8a64e@mail.gmail.com> Hey, Some progress when I use your file: it now does return a PAML::Result object, but there's an error Can't use an undefined value as an ARRAY reference at /Library/Perl/5.8.1/Bio/Tools/Phylo/PAML/ModelResult.pm line 308, line 329. which appears because it's trying to access positive selection array, which doesn't exist for the nssites = 0 object-- there is no "dnds_site_classes" which the other nssites models ( = 1 or =2 etc) have, even though I think there should be? I can't find where these values are stored--they don't appear to be stored on the individual nodes as the documentation would suggest. I've found that when I specify only one model for nssites, (nssites = 0 or 1 or 2), the get_NSSites code doesn't work, but if I specify more than one for the PAML run it does. I've sent the mlc files I have in a private e-mail, if you have time to check Thanks so much! -Ed On Thu, 10 Mar 2005 16:38:59 -0500, Jason Stajich wrote: > I'm using t/data/codeml_nssites.mlc > > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > On Mar 10, 2005, at 3:47 PM, Edward Chuong wrote: > > > Hi, > > > > I think the problem is that the $result object, according to Dumper, > > doesn't store any ModelResult (NSSite_results), so the for loop > > condition in this code ($result->get_NSSite_results) is never true. Is > > this working on a mlc file that you have, and if so, can you send it > > so I can see if it's a problem on my side? 
> > > > Thanks > > -Ed > > > > > > On Thu, 10 Mar 2005 09:48:02 -0500, Jason Stajich > > wrote: > >> The script needs to be adjusted for NSsites because their are trees > >> are > >> associated with each model result so you need one more loop on the > >> get_NSSite_results. I added some code to the script to print out the > >> positively selected sites as well. > >> > >> #!/usr/bin/perl -w > >> use strict; > >> use Bio::Tools::Phylo::PAML; > >> > >> my $outcodeml = shift(@ARGV); > >> > >> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml", > >> -dir => "./"); > >> my $result = $paml_parser->next_result(); > >> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix > >> my @otus = $result->get_seqs; > >> # process the NSsites results > >> for my $ns_result ( $result->get_NSSite_results ) { > >> print "model ", $ns_result->model_num, " ", > >> $ns_result->model_description, "\n"; > >> while ( my $tree = $ns_result->next_tree ) { > >> for my $node ( $tree->get_nodes ) { > >> my $id; > >> if( $node->is_Leaf() ) { > >> $id = $node->id; > >> } else { > >> $id = "(".join(",", map { $_->id } grep { $_->is_Leaf > >> } > >> $node->get_all_Descendents) .")"; > >> } > >> if( ! $node->ancestor || ! 
$node->has_tag('t') ) { > >> # skip when no values have been associated with this > >> node > >> # (like the root node) > >> next; > >> } > >> printf > >> "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/ > >> dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n", > >> $id, > >> map { ($node->get_tag_values($_))[0] } > >> qw(t S N dN/dS dN dS), 'S*dS', 'N*dN'; > >> } > >> } > >> print "positively selected sites:\n"; > >> # get the positively select sites > >> for my $site ( $ns_result->get_pos_selected_sites ) { > >> print join(" ", @$site, "\n"); > >> } > >> print "\n"; > >> } > >> > >> -- > >> Jason Stajich > >> jason.stajich at duke.edu > >> http://www.duke.edu/~jes12/ > >> > >> On Mar 9, 2005, at 6:37 PM, Edward Chuong wrote: > >> > >>> Hi Jason, > >>> > >>> Thanks for the help. > >>> > >>> The code seems to get stuck at > >>> > >>> if( ! $node->ancestor || ! $node->has_tag('t') ) { > >>> (this condition turns out true for every node, not just root, so it > >>> always hits "next") > >>> > >>> I used Data::Dumper to check on the node and I've pasted the > >>> results--it seems like those tags aren't being sent in? > >>> > >>> > >>> Thanks! 
> >>> -Ed > >>> > >>> '_root_cleanup_methods' => [ > >>> sub { "DUMMY" } > >>> ], > >>> '_creation_id' => 0, > >>> '_branch_length' => '0.613722', > >>> '_desc' => {}, > >>> '_id' => 'NP_033437.2_mus', > >>> '_ancestor' => bless( { > >>> '_root_cleanup_methods' => [ > >>> > >>> $VAR1->{'_root_cleanup_methods'}[0] > >>> ], > >>> '_creation_id' => 3, > >>> '_desc' => { > >>> '2' => bless( { > >>> '_root_cleanup_methods' => [ > >>> > >>> $VAR1->{'_root_cleanup_methods'}[0] > >>> ], > >>> '_creation_id' => 2, > >>> '_branch_length' => > >>> '0.768322', > >>> '_desc' => {}, > >>> '_id' => 'PM_BWp0001H02f', > >>> '_ancestor' => > >>> $VAR1->{'_ancestor'}, > >>> '_root_verbose' => 0 > >>> }, 'Bio::Tree::Node' ), > >>> '0' => $VAR1, > >>> '1' => bless( { > >>> '_root_cleanup_methods' => [ > >>> > >>> $VAR1->{'_root_cleanup_methods'}[0] > >>> > >>> ], > >>> '_creation_id' => 1, > >>> '_branch_length' => > >>> '0.366319', > >>> '_desc' => {}, > >>> '_id' => 'NP_742070.1_rat', > >>> '_ancestor' => > >>> $VAR1->{'_ancestor'}, > >>> '_root_verbose' => 0 > >>> }, 'Bio::Tree::Node' ) > >>> }, > >>> '_id' => '', > >>> '_height' => undef, > >>> '_root_verbose' => 0 > >>> }, 'Bio::Tree::Node' ), > >>> '_root_verbose' => 0 > >>> }, 'Bio::Tree::Node' ); > >>> > >>> > >>> On Wed, 9 Mar 2005 18:01:34 -0500, Jason Stajich > >>> wrote: > >>>> Resend with code pasted.... 
> >>>> > >>>> #!/usr/bin/perl -w > >>>> use strict; > >>>> use Bio::Tools::Phylo::PAML; > >>>> > >>>> my $outcodeml = shift(@ARGV); > >>>> > >>>> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => > >>>> "./$outcodeml", > >>>> -dir => "./"); > >>>> my $result = $paml_parser->next_result(); > >>>> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix > >>>> my @otus = $result->get_seqs; > >>>> if( $#{$MLmatrix} < 0 ) { > >>>> for my $tree ($result->next_tree ) { > >>>> for my $node ( $tree->get_nodes ) { > >>>> my $id; > >>>> if( $node->is_Leaf() ) { > >>>> $id = $node->id; > >>>> } else { > >>>> $id = "(".join(",", map { $_->id } grep { > >>>> $_->is_Leaf > >>>> } > >>>> $node->get_all_Descendents) .")"; > >>>> } > >>>> if( ! $node->ancestor || ! $node->has_tag('t') ) { > >>>> # skip when no values have been associated with this > >>>> node > >>>> # (like the root node) > >>>> next; > >>>> } > >>>> # I know this looks complicated > >>>> # but we use the get_tag_values method to pull out the > >>>> annotations > >>>> # for each branch > >>>> # The ()[0] around the call is because get_tag_values > >>>> returns a > >>>> list > >>>> # if we want to just get the 1st item in the list we > >>>> have > >>>> # to tell Perl we are treating it like an array. 
> >>>> # in the future get_tag_values needs to be smart and > >>>> just > >>>> # return the 1st item in the array if called in scalar > >>>> # context > >>>> > >>>> printf > >>>> "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/ > >>>> dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n", > >>>> $id, > >>>> map { ($node->get_tag_values($_))[0] } > >>>> qw(t S N dN/dS dN dS), 'S*dS', 'N*dN'; > >>>> } > >>>> } > >>>> } else { > >>>> my $i =0; > >>>> my @seqs = $result->get_seqs; > >>>> for my $row ( @$MLmatrix ) { > >>>> print $seqs[$i++]->display_id, join("\t",@$row), "\n"; > >>>> } > >>>> } > >>>> > >>>> On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote: > >>>> > >>>>> I just updated things last week so this is brand-spanking-new. I > >>>>> don't know if I connected everything up for NSsites stuff quite yet > >>>>> as that is handled in - the branch-specific parsing should work > >>>>> now. > >>>>> I don't know if the synopsis code is really up to snuff either. > >>>>> When > >>>>> I get around to it I will try and see what still needs to be > >>>>> connected > >>>>> in NSsites parsing. > >>>>> > >>>>> I don't think $node->param() is going to work - > >>>>> $node->get_tag_values() is the way I've implemented it. > >>>>> > >>>>> <00parse_codeml.pl> > >>>>> > >>>>> -jason > >>>>> -- > >>>>> Jason Stajich > >>>>> jason.stajich at duke.edu > >>>>> http://www.duke.edu/~jes12/ > >>>>> > >>>>> On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote: > >>>>> > >>>>>> Hi all, > >>>>>> > >>>>>> I'm trying to parse PAML results, and running into some trouble. > >>>>>> I'm > >>>>>> using branch specific omega model, and I want to get the branch > >>>>>> specific ka/ks values out. 
> >>>>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > >>>>>> Bio/ > >>>>>> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/ > >>>>>> vnd.viewcvs-markup > >>>>>> says that $node->param('omega') should work, but Data::Dumper > >>>>>> shows > >>>>>> that this value isn't stored in the node (only branch lengths and > >>>>>> seq > >>>>>> IDs appear to be stored). > >>>>>> > >>>>>> I'm assuming that I can get these values out of the > >>>>>> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, > >>>>>> but > >>>>>> I'm not sure how to call it. The current synopsis uses > >>>>>> "get_model_params" but it seems to be out of date because it's not > >>>>>> in > >>>>>> the current souce. The docs at > >>>>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/ > >>>>>> Bio/ > >>>>>> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content- > >>>>>> type=text/vnd.viewcvs-markup > >>>>>> say to use my > >>>>>> @results = @{$self->get_NSSite_results}; > >>>>>> --that looks like a mistake, and I've tried > >>>>>> @result = $result->get_NSSite_results but that doesn't work either > >>>>>> (just get undefined objs). > >>>>>> > >>>>>> Am I doing something wrong, or is this functionality still being > >>>>>> worked on? I've tried using both 1.4 and the LIVE versions. Any > >>>>>> help > >>>>>> is appreciated, thanks! 
> >>>>>> > >>>>>> -Ed > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l@portal.open-bio.org > >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>> > >>>> > >>> > >>> > >>> -- > >>> Edward Chuong > >>> (949) 939-2732 > >>> AIM: edawad85 > >>> > >> > >> > > > > > > -- > > Edward Chuong > > (949) 939-2732 > > AIM: edawad85 > > > > > -- Edward Chuong (949) 939-2732 AIM: edawad85 From allenday at ucla.edu Thu Mar 10 20:36:33 2005 From: allenday at ucla.edu (Allen Day) Date: Thu Mar 10 20:31:27 2005 Subject: [Bioperl-l] Living on the edg with 1.5? In-Reply-To: <4230CB95.2000701@purdue.edu> References: <4230CB95.2000701@purdue.edu> Message-ID: This is fixed on cvs HEAD. I haven't ported the changes to the 1.5.1 bugfix branch yet. I can't comment on the other bug you report. -Allen > ... > FEATURES Location/Qualifiers > source 1..7313 > /mol_type="Bio::Annotation::SimpleValue=HASH(0x9b3870)" > > /tissue_type="Bio::Annotation::SimpleValue=HASH(0x9b38b8)" > /db_xref="Bio::Annotation::SimpleValue=HASH(0x9b3828)" > > /transposon="Bio::Annotation::SimpleValue=HASH(0x9b0968)" > /strain="Bio::Annotation::SimpleValue=HASH(0x9b3900)" > > /chromosome="Bio::Annotation::SimpleValue=HASH(0x9b3948)" > /organism="Bio::Annotation::SimpleValue=HASH(0x9b3990)" > LTR 1..649 > /label=Bio::Annotation::SimpleValue=HASH(0x9aefd4) > TATA_signal 304..310 > /label=Bio::Annotation::SimpleValue=HASH(0x9b55fc) > misc_feature 651..659 > /label=Bio::Annotation::SimpleValue=HASH(0x9b5764) > ... 
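
A note on the output quoted above: a string of the form Bio::Annotation::SimpleValue=HASH(0x...) is what Perl prints for any blessed hash reference that is interpolated into a string when its class has no string overloading. The sketch below reproduces the effect with two hypothetical stand-in classes. My::PlainValue and My::OverloadedValue are invented for illustration; they are not the BioPerl classes and are not the change made on cvs HEAD. Only core Perl and the overload pragma are assumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in: an object with no overloading at all.
# NOT the real Bio::Annotation::SimpleValue.
package My::PlainValue;

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

sub value { return $_[0]->{value} }

# Hypothetical stand-in: same data, but with string conversion overloaded.
package My::OverloadedValue;

# '""' controls what the object looks like when used as a string.
# fallback => 1 lets Perl fall back to that string form for operators
# that are not overloaded explicitly (eq, ne, concatenation, ...).
use overload
    '""'     => sub { return $_[0]->{value} },
    fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

package main;

my $plain = My::PlainValue->new(-value => 'leaf');
my $over  = My::OverloadedValue->new(-value => 'leaf');

# Interpolating the plain object prints Class=HASH(0x...), as in the report.
print qq{/tissue_type="$plain"\n};

# Calling the accessor explicitly gives the stored value.
print qq{/tissue_type="}, $plain->value, qq{"\n};

# With '""' overloaded, interpolation gives the stored value as well.
print qq{/tissue_type="$over"\n};

# String comparison works through the fallback to the string form.
print "ne works\n" if $over ne 'root';
```

Whether fallback => 1 is appropriate for a given class is a design decision; it is shown here only as one common way to let eq and ne reuse the string conversion.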

From allenday at ucla.edu Thu Mar 10 20:57:36 2005
From: allenday at ucla.edu (Allen Day)
Date: Thu Mar 10 20:52:29 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <422ECDDD.40404@biologie.uni-freiburg.de>
References: <422ECDDD.40404@biologie.uni-freiburg.de>
Message-ID:

I'm unable to test the code in PersistentObject.pm as I don't have biosql
set up, but you might try adding this to Reference.pm:

use overload 'ne' => sub { "$_[0]" ne "$_[1]" }

Please let me know if this fixes your error and I'll add this 'ne'
overload to all the Bio::Annotation::* classes on HEAD.

-Allen


On Wed, 9 Mar 2005, Daniel Lang wrote:

> Hi,
> I'm retrieving seq objects from a local biosql db (using the latest cvs
> version of bioperl-db) and e.g. writing them with SeqIO. After changing
> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the
> following error:
>
> Operation `ne': no method found, left argument in overloaded package
> Bio::Annotation::Reference, right argument has no overloaded magic at
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
> line 534, line 1.
>
> The module PersistentObject.pm hasn't changed and in Reference.pm there
> is only this change:
>
> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
> bioperl-live/Bio/Annotation/Reference.pm
> 1c1
> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
> ---
> > # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
> 56c56,57
> < # use overload '""' => \&as_text;
> ---
> > use overload '""' => sub { $_[0]->title || ''};
> > use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
>
> I've reversed this, but no positive result - the error remains...
> Any hints?
> > Thanks in advance, > Daniel > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From elia at tigem.it Fri Mar 11 03:02:03 2005 From: elia at tigem.it (Elia Stupka) Date: Fri Mar 11 03:07:03 2005 Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file In-Reply-To: <200503101303.40339.jswanson@iastate.edu> References: <200502141205.52256.jswanson@iastate.edu> <4f12c65ac919697fd8a7e9220db182fd@tigem.it> <20050310134814.GE27364@iib.unsam.edu.ar> <200503101303.40339.jswanson@iastate.edu> Message-ID: <17ac8b6cebdb61a12a42d6cb429a3e9b@tigem.it> I was going to have a look at both the ACE issue as well as the more general issues which I have found not ideal such as: -resetting LocatableSeq coordinates (why not keep useful coords?) -Having both features and LocatableSeqs, when LocatableSeqs give you both the gapped sequence as well as the coordinates Will write in about 24 hours (I hope) with a decent outcome (I hope) Cheers, Elia On Mar 10, 2005, at 8:03 PM, Jordan Swanson wrote: > On Thursday 10 March 2005 07:48 am, Fernan Aguero wrote: >> +----[ Elia Stupka (01.Mar.2005 10:17): >> | Hi Jordan, >> | >> | I have been doing some work on Contig::Assembly myself recently, and >> | have also been in touch with the author (Robson) about it. Perhaps >> the >> | best thing would be for the three of us to have a chat about this >> | object, try to revamp it a little with our improvements, and then >> | Robson or I can check it in? >> | >> | regards, >> | >> | Elia >> >> +----] >> >> Hi! >> >> We have just got a need to produce .ace files and noticed >> that this functionality was lacking. >> >> I also saw the recent thread about this topic on the list. >> Question: has this moved forward since the last message was >> posted (March 1st)? 
>> >> If so, are the proposed changes in a form that can be >> applied and tested by others (a recursive diff, perhaps >> against a recent CVS checkout or against the 1.5-release) > > I sent a copy of the code I have written to Elia and Robson. Robson > mentioned > that he was extremely busy this month, and that he would be willing to > discuss it at a later time. Later today, I can send you a zipped up > file (my > diff-skills are non-existant, so I can't do it that way without poring > through the manual) of the code we have been using, which is working > well for > the features that our lab uses. Of course, I would appreciate any > feedback > that you could offer, as well. > > --- > Jordan M Swanson > Department of Ecology, Evolution, and Organismal Biology > 431 Bessey Hall > Iowa State University > Ames, IA 50011 > Lab 515 294-7098 > FAX: 515-294-1337 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > --- Telethon Institute of Genetics and Medicine Via Pietro Castellino, 111 80131 Napoli Tel. +39 081 6132 335 Fax. +39 081 560 98 77 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 2404 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050311/b4113b5d/attachment.bin From Richard.Adams at ed.ac.uk Fri Mar 11 04:19:07 2005 From: Richard.Adams at ed.ac.uk (Richard Adams) Date: Fri Mar 11 04:14:56 2005 Subject: [Bioperl-l] 1.6 release Message-ID: <4231628B.4010007@ed.ac.uk> Hello, Is there any schedule for the 1.6 release? just to know by when I have to get by modules working..... 
Richard

--
Dr Richard Adams
Psychiatric Genetics Group,
Medical Genetics,
Molecular Medicine Centre,
Western General Hospital,
Crewe Rd West,
Edinburgh UK
EH4 2XU

Tel: 44 131 651 1084
richard.adams@ed.ac.uk

From sutripa at vbi.vt.edu Thu Mar 10 11:29:53 2005
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Fri Mar 11 04:38:45 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
Message-ID: <5.1.0.14.0.20050310112935.02066100@mail.vbi.vt.edu>

Try storing as double.

Sucheta

At 06:42 AM 3/10/2005 -0300, davila wrote:
>Hi Stefan,
>
>Thanks for the tips!
>
>I guess the problem with using VARCHAR could be the limitations on comparing
>the real evalues, so if I want to do something with, or only show, evalues
>greater or smaller than 1e-50, would it work ok?
>
>I would like to know which other (mysql) column types (any further details
>would be appreciated) colleagues are using to store their Blast evalues.
>
>Thanks.
>
>
>-----Original Message-----
>From: Stefan Kirov [mailto:skirov@utk.edu]
>Sent: Thu 10/3/2005 00:25
>To: davila
>Cc: bioperl-l@portal.open-bio.org
>Subject: Re: [Bioperl-l] Mysql columns and Blast evalues
>
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l

From daniel.lang at biologie.uni-freiburg.de Fri Mar 11 05:06:19 2005
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Fri Mar 11 05:01:05 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To:
References: <422ECDDD.40404@biologie.uni-freiburg.de>
Message-ID: <42316D9B.801@biologie.uni-freiburg.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Allen,

When I add the line to all Bio::Annotation::*, we run into various other
errors, e.g.
: Can't call method "primary_key" on an undefined value at
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm
line 1325.

What are these overloaded methods for?
Who/What is calling ->ne()?

I've tried using the latest cvs version with the Annotation classes from
December: this error is gone, but then all SeqFeature tag_values are
stringified memory addresses:

/clone_lib="Bio::Annotation::SimpleValue=HASH(0x76883bb8)"
/tissue_type="Bio::Annotation::SimpleValue=HASH(0x76883ccc)"
/clone="Bio::Annotation::SimpleValue=HASH(0x76883d14)"
/organism="Bio::Annotation::SimpleValue=HASH(0x76883d5c)"
/lab_host="Bio::Annotation::SimpleValue=HASH(0x76883dec)"
/db_xref="Bio::Annotation::SimpleValue=HASH(0x76883e34)"
/mol_type="Bio::Annotation::SimpleValue=HASH(0x76885360)"
/note="Bio::Annotation::SimpleValue=HASH(0x76883da4)"
...

To make it even more complicated, I've dumped both seq objects (the one
with all classes from Dec '04 and the bioperl-live one with only the
Annotation classes from Dec '04) and there is no diff!?
The Seq, SeqI, RichSeq and SeqFeature::Generic objects didn't change since
then...

- -Daniel

Allen Day wrote:
| I'm unable to test the code in PersistentObject.pm as I don't have biosql
| set up, but you might try adding this to Reference.pm
|
| use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
|
| Please let me know if this fixes your error and I'll add this 'ne'
| overload to all the Bio::Annotation::* classes on HEAD.
|
| -Allen
|
|
| On Wed, 9 Mar 2005, Daniel Lang wrote:
|
|
|>Hi,
|>I'm retrieving seq objects from a local biosql db (using the latest cvs
|>version of bioperl-db) and e.g. writing them with SeqIO. After changing
|>from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the
|>following error:
|>
|>Operation `ne': no method found, left argument in overloaded package
|>Bio::Annotation::Reference, right argument has no overloaded magic at
|>/usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
|>line 534, line 1.
|>
|>The module PersistentObject.pm hasn't changed and in Reference.pm there
|>is only this change:
|>
|>diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
|>bioperl-live/Bio/Annotation/Reference.pm
|>1c1
|>< # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
|>---
|> > # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
|>56c56,57
|>< # use overload '""' => \&as_text;
|>---
|> > use overload '""' => sub { $_[0]->title || ''};
|> > use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
|>
|>I've reversed this, but no positive result - the error remains...
|>Any hints?
|>
|>Thanks in advance,
|>Daniel
|>
|>
|>
|>_______________________________________________
|>Bioperl-l mailing list
|>Bioperl-l@portal.open-bio.org
|>http://portal.open-bio.org/mailman/listinfo/bioperl-l
|>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCMW2bmJnbCpJAG3ARAvbAAJ966Qc8RBFbhlFL0VpVo073N1sEWgCdF7jM
56Ozp3Rl2HdHxxXipeJnx8w=
=OHJA
-----END PGP SIGNATURE-----

From rich at thevillas.eclipse.co.uk Fri Mar 11 09:09:06 2005
From: rich at thevillas.eclipse.co.uk (rich)
Date: Fri Mar 11 09:05:22 2005
Subject: [Bioperl-l] hapmap.pm startingcol now 11?
In-Reply-To: <1110461476.8193.6.camel@magneto>
References: <1110461476.8193.6.camel@magneto>
Message-ID: <4231A682.6080100@thevillas.eclipse.co.uk>

Hi,

yes, you're right. Jason, I seem to remember you were going to give me
cvs access to make fixes. Could you give me access so that I can make the
change?

cheers

Rich

Albert Vilella wrote:

>Hi all,
>
>AFAICS, Hapmap dump files have (since Dec 2004?) an extra field previous
>to the starting column for the first genotype, so the $startingcol in
>hapmap.pm should change from 10 to 11 (see end of message).
>
>Can anyone confirm?
> >I'm getting a MSG: > >-------------------- WARNING --------------------- >MSG: cannot add NA06993 as a genotype skipping >-------------------------------------------------- > >And I'm not sure is related to this or not, > >Bests, > > Albert. > >hapmap.pm >--------------------------- >sub _pivot { > my ($self) = @_; > > my (@cols,@rows,@idheader); > while ($_ = $self->_readline){ > chomp($_); > next if( /^\s*\#/ || /^\s+$/ || ! length($_) ); > if( /^rs\#\s+alleles\s+chrom\s+pos\s+strand/ ) { > @idheader = split $self->flag('field_delimiter'); > } else { > push @cols, [split $self->flag('field_delimiter')]; > } > } > #Post Dec 2004. Previously was 10 > my $startingcol = 11; > > $self->{'_header'} = [ map { $_->[0] } @cols]; > for my $n ($startingcol.. $#{ $cols[ 0 ]}) { > my $column = [ $idheader[$n], > map{ $_->[ $n ] } @cols ]; > push (@rows, $column); > } > $self->{'_pivot'} = [@rows]; > $self->{'_i'} = 0; >} >--------------------------- >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From allenday at ucla.edu Fri Mar 11 12:59:58 2005 From: allenday at ucla.edu (Allen Day) Date: Fri Mar 11 12:54:47 2005 Subject: [Bioperl-l] strange error after changing to RC1.5 In-Reply-To: <42316D9B.801@biologie.uni-freiburg.de> References: <422ECDDD.40404@biologie.uni-freiburg.de> <42316D9B.801@biologie.uni-freiburg.de> Message-ID: > What are these overloaded methods for? > Who/What is calling ->ne()? The SeqFeatureI class has been made, under the hood, to use the Bio::AnnotationCollection for storing annotations. Bio::AnnotationColleciton holds Bio::AnnotationI objects, not the simple strings that were held in older SeqFeatureI implementing classes. There is still a lot of code in bioperl (and bioperl-db I take it) that wants to treat the annotations as strings, so we add overloading to allow this to happen. 
No one is calling the eq() method directly; it gets triggered when someone
does something like this:

if ( $obj1->dbxref eq $obj2->dbxref ) { }

It sounds like this might not be what is causing the problem for you, but
I thought you should be aware as you debug.

-Allen

From hlapp at gmx.net Fri Mar 11 13:11:24 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 11 13:07:37 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To:
Message-ID:

I suggest that all the fancy overloading is removed from core bioperl
modules. If we need overloading for stringification or comparison
operators in one of our core modules, I think we are making a mistake.

This is part of the huge mess introduced when the SeqFeatureI
architecture was carelessly changed days before release. It's a
prototypical example of what not to do in a project that's as widely
used as bioperl. *Every single bit* of those changes needs to be rolled
back from the release, and if nobody else has done it by then I will do
so in two weeks.

-hilmar

On Thursday, March 10, 2005, at 05:57 PM, Allen Day wrote:

> I'm unable to test the code in PersistentObject.pm as I don't have biosql
> set up, but you might try adding this to Reference.pm
>
> use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
>
> Please let me know if this fixes your error and I'll add this 'ne'
> overload to all the Bio::Annotation::* classes on HEAD.
>
> -Allen
>
>
> On Wed, 9 Mar 2005, Daniel Lang wrote:
>
>> Hi,
>> I'm retrieving seq objects from a local biosql db (using the latest cvs
>> version of bioperl-db) and e.g. writing them with SeqIO. After changing
>> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the
>> following error:
>>
>> Operation `ne': no method found, left argument in overloaded package
>> Bio::Annotation::Reference, right argument has no overloaded magic at
>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
>> line 534, line 1.
>>
>> The module PersistentObject.pm hasn't changed and in Reference.pm there
>> is only this change:
>>
>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
>> bioperl-live/Bio/Annotation/Reference.pm
>> 1c1
>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
>> ---
>> > # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
>> 56c56,57
>> < # use overload '""' => \&as_text;
>> ---
>> > use overload '""' => sub { $_[0]->title || ''};
>> > use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
>>
>> I've reversed this, but no positive result - the error remains...
>> Any hints?
>>
>> Thanks in advance,
>> Daniel
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Fri Mar 11 13:12:36 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 11 13:07:52 2005
Subject: [Bioperl-l] Living on the edg with 1.5?
In-Reply-To: <4230CB95.2000701@purdue.edu>
Message-ID: <25B516B6-9259-11D9-8881-000A959EB4C4@gmx.net>

Don't use 1.5.0 for now. Its SeqFeatureI is broken and is not, nor ever
was, ready for release. Sorry about this glitch.

-hilmar

On Thursday, March 10, 2005, at 02:35 PM, Phillip San Miguel wrote:

>
> Because GBROWSE wants bioperl v 1.5 I've moved to that on most of
> the platforms I use. But I've noticed two (presumably) unrelated
> glitches just in using it today. Should I really be using v 1.4? I'm
> not a developer, and don't really welcome additional headaches.
> Advice?
> > Here are the bugs: > > perl -e 'use Bio::Perl; $seq_object = > get_sequence("genbank","u11059"); > write_sequence(">test","genbank",$seq_object);' > > > fetches U11059 from genbank and tries to print in genbank format but > something is wrong in v 1.5. > > Running Version 1.5: > > ... > FEATURES Location/Qualifiers > source 1..7313 > > /mol_type="Bio::Annotation::SimpleValue=HASH(0x9b3870)" > > /tissue_type="Bio::Annotation::SimpleValue=HASH(0x9b38b8)" > > /db_xref="Bio::Annotation::SimpleValue=HASH(0x9b3828)" > > /transposon="Bio::Annotation::SimpleValue=HASH(0x9b0968)" > > /strain="Bio::Annotation::SimpleValue=HASH(0x9b3900)" > > /chromosome="Bio::Annotation::SimpleValue=HASH(0x9b3948)" > > /organism="Bio::Annotation::SimpleValue=HASH(0x9b3990)" > LTR 1..649 > /label=Bio::Annotation::SimpleValue=HASH(0x9aefd4) > TATA_signal 304..310 > /label=Bio::Annotation::SimpleValue=HASH(0x9b55fc) > misc_feature 651..659 > /label=Bio::Annotation::SimpleValue=HASH(0x9b5764) > ... > > Running Version 1.4 (no problem): > > ... > FEATURES Location/Qualifiers > source 1..7313 > /transposon="retrotransposon" > /mol_type="genomic DNA" > /db_xref="taxon:4577" > /tissue_type="leaf" > /strain="A188" > /chromosome="7" > /organism="Zea mays" > LTR 1..649 > /label=upstreamLTR > TATA_signal 304..310 > /label=upstream > misc_feature 651..659 > /label=PBSsite > ... 
> > The other 1.5 bug I found today: The following one-liner demonstrates > it: > > perl -e 'use Bio::SeqIO;use Bio::Seq::PrimaryQual; $qual_object = > Bio::Seq::PrimaryQual->new(-qual=> "10 20 30 40 50 40 30 20 10", -id > => "test", -format => 'qual'); $qual_out = Bio::SeqIO->new(-file => > ">test", -format => 'qual');$qual_out->write_seq($qual_object);' > > When I run it under Version 1.5 the correct output file is produced > but I also get the following output sent to STDOUT: > > '_root_verbose' => 0 > 'display_id' => 'test' > 'qual' => ARRAY(0x5b07e8) > 0 10 > 1 20 > 2 30 > 3 40 > 4 50 > 5 40 > 6 30 > 7 20 > 8 10 > > Under Version 1.4 everything is fine. (No extraneous STDOUT is > created.) > > This looks like someone uncommented a Data::Dumper print somewhere, > but I wasn't able to find it. > > -- > Phillip SanMiguel > Purdue Genomics Core Facility > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Fri Mar 11 13:17:23 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Mar 11 13:11:55 2005 Subject: [Bioperl-l] Entrez Gene ASN In-Reply-To: <42305650.30403@utk.edu> Message-ID: Gene shouldn't be fundamentally different from LocusLink, and LocusLink was represented as an annotated SeqI within bioperl. If at all possible I'd still like it to remain that way for Gene in order to allow for a smooth transition from LL to Gene for code that's been using the former. If you want to emphasize the fact that it's a container for sequences, then that sounds like a ClusterI to me, which can be richly annotated too. Note also that NCBI is working on an ASN.1->XML converter. 
Personally, I'm inclined to wait for that converter to appear, but other priorities may prevail. Let me know what you think. -hilmar On Thursday, March 10, 2005, at 06:14 AM, Stefan Kirov wrote: > Hi guys! > I have done some (mostly) serious thinking about ASN Entrez Gene > parsing and I propose we do my favorite thing- postpone everything we > cannot deal with right now. If you want it to sound better: take a > gradual approach where we store the data we can deal with in the > existing Bioperl objects and skipping the rest for now. > In details: > ASN gene record can be correctly represented as a tree. I have written > a simple parser for my own purposes which is storing the following: > node_id---| > --parent > --level > --tag > --values > What I do then is get specific levels and tags and build different > objects. So level 2 with parent EntrezGene (which is the root level > and has no information) is gene description and has tags such as gene, > name, etc; at level 3, 5 and 6 you can get the complete specie > definition by looking for orgname and org as tags and records with > parent mod (which is a value for orgname, descend down the branch). > I am using this approach to store most of the data in a relational > database without going through Bioperl. What I ultimately want to do > is use standard Bioperl modules. However, I don't think we have an > object that can efficiently represent the structure (correct me if I > am wrong). I think it may be a good idea to have a container object, > possibly Bio::Gene that may contain multiple Bio::Seq objects (with or > without real sequence). I believe we can borrow some structure and > code from EnsEMBL gene representation (way to contain multiple > transcripts, etc., not the database interactions certainly). > Please let me know what you think. 
> Stefan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Fri Mar 11 13:19:44 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Mar 11 13:14:27 2005 Subject: [Bioperl-l] uniprot flatfile extraction In-Reply-To: <1110378245.422f07057f49c@sms.ed.ac.uk> Message-ID: <24B0CE20-925A-11D9-8881-000A959EB4C4@gmx.net> Basically everything that's in the UniProt file should be found on the RichSeqI object returned from the parser (Bio::SeqIO::swiss). If it's in the feature table you'll find it as annotation (tag/value) of the features held by the seq object ($seq->get_SeqFeatures). Other stuff like dbxrefs are in the annotation bundle ($seq->annotation). -hilmar On Wednesday, March 9, 2005, at 06:24 AM, SG Edwards wrote: > Hi, sorry if this is basic but I've read the documentation and am still > confused!! > > I wish to extract uniprot flatfile data into my database. I want to > get the > following variables: > > Protein ID, length, description, molecular weight, sequence, comments, > cross > references, disulphide bonds, species, entered date, last modified, > last > annotated, protein synonyms. > > I know that I can get some of these (e.g. protein ID, length) using > Bioperl but > can I get all of the data also or am I better writing my own from > scratch? > > Thanks > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Fri Mar 11 13:22:38 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Mar 11 13:19:13 2005 Subject: [Bioperl-l] strange error after changing to RC1.5 In-Reply-To: <422ECDDD.40404@biologie.uni-freiburg.de> Message-ID: <8C8C7E3A-925A-11D9-8881-000A959EB4C4@gmx.net> Try removing the overload altogether. Or, possibly better yet, don't use 1.5. I'm saying this because it should be the responsibility of the one who created the mess and didn't test it to clean it up and test rigorously and not the responsibility of the community out there. -hilmar On Wednesday, March 9, 2005, at 02:20 AM, Daniel Lang wrote: > Hi, > I'm retrieving seq objects from a local biosql db (using the latest > cvs version of bioperl-db) and e.g. writing them with SeqIO. After > changing from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I > get the following error: > > Operation `ne': no method found,!!left argument in overloaded package > Bio::Annotation::Reference,!!right argument has no overloaded magic at > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm > line 534, line 1.! > > The module PersistentObject.pm hasn't changed and in Reference.pm > there is only this change: > > diff bioperl-live-Dec04/Bio/Annotation/Reference.pm > bioperl-live/Bio/Annotation/Reference.pm > 1c1 > < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $ > --- > > # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $ > 56c56,57 > < # use overload '""' => \&as_text; > --- > > use overload '""' => sub { $_[0]->title || ''}; > > use overload 'eq' => sub { "$_[0]" eq "$_[1]" }; > > I've reversed this, but no positive result - the error remains... > Any hints?
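The `ne` failure quoted above can probably be reproduced without bioperl or biosql. The following is a minimal sketch in core Perl, under stated assumptions: `RefNoFallback` and `RefWithFallback` are hypothetical stand-ins (neither is a bioperl class), the first overloading only `""` and `eq` as in the quoted diff, the second overloading `""` with `fallback => 1`. The `fallback` setting is a standard option of the `overload` pragma and is shown only as one possible alternative; it is not what any poster in this thread proposed (the suggestion on the list was an explicit `ne` overload).

```perl
use strict;
use warnings;

# Hypothetical stand-in for a class that overloads '""' and 'eq' only,
# mirroring the quoted diff. This is NOT the real Bio::Annotation::Reference.
package RefNoFallback;
use overload
    '""' => sub { $_[0]->{title} || '' },
    'eq' => sub { "$_[0]" eq "$_[1]" };
sub new { my ($class, %args) = @_; return bless { %args }, $class }

# Same idea, but with fallback enabled, so operators that have no explicit
# overload (such as 'ne') are derived from the string conversion.
package RefWithFallback;
use overload
    '""'     => sub { $_[0]->{title} || '' },
    fallback => 1;
sub new { my ($class, %args) = @_; return bless { %args }, $class }

package main;

# Returns the error text raised by `$obj ne $other`, or '' if none is raised.
sub ne_error {
    my ($obj, $other) = @_;
    my $ok = eval { my $result = ($obj ne $other); 1 };
    return $ok ? '' : "$@";
}

my $broken = RefNoFallback->new(title => 'A paper');
my $fixed  = RefWithFallback->new(title => 'A paper');

print "without fallback: ", (ne_error($broken, 'x') || "no error"), "\n";
print "with fallback:    ", (ne_error($fixed,  'x') || "no error"), "\n";
```

With no `fallback` setting, Perl can only autogenerate `ne` from `cmp`, which neither class defines, so the first comparison is expected to die with a "no method found" message similar to the one reported. Whether `fallback => 1` or explicit `ne` overloads would be the better fit for the Bio::Annotation classes is a design question this sketch does not settle.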
> > Thanks in advance, > Daniel > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From skirov at utk.edu Fri Mar 11 13:36:29 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Mar 11 13:31:41 2005 Subject: [Bioperl-l] Entrez Gene ASN In-Reply-To: <20050310212823.GB5392@anna> References: <42305650.30403@utk.edu> <20050310212823.GB5392@anna> Message-ID: <4231E52D.5030908@utk.edu> > >Hi Stefan, > >from the work I have done on this issue it would seem that your suggestion is quite promising. Let me know if you need some help on this. > I will definitely need some as right now I am just extracting data that I need for my own project. Therefore some help would be nice with respect to the boring task of deciding where some data should go and how exactly to capture it. > How is the performance that you are seeing to date? > > > Hard to tell as I am not parsing everything. A rough estimate is a few seconds and I doubt it will grow significantly. >best, >Peter > > > Stefan From skirov at utk.edu Fri Mar 11 14:02:16 2005 From: skirov at utk.edu (Stefan Kirov) Date: Fri Mar 11 13:56:53 2005 Subject: [Bioperl-l] Entrez Gene ASN In-Reply-To: References: Message-ID: <4231EB38.8040809@utk.edu> Hilmar Lapp wrote: > Gene shouldn't be fundamentally different from LocusLink, and > LocusLink was represented as an annotated SeqI within bioperl. It is not, you are right. > > If at all possible I'd still like it to remain that way for Gene in > order to allow for a smooth transition from LL to Gene for code that's > been using the former. > hmmmm, backward compatibility is a good thing, but sometimes it may be hard to achieve.
> If you want to emphasize the fact that it's a container for sequences, > then that sounds like a ClusterI to me, which can be richly annotated > too. Let me disagree here. Cluster is designed for independent sequences, whereas Gene should deal with sequences that have hierarchical relationships among themselves. This is one of the issues I think the Seq object is not designed to deal with. What we need is: genome--(Bio::Seq)- |--transcript(Bio::Seq) |--protein(Bio::Seq) |--transcript(Bio::Seq) |--protein(Bio::Seq) etc. As an alternative one can store the relationships in a separate ontology object, but I don't think this is really effective. As many genome and transcript entries exist, it will be easy to lose the relations. Another significant concern I have is that if we store everything as SeqFeature the overhead may become huge (some records have hundreds of different features) and any user of the parser will have to do quite a bit of data mining to find the relevant feature. One approach would be to add more Bio::Annotation:: objects (for example Bio::Annotation::STS, Bio::Annotation::GRIF, etc). And one last thing: orthology (which again could be based on ontology) and synteny are things that should be in the Gene (or LocusLink) object. We may decide to create a simplified (Bio::Seq, no relationships) or more complex object (Gene), based on the user request. I hope this does not sound too confusing as I am buried in the Gene ASN structure and I am quickly approaching quiet madness. > > Note also that NCBI is working on an ASN.1->XML converter. Personally, > I'm inclined to wait for that converter to appear, but other > priorities may prevail. > I have waited for a while. If they cannot parse their own data...? Anyway, some issues will still be there even if we have the XML. Stefan > Let me know what you think. > > -hilmar > > On Thursday, March 10, 2005, at 06:14 AM, Stefan Kirov wrote: > >> Hi guys!
>> I have done some (mostly) serious thinking about ASN Entrez Gene >> parsing and I propose we do my favorite thing- postpone everything we >> cannot deal with right now. If you want it to sound better: take a >> gradual approach where we store the data we can deal with in the >> existing Bioperl objects and skipping the rest for now. >> In details: >> ASN gene record can be correctly represented as a tree. I have >> written a simple parser for my own purposes which is storing the >> following: >> node_id---| >> --parent >> --level >> --tag >> --values >> What I do then is get specific levels and tags and build different >> objects. So level 2 with parent EntrezGene (which is the root level >> and has no information) is gene description and has tags such as >> gene, name, etc; at level 3, 5 and 6 you can get the complete specie >> definition by looking for orgname and org as tags and records with >> parent mod (which is a value for orgname, descend down the branch). >> I am using this approach to store most of the data in a relational >> database without going through Bioperl. What I ultimately want to do >> is use standard Bioperl modules. However, I don't think we have an >> object that can efficiently represent the structure (correct me if I >> am wrong). I think it may be a good idea to have a container object, >> possibly Bio::Gene that may contain multiple Bio::Seq objects (with >> or without real sequence). I believe we can borrow some structure and >> code from EnsEMBL gene representation (way to contain multiple >> transcripts, etc., not the database interactions certainly). >> Please let me know what you think. >> Stefan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From allenday at ucla.edu Fri Mar 11 14:36:11 2005 From: allenday at ucla.edu (Allen Day) Date: Fri Mar 11 14:32:47 2005 Subject: [Bioperl-l] strange error after changing to RC1.5 In-Reply-To: References: Message-ID: On Fri, 11 Mar 2005, Hilmar Lapp wrote: > I suggest that all the fancy overloading is removed from core bioperl > modules. If we need overloading for stringification or comparison > operators in one of our core modules I think we are making a mistake. The overloading is only there because assumptions have been made that annotations will be strings. This assumption was okay previously because the Bio::Annotation* modules were previously "non core" -- there was no unified annotation system in bioperl. Now these modules are being made core, and this is part of the growing pain. I'm doing what I can to address the bug reports related to these changes as they come in, and I don't think anyone will disagree that I'm doing so in a timely manner. However, I cannot fix bugs or field questions on biosql modules and would appreciate some cooperation/assistance from the biosql developers. > This is part of the huge mess introduced when the SeqFeatureI > architecture was carelessly changed days before release. It's a > prototypical example for what not to do in a project that's as widely > used as bioperl. The SeqFeatureI changes were being gradually made in the 1-2 months prior to the 1.5 release. The release was, may I remind you, a *developer* release and not expected to be bug free. > *Every single bit* of those changes need to be rolled back from the > release and if nobody else has done it by then I will do so in two > weeks.
Fine for the 1.5.1 branch, although I don't agree that this should be done on the main trunk. -Allen > -hilmar > > On Thursday, March 10, 2005, at 05:57 PM, Allen Day wrote: > > > I'm unable to test the code in PersistentObject.pm as I don't have > > biosql > > set up, but you might try adding this to Reference.pm > > > > use overload 'ne' => sub { "$_[0]" ne "$_[1]" } > > > > Please let me know if this fixes your error and I'll add this 'ne' > > overload to all the Bio::Annotation::* classes on HEAD. > > > > -Allen > > > > > > On Wed, 9 Mar 2005, Daniel Lang wrote: > > > >> Hi, > >> I?m retrieving seq objects from a local biosql db (using the latest > >> cvs > >> verion of bioperl-db) and e.g. writing them with SeqIO. After changing > >> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the > >> following error: > >> > >> Operation `ne': no method found,!!left argument in overloaded package > >> Bio::Annotation::Reference,!!right argument has no overloaded magic at > >> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm > >> line 534, line 1.! > >> > >> The module PersistentObject.pm hasn?t changed and in Reference.pm > >> there > >> is only this change: > >> > >> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm > >> bioperl-live/Bio/Annotation/Reference.pm > >> 1c1 > >> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $ > >> --- > >>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $ > >> 56c56,57 > >> < # use overload '""' => \&as_text; > >> --- > >>> use overload '""' => sub { $_[0]->title || ''}; > >>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" }; > >> > >> I?ve reversed this, but no positive result - the error remains... > >> Any hints? 
> >> > >> Thanks in advance, > >> Daniel > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > From sutripa at vbi.vt.edu Sat Mar 12 08:35:42 2005 From: sutripa at vbi.vt.edu (Sucheta Tripathy) Date: Sat Mar 12 08:30:33 2005 Subject: [Bioperl-l] drawing additional axes with GD::Graph Message-ID: <2598.199.3.136.4.1110634542.squirrel@webmail.vbi.vt.edu> Dear group, I am trying to add 2 more horizontal lines to the plot I have, using GD::Graph. For example, with the X,Y origin at 0,0, I would like additional horizontal lines at y values .78 and .84. Please suggest something other than $im->line(). Thanks Sucheta -- Sucheta Tripathy Virginia Bioinformatics Institute Phase-I Washington street. Virginia Tech. Blacksburg,VA 24061-0447 phone:(540)231-8138 Fax: (540) 231-2606 From Mingyi.Liu at gpc-biotech.com Sat Mar 12 10:43:44 2005 From: Mingyi.Liu at gpc-biotech.com (Liu, Mingyi) Date: Sat Mar 12 10:37:40 2005 Subject: [Bioperl-l] Entrez Gene ASN parsers Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2D@sw-wal-beta.gpc-biotech.com> Hello, I have just released a project on sourceforge that contains 4 different parsers for Entrez Gene ASN files based on regex, Parse::RecDescent, Parse::Yapp, and Perl-byacc. They differ in performance and the regex-based parser is the best performer, processing over 13000 records a minute on average (it finishes the 900+ MB human annotation file in 11 minutes on one Intel Xeon 2.4 GHz CPU). The other parsers are at least a few fold slower but I included them since they'd be of interest to people learning to use those tools or choosing among the tools for a practical project.
All parsers are short OO-modules (<100 lines if not counting POD/YACC-generated code), so they are easy to use and understand. Right now my parsers do not assemble data into Bioperl objects (because for my project I only needed to put them into a proprietary XML format, which is not released (not that it's anything special, just IP issues. Without IP issues, I could've released the parser code in Feb.)). They behave like XML-parsers, namely, they parse entrez gene records and assemble content into data structures only. But I hope it could serve as a base on which Bioperl objects can be built (the data structure is easy to use). Please feel free to use the code for any Bioperl or other projects as I released them under GPL (thanks to my company and a collaborating company's consent). Please also feel free to contact me if you have any suggestion or bug report. The URL for the sourceforge project is http://sourceforge.net/projects/egparser/ Thanks, Mingyi Dr. Mingyi Liu Computational Biologist GPC Biotech Inc. 610 Lincoln St. Waltham, MA 02451 USA From skirov at utk.edu Sat Mar 12 17:59:16 2005 From: skirov at utk.edu (Stefan Kirov) Date: Sat Mar 12 17:54:04 2005 Subject: [Bioperl-l] Entrez Gene ASN parsers In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2D@sw-wal-beta.gpc-biotech.com> References: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2D@sw-wal-beta.gpc-biotech.com> Message-ID: <42337444.1050102@utk.edu> Mingyi, I looked at the code (EntrezGene) and so far it seems to me it gives, as you claim, a pretty accurate and easy to understand data structure (a few dead entries and some 0-size arrays, but nothing major). The only concern I have is the data structure. If you want to achieve a better structure (non-redundant, two level where possible or a collection of Bioperl objects) this will slow things down. I guess I will compare how the code I wrote compares to yours and choose the faster one. I think this makes sense.
Stefan Liu, Mingyi wrote: >Hello, > >I have just released a project on sourceforge that contains 4 different parsers for Entrez Gene ASN file based on regex, Parse::RecDescent, Parse::Yapp, and Perl-byacc. They differ in performance and the regex-based parser is the best performer, processing over 13000 records a minute on average (It finishes the 900+ MB human annotation file in 11 minutes on one Intel Xeon 2.4 GHz CPU). The other parsers are at least a few fold slower but I included them since it'd be of intererst to people learning to use those tools or choosing among the tools for a practical project. All parsers are short OO-modules (<100 lines if not counting POD/YACC-generated code), so they are easy to use and understand. > >Right now my parsers do not assemble data into Bioperl objects (because for my project I only needed to put them into a proprietary XML format, which is not released (not that it's anything special, just IP issues. Without IP issues, I could've released the parser code in Feb.)). They behave like XML-parsers, namely, they parse entrez gene records and assemble content into data structures only. But I hope it could serve as a base that Bioperl objects can be built (the data structure is easy to use). Please feel free to use the code for any Bioperl or other projects as I released them under GPL (thanks to my company and a collaborating company's consent). > >Please also feel free to contact me if you have any suggestion or bug report. > >The URL for the sourceforge project is http://sourceforge.net/projects/egparser/ > >Thanks, > >Mingyi > >Dr. Mingyi Liu >Computational Biologist >GPC Biotech Inc. >610 Lincoln St. >Waltham, MA 02451 >USA > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Stefan Kirov, Ph.D. 
University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From Mingyi.Liu at gpc-biotech.com Sat Mar 12 18:50:41 2005 From: Mingyi.Liu at gpc-biotech.com (Liu, Mingyi) Date: Sat Mar 12 18:44:33 2005 Subject: [Bioperl-l] Entrez Gene ASN parsers Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2E@sw-wal-beta.gpc-biotech.com> Hi, Stefan, Yes, the advantage and disadvantage of my approach are that my parsers do not take the underlying data into account. By totally ignoring the data content and focusing just on format, this approach ensured that no data will be left behind in parsing and that the development of the parsers would be very fast, and the parsers perform very well. In addition, even if NCBI changes the data content, the parser will most likely work just fine without any modifications. However, this does result in a data structure that is not consolidated into, for example, the two level type you'd want. The data structure generated merely reflects however NCBI chose to structure their Entrez Gene ASN files. Building Bioperl objects based on my parser would take some serious effort (1-2 weeks). It is definitely doable though, and the performance should not slow down much. The benchmark I gave included not just the time for parsing and data structure construction, but also data structure trimming, which traverses almost the entire data structure and makes changes. But the initiation of Bioperl objects may make the whole process slow down a few fold. Regardless, I totally agree that it would be best if you could do a comparison and choose the most suitable approach. BTW, can you send me example entries for which there are dead entries or 0-sized arrays in my parser?
I wonder if it's a problem with the Entrez Gene file or my parser, since I simply let the data structure mirror the file. But if it isn't, then I would want to check if it's a bug. I did process the full human genome into XML files and did not see any empty elements or attributes, and the parser runs on entire mouse and rat genomes without problem, which is expected. Thanks, Mingyi > -----Original Message----- > From: Stefan Kirov [mailto:skirov@utk.edu] > Sent: Saturday, March 12, 2005 5:59 PM > To: Liu, Mingyi > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] Entrez Gene ASN parsers > > > Mingyi, > I looked at the code (EntrezGene) and so far it seems to me > it gives as > you claim pretty accurate and easy to understand data structure (few > dead entries and some 0 size array, but nothing major). > The only concern I have is that the data structure. If you want to > achieve a better structure (non-redundant, two level where > possible or a > collection of Bioperl objects) this will slow things down. I guess I > will compare how the code I wrote compares to yours and choose the > faster one. I think this makes sense. > Stefan > From hlapp at gmx.net Sat Mar 12 19:33:24 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Mar 12 19:28:07 2005 Subject: [Bioperl-l] Entrez Gene ASN parsers In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2E@sw-wal-beta.gpc-biotech.com> Message-ID: <82B4FA2E-9357-11D9-B647-000A959EB4C4@gmx.net> I kind of like this approach, i.e., have a general purpose low-level parser that you have reasonable confidence will never be the bottleneck, and then build a bioperl parser on top of it that now can focus its code on assembling the desired data structure as opposed to the file format itself. And of course assembling that data structure will slow things down a lot but hey, either you want an object hierarchy in (bio-)perl or you don't.
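The layering described above can be sketched in plain Perl. Everything in this sketch is an assumption made for illustration: the nested structure only imitates what a format-only parser might emit (it is not the actual output of the egparser modules), the tag names and values are invented, and `find_values` and `summarize_gene` are hypothetical helpers. The top layer returns a plain hash where a real implementation would presumably build whichever bioperl object the list settles on.

```perl
use strict;
use warnings;

# Hypothetical low-level output: a tree of { tag, values, children } nodes,
# loosely in the spirit of the parent/level/tag/values layout mentioned
# earlier in the thread. Tag names and values are made up.
my $record = {
    tag      => 'Entrezgene',
    values   => [],
    children => [
        { tag => 'gene', values => [], children => [
            { tag => 'locus', values => ['GENE1'],        children => [] },
            { tag => 'desc',  values => ['example gene'], children => [] },
        ] },
        { tag => 'org', values => [], children => [
            { tag => 'taxname', values => ['Homo sapiens'], children => [] },
        ] },
    ],
};

# Generic helper: collect the values of every node carrying a given tag,
# walking the tree depth-first. It knows nothing about gene records.
sub find_values {
    my ($node, $tag) = @_;
    my @found = $node->{tag} eq $tag ? @{ $node->{values} } : ();
    push @found, find_values($_, $tag) for @{ $node->{children} };
    return @found;
}

# Top layer: the only place that knows which tags matter and what to build.
# A plain hash is used here as a placeholder for a richer object.
sub summarize_gene {
    my ($rec) = @_;
    return {
        symbol      => (find_values($rec, 'locus'))[0],
        description => (find_values($rec, 'desc'))[0],
        species     => (find_values($rec, 'taxname'))[0],
    };
}

my $gene = summarize_gene($record);
print join("\t", @{$gene}{qw(symbol description species)}), "\n";
```

The point of the split is that a change in the file format would only touch the code producing `$record`, while a change of mind about the target object would only touch `summarize_gene`.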
Also, given the thread and previous ones, that ominous bioperl data structure may be very fluid initially, or even result in different top-level parsers depending on how compatible the different visions are for what to get out of that parser. -hilmar On Saturday, March 12, 2005, at 03:50 PM, Liu, Mingyi wrote: > Hi, Stefan, > > Yes, the advantage and disadvantage of my approach are that my parsers > do not take the underlying data into account. By totally ignoring the > data content and focusing just on format, this appropach ensured that > no data will be left behind in parsing and that the development of the > parsers would be very fast, and the parsers perform very well. In > addition, even if NCBI changes the data content, the parser will most > likely work just fine without any modifications. > > However, this does result in a data structure that is not consolidated > into, for example, the two level type you'd want. The data structure > generated merely reflects however NCBI chose to structure their Entrez > Gene ASN files. Building Bioperl objects based on my parser would > take some serious efforts (1-2 weeks). It is definitely doable > though, and the performance should not slow down much. The benchmark > I gave included not just the time for parsing and data structure > construction, but also data structure trimming, which traverses almost > the entire data structure and make changes. But the initiation of > Bioperl objects may make the whole process slow down a few fold. > > Regardless, I totally agree that it's the best if you could do a > comparison and choose the most suitable approach. > > BTW, can you send me example entries for which there are dead entries > or 0-sized array in my parser? I wonder if it's a problem of Entrez > Gene file or my parser, since I simply let the data structure mirror > the file. But if it isn't, then I would want to check if it's a bug. 
> I did process the full human genome into XML files and did not see any > empty elements or attributes, and the parser runs on entire mouse and > rat genomes without problem, which is expected. > > Thanks, > > Mingyi > >> -----Original Message----- >> From: Stefan Kirov [mailto:skirov@utk.edu] >> Sent: Saturday, March 12, 2005 5:59 PM >> To: Liu, Mingyi >> Cc: bioperl-l@portal.open-bio.org >> Subject: Re: [Bioperl-l] Entrez Gene ASN parsers >> >> >> Mingyi, >> I looked at the code (EntrezGene) and so far it seems to me >> it gives as >> you claim pretty accurate and easy to understand data structure (few >> dead entries and some 0 size array, but nothing major). >> The only concern I have is that the data structure. If you want to >> achieve a better structure (non-redundant, two level where >> possible or a >> collection of Bioperl objects) this will slow things down. I guess I >> will compare how the code I wrote compares to yours and choose the >> faster one. I think this makes sense. >> Stefan >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sat Mar 12 19:55:44 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Mar 12 19:50:16 2005 Subject: [Bioperl-l] Entrez Gene ASN In-Reply-To: <4231EB38.8040809@utk.edu> Message-ID: On Friday, March 11, 2005, at 11:02 AM, Stefan Kirov wrote: > > > Hilmar Lapp wrote: > >> Gene shouldn't be fundamentally different from LocusLink, and >> LocusLink was represented as an annotated SeqI within bioperl. > > It is not, you are right. 
> >> >> If at all possible I'd still like it to remain that way for Gene in >> order to allow for a smooth transition from LL to Gene for code >> that's been using the former. >> > hmmmm, backward compatibility is a good thing, but sometimes it may be hard > to achieve. Well, now you contradict yourself. Above you agree that Gene and LocusLink are fundamentally the same, and here you say representing them in a compatible fashion may be hard to achieve ... There are problems indeed though, read on ... > >> If you want to emphasize the fact that it's a container for >> sequences, then that sounds like a ClusterI to me, which can be >> richly annotated too. > > Let me disagree here. Cluster is designed for independent sequences, > whereas Gene should deal with sequences that have hierarchical > relationships among themselves. Two notes here. First, ClusterI is not designed for independent sequences. It is just meant as a container for sequences, be those related to each other or not. Second, the ability to represent hierarchical relationships between sequences is basically absent from bioperl, not just from ClusterI (aside from ClusterI representing a relationship between the containing seq and the contained seqs). We should think seriously before we add that capability. Most of the people and effort in the field towards hierarchical relationships between biological entities with sequence takes place in the domain of feature hierarchies, *not* sequence hierarchies. See GFF3, SO, GBrowse, Chado, and related efforts. The only place I know where sequence hierarchies are extensively used is in our local adaptation of Biosql, and we do all of this in SQL (as bioperl and therefore bioperl-db have zero support for it). It's possible, but I'm not sure it's also wise, to duplicate the support for feature hierarchies to sequences ... Wouldn't it in the end benefit more people if you were able to tie Gene into the Unflattener that Chris wrote?
> This is one of the issues I think Seq object is not designed to deal > with. What we need is: > genome--(Bio::Seq)- > |--transcript(Bio::Seq) > |--protein(Bio::Seq) > |--transcript(Bio::Seq) > |--protein(Bio::Seq) Well, yeah, if you replace Bio::Seq with Bio::SeqFeatureI you are pretty close to GFF3 and a growing wealth of support for it. > > Another significant concern I have is that if we store everything as > SeqFeature or the overhead may become huge (some records have hundreds > of different features) Have you talked to Lincoln about this? I believe GBrowse is dealing pretty well with this huge overhead but I may be missing something here. > [...] and any user of the parser will have to do quite of a data > mining to find the relevant feature. One approach would be to add more > Bio::Annotation:: objects (for example Bio::Annotation::STS, > Bio::Annotation::GRIF, etc). Possibly. Bio::Annotation objects was in fact what I was primarily referring to when I spoke about annotation. > We may decide to create a simplified (Bio::Seq, no relationships) or > more complex object (Gene), based on the user request. Just as an aside, I guess you know that there is a Gene object already, but it's feature based. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Sat Mar 12 21:43:13 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat Mar 12 21:37:48 2005 Subject: [Bioperl-l] strange error after changing to RC1.5 In-Reply-To: Message-ID: My first response to this was a long rant about almost every single one of your statements and which may have been mildly entertaining for people while the TV is on commercial. 
In the end I calmed down and thought people probably have better things to do than reading my rants (should I start a bioperl blog?), so here is the same in a gist and without (most of) the rant. - In my opinion the annotation system is core, like everything is by definition that attaches to a Bio::SeqI. - I'm not ever going to turn away people who took to the code to fill gaps or ambiguities in the documentation - API assumptions based on what the code did for years count as a binding contract just as expressly written contracts do. - I am strongly opposed to the notion that your customers should do the testing for your wild innovations as opposed to yourself doing that in advance, regardless of how fast or slow you respond to bug reports; people have better things to do than ironing out your revolution. - I *am* going to back out the changes from the main trunk; traditionally, in bioperl the main trunk has *not* been used for wild experiments the repercussions of which were not really clear - instead people opened their branches for that. Allen, feel free to reintroduce your changes and overloads and all kinds of crazy stuff on a branch that you open. We need the main trunk free of debris as the road to the next releases to come. Feel free to wreck the train elsewhere. People need the bugfixes now and Lincoln's additions that aren't in 1.4.x. Of course, this being a community project, everybody who disagrees please feel free to speak up and if people want to stop me I'll be more than glad to step down - but then be prepared to step up yourself and take care of the mess. -hilmar On Friday, March 11, 2005, at 11:36 AM, Allen Day wrote: > On Fri, 11 Mar 2005, Hilmar Lapp wrote: > >> I suggest that all the fancy overloading is removed from core bioperl >> modules. If we need overloading for stringification or comparison >> operators in one of our core modules I think we are making a mistake.
> > The overloading is only there because assumptions have been made that > annotations will be strings. This assumption was okay previously > becasue > the Bio::Annotation* modules were previously "non core" -- there was no > unified annotation system in bioperl. Now these modules are being made > core, and this is part of the growing pain. > > I'm doing what I can to address the bug reports related to these > changes > as they come in, and I don't think anyone will disagree that I'm doing > so > in a timely manner. However, I cannot fix bugs or field questions on > biosql modules and would appreciate some cooperation/assistance from > the > biosql developers. > >> This is part of the huge mess introduced when the SeqFeatureI >> architecture was carelessly changed days before release. It's a >> prototypical example for what not to do in a project that's as widely >> used as bioperl. > > The SeqFeatureI changes were being gradually made in the 1-2 months > prior > to the 1.5 release. The release was, may I remind you, a *developer* > release and not expected to be bug free. > >> *Every single bit* of those changes need to be rolled back from the >> release and if nobody else has done it by then I will do so in two >> weeks. > > Fine for the 1.5.1 branch, although I don't agree that this should be > done > on the main trunk. > > -Allen > > >> -hilmar >> >> On Thursday, March 10, 2005, at 05:57 PM, Allen Day wrote: >> >>> I'm unable to test the code in PersistentObject.pm as I don't have >>> biosql >>> set up, but you might try adding this to Reference.pm >>> >>> use overload 'ne' => sub { "$_[0]" ne "$_[1]" } >>> >>> Please let me know if this fixes your error and I'll add this 'ne' >>> overload to all the Bio::Annotation::* classes on HEAD. >>> >>> -Allen >>> >>> >>> On Wed, 9 Mar 2005, Daniel Lang wrote: >>> >>>> Hi, >>>> I?m retrieving seq objects from a local biosql db (using the latest >>>> cvs >>>> verion of bioperl-db) and e.g. writing them with SeqIO. 
After >>>> changing >>>> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the >>>> following error: >>>> >>>> Operation `ne': no method found,!!left argument in overloaded >>>> package >>>> Bio::Annotation::Reference,!!right argument has no overloaded magic >>>> at >>>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm >>>> line 534, line 1.! >>>> >>>> The module PersistentObject.pm hasn?t changed and in Reference.pm >>>> there >>>> is only this change: >>>> >>>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm >>>> bioperl-live/Bio/Annotation/Reference.pm >>>> 1c1 >>>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $ >>>> --- >>>>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $ >>>> 56c56,57 >>>> < # use overload '""' => \&as_text; >>>> --- >>>>> use overload '""' => sub { $_[0]->title || ''}; >>>>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" }; >>>> >>>> I?ve reversed this, but no positive result - the error remains... >>>> Any hints? >>>> >>>> Thanks in advance, >>>> Daniel >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From brian_osborne at cognia.com Sat Mar 12 22:01:50 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Sat Mar 12 21:56:24 2005 Subject: [Bioperl-l] strange error after changing to RC1.5 In-Reply-To: Message-ID: Hilmar, If I'm not mistaken this proposal to back out these changes was made previously, and not by you. 
There were no objections to this proposal at that time.

Brian O.

From hlapp at gmx.net Sat Mar 12 22:21:32 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 12 22:17:57 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To:
Message-ID:

In my recollection the previous proposal was to back them out of the branch to be created for v1.5.1., whereas they were to remain in the main trunk and thereby implicitly accepted for and included in all future development of bioperl.

My proposal is that the basis for 1.5.1. is the main trunk and so they need to be backed out of the main trunk, and whoever (i.e., Allen) is in favor of those changes first prove their viability on a branch before bothering anybody else with them again.

I do feel that there are no objections to my proposal other than from Allen, but I may be missing something or someone may not have spoken up yet.

-hilmar

On Saturday, March 12, 2005, at 07:01 PM, Brian Osborne wrote:

> Hilmar,
>
> If I'm not mistaken this proposal to back out these changes was made
> previously, and not by you. There were no objections to this proposal
> at that time.
>
> Brian O.

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca.
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From Mingyi.Liu at gpc-biotech.com Sat Mar 12 23:12:37 2005
From: Mingyi.Liu at gpc-biotech.com (Liu, Mingyi)
Date: Sat Mar 12 23:08:14 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2F@sw-wal-beta.gpc-biotech.com>

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net]
> Sent: Saturday, March 12, 2005 7:33 PM
> To: Liu, Mingyi
> Cc: Stefan Kirov; bioperl-l@portal.open-bio.org
> Subject: Re: [Bioperl-l] Entrez Gene ASN parsers
>
> I kind of like this approach, i.e., have a general purpose low-level
> parser that you have reasonable confidence in will never be the
> bottleneck, and then build a bioperl parser on top of it that now can
> focus its code on assembling the desired data structure as opposed to
> the file format itself.

That was my intention too. I saw plenty of requests that NCBI release Entrez Gene in XML format. But suppose that NCBI did release XML-formatted Entrez Gene files; then to build bioperl objects from the XML files one could take several approaches:

1. write a module that directly deals with (parses) the XML tags and codes everything, including object instantiations, along with the parsing code. Or, more likely,

2. write a module that utilizes the service of an XML parser, lets it do its work and make a data structure, then creates all objects using that data structure. This way there's a clear code separation, and one only needs to worry about the data, not the parsing.

My parser does to NCBI's ASN.1 EntrezGene file what an XML parser does to a yet-to-exist XML-formatted EntrezGene file (or better than it, if NCBI decides to code Entrez Gene in the XML format that Eutils provide). And it performs better than XML parsers.

So I really don't think there's any need for an XML file from NCBI.
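
The separation described in approach 2 can be sketched roughly as follows. This is only a minimal illustration in plain Perl under stated assumptions: the toy `key "value"` syntax, the function `parse_record` and the class `My::Gene` are made-up names for this example, and are neither real ASN.1 nor part of bioperl or of the parser discussed in this thread.

```perl
use strict;
use warnings;

# Layer 1: hypothetical low-level parser. It knows only the file syntax
# (a toy 'key "value"' format, NOT real ASN.1) and returns plain data.
sub parse_record {
    my ($text) = @_;
    my %record;
    # The value pattern accepts escaped characters such as \" inside quotes.
    while ($text =~ /(\w+)\s+"((?:[^"\\]|\\.)*)"/g) {
        my ($key, $value) = ($1, $2);
        $value =~ s/\\(.)/$1/g;            # unescape \" and \\
        push @{ $record{$key} }, $value;   # a key may occur more than once
    }
    return \%record;
}

# Layer 2: hypothetical object builder. It never sees the file syntax,
# only the data structure produced by layer 1.
package My::Gene;

sub new_from_data {
    my ($class, $data) = @_;
    my $self = {
        id       => $data->{id}[0],
        symbol   => $data->{symbol}[0],
        synonyms => $data->{synonym} || [],
    };
    return bless $self, $class;
}

sub id       { $_[0]{id} }
sub symbol   { $_[0]{symbol} }
sub synonyms { @{ $_[0]{synonyms} } }

package main;

my $text = 'id "672" symbol "BRCA1" synonym "RNF53" synonym "say \"hi\""';
my $data = parse_record($text);               # users may stop here ...
my $gene = My::Gene->new_from_data($data);    # ... or go on to build objects
print join(' ', $gene->id, $gene->symbol, scalar($gene->synonyms)), "\n";
```

Someone who only wants to pick out a few values can work with `$data` directly, while object construction stays optional and lives entirely in the second layer.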

> And of course assembling that data structure will slow things down a
> lot but hey, either you want an object hierarchy in (bio-)perl or you
> don't.

I also agree that with an external parser users could choose what they like: (bio)perl objects containing the Entrez Gene data, or just directly using the data structure to pick and choose data. More flexible for both developers and users.

Just my two cents.

Mingyi

From hlapp at gmx.net Sat Mar 12 23:54:14 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 12 23:48:43 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2F@sw-wal-beta.gpc-biotech.com>
Message-ID:

On Saturday, March 12, 2005, at 08:12 PM, Liu, Mingyi wrote:

> My parser does to NCBI's ASN.1 EntrezGene file what an XML parser does
> to a yet-to-exist XML-formatted EntrezGene file (or better than it, if
> NCBI decides to code Entrez Gene in the XML format that Eutils
> provide).

This is apparently what they will be doing, or at least my understanding of it. The discomforting thing is that it's taken them so long already to come up with that supposedly little tool. In fact, apparently the fact they weren't able to provide the off-line tool yet is the reason that they're still maintaining the LocusLink download. That's what they told me in a response to an inquiry. Although from Monday on they'll remove C.elegans and fruitfly from LL_tmpl. Not good.

> And it performs better than XML parsers.

Actually, even an expat-based XML parser would be by orders of magnitude slower than your regexp-based one.

The question is how safe your regexps are from possibly unexpected things like escaped quotes or an escaped curly brace that's part of a string and not the end of an entity, or whatever else might confuse your regexps. Maybe in ASN.1 this isn't a big deal? I just have too little knowledge about ASN.1 to make any judgment here.

> So I really don't think there's any need for XML file from NCBI.

Yeah, I actually started to change my mind w.r.t. waiting for the XML format.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From Mingyi.Liu at gpc-biotech.com Sun Mar 13 00:17:59 2005
From: Mingyi.Liu at gpc-biotech.com (Liu, Mingyi)
Date: Sun Mar 13 00:14:11 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC30@sw-wal-beta.gpc-biotech.com>

> > My parser does to NCBI's ASN.1 EntrezGene file what an XML parser
> > does to a yet-to-exist XML-formatted EntrezGene file (or better than
> > it, if NCBI decides to code Entrez Gene in the XML format that Eutils
> > provide).
>
> This is apparently what they will be doing, or at least my
> understanding of it.

That's logical, but not good. I really don't like the XML format Eutils provided. In fact, I heard few people did.

> The question is how safe are your regexps from possibly unexpected
> things like escaped quotes or an escaped curly brace that's part of a
> string and not end of an entity etc or whatever might confuse your
> regexps.

It's not a problem. In my parsers these situations are dealt with already. So far, nothing in the latest human, mouse, rat breaks the parser. I didn't test on other genomes, but they should work fine.

BTW, an unrelated question: Do you know why my reply mails always start new threads in the Bioperl-l mailing list archive, whereas others' (like yours) form a nice thread?

Thanks

Mingyi

From hlapp at gmx.net Sun Mar 13 00:49:14 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Mar 13 00:43:43 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC30@sw-wal-beta.gpc-biotech.com>
Message-ID:

On Saturday, March 12, 2005, at 09:17 PM, Liu, Mingyi wrote:

> BTW, an unrelated question: Do you know why is it that my reply mails
> always started new threads in Bioperl-l mailing list archive, whereas
> others' (like yours) form a nice thread?

No idea. I have no idea by what kind of header mailman opens a new thread or recognizes an existing one. However, I notice the following differences between your and my email headers that might pertain to threads:

Only in your reply:
Thread-Topic: Entrez Gene ASN parsers
Thread-Index: AcUnGiosN/cq6k3WQYeV1nHcNmaGJg==

Only in my reply:
In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2F@sw-wal-beta.gpc-biotech.com>

The latter is precisely the message ID of your email that I replied to:
Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2D@sw-wal-beta.gpc-biotech.com>

-hilmar

> Thanks
>
> Mingyi

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From cain at cshl.edu Sun Mar 13 04:11:17 2005
From: cain at cshl.edu (Scott Cain)
Date: Sun Mar 13 04:07:04 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To:
Message-ID:

Hilmar,

I sympathize with your frustration, but backing out those changes will break several tools that I've written for chado/gmod. This is an area of active development, and users are frequently advised to update to bioperl-live. Since there are users of that as well, how do we decide who gets to feel the pain?

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.
cain@cshl.org
GMOD Coordinator, http://www.gmod.org/ (216)392-3087
----------------------------------------------------------------------

From hlapp at gmx.net Sun Mar 13 05:06:46 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Mar 13 05:02:01 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To:
Message-ID: <9BD28A60-93A7-11D9-9542-000A959EB4C4@gmx.net>

Can you formulate as a test case how you use the API such it depends on those changes that I may be targeting? Or are these test cases in a gmod package that can be set up reasonably simple?

Are you sure that what you depend on are the changes to the Bio::SeqFeatureI and Bio::Annotation::* modules? If it's only SeqFeature::Annotated, my plan was to leave that as much intact as possible.

-hilmar

On Sunday, March 13, 2005, at 01:11 AM, Scott Cain wrote:

> Hilmar,
>
> I sympathize with your frustration, but backing out those changes will
> break several tools that I've written for chado/gmod. This is an
> area of active development, and users a frequently advised to update
> to bioperl-live. Since there are users of that as well, how do we
> decide who gets to feel the pain?
>
> Scott
>
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.
cain@cshl.org > GMOD Coordinator, http://www.gmod.org/ (216)392-3087 > ---------------------------------------------------------------------- > > > On Sat, 12 Mar 2005, Hilmar Lapp wrote: > >> My first response to this was a long rant about almost every single >> one >> of your statements and which may have been mildly entertaining for >> people while the TV is on commercial. In the end I calmed down and >> thought people have probably better things to do than reading my rants >> (should I start a bioperl blog?), so here is the same in a gist and >> without (most of) the rant. >> >> - In my opinion the annotation system is core, like everything is by >> definition that attaches to a Bio::SeqI. >> >> - I'm not ever going to turn away people who took to the code to fill >> gaps or ambiguities in the documentation - API assumptions based on >> what the code did for years count as a binding contract just as >> expressly written contracts do. >> >> - I am strongly opposed to the notion that your customers should to >> the testing for your wild innovations as opposed to yourself doing >> that >> in advance, regardless of how fast or slow you respond to bug reports; >> people have better things to do than ironing out your revolution. >> >> - I *am* going to back out the changes from the main trunk; >> traditionally, in bioperl the main trunk has *not* been used for wild >> experiments the repercussions of which were not really clear - instead >> people opened their branches for that. >> >> Allen feel free to reintroduce your changes and overloads and all >> kinds >> of crazy stuff on a branch that you open. We need the main trunk free >> of debris as the road to the next releases to come. Feel free to wreck >> the train elsewhere. People need the bugfixes now and Lincoln's >> additions that aren't in 1.4.x. 
>> >> Of course, this being a community project, everybody who disagrees >> please feel free to speak up and if people want to stop me I'll be >> more >> than glad to step down - but then be prepared to step up yourself and >> take care of the mess. >> >> -hilmar >> >> On Friday, March 11, 2005, at 11:36 AM, Allen Day wrote: >> >>> On Fri, 11 Mar 2005, Hilmar Lapp wrote: >>> >>>> I suggest that all the fancy overloading is removed from core >>>> bioperl >>>> modules. If we need overloading for stringification or comparison >>>> operators in one or our core modules I think we are making a >>>> mistake. >>> >>> The overloading is only there because assumptions have been made that >>> annotations will be strings. This assumption was okay previously >>> becasue >>> the Bio::Annotation* modules were previously "non core" -- there was >>> no >>> unified annotation system in bioperl. Now these modules are being >>> made >>> core, and this is part of the growing pain. >>> >>> I'm doing what I can to address the bug reports related to these >>> changes >>> as they come in, and I don't think anyone will disagree that I'm >>> doing >>> so >>> in a timely manner. However, I cannot fix bugs or field questions on >>> biosql modules and would appreciate some cooperation/assistance from >>> the >>> biosql developers. >>> >>>> This is part of the huge mess introduced when the SeqFeatureI >>>> architecture was carelessly changed days before release. It's a >>>> prototypical example for what not to do in a project that's as >>>> widely >>>> used as bioperl. >>> >>> The SeqFeatureI changes were being gradually made in the 1-2 months >>> prior >>> to the 1.5 release. The release was, may I remind you, a *developer* >>> release and not expected to be bug free. >>> >>>> *Every single bit* of those changes need to be rolled back from the >>>> release and if nobody else has done it by then I will do so in two >>>> weeks. 
>>> >>> Fine for the 1.5.1 branch, although I don't agree that this should be >>> done >>> on the main trunk. >>> >>> -Allen >>> >>> >>>> -hilmar >>>> >>>> On Thursday, March 10, 2005, at 05:57 PM, Allen Day wrote: >>>> >>>>> I'm unable to test the code in PersistentObject.pm as I don't have >>>>> biosql >>>>> set up, but you might try adding this to Reference.pm >>>>> >>>>> use overload 'ne' => sub { "$_[0]" ne "$_[1]" } >>>>> >>>>> Please let me know if this fixes your error and I'll add this 'ne' >>>>> overload to all the Bio::Annotation::* classes on HEAD. >>>>> >>>>> -Allen >>>>> >>>>> >>>>> On Wed, 9 Mar 2005, Daniel Lang wrote: >>>>> >>>>>> Hi, >>>>>> I?m retrieving seq objects from a local biosql db (using the >>>>>> latest >>>>>> cvs >>>>>> verion of bioperl-db) and e.g. writing them with SeqIO. After >>>>>> changing >>>>>> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get >>>>>> the >>>>>> following error: >>>>>> >>>>>> Operation `ne': no method found,!!left argument in overloaded >>>>>> package >>>>>> Bio::Annotation::Reference,!!right argument has no overloaded >>>>>> magic >>>>>> at >>>>>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/ >>>>>> PersistentObject.pm >>>>>> line 534, line 1.! >>>>>> >>>>>> The module PersistentObject.pm hasn?t changed and in Reference.pm >>>>>> there >>>>>> is only this change: >>>>>> >>>>>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm >>>>>> bioperl-live/Bio/Annotation/Reference.pm >>>>>> 1c1 >>>>>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $ >>>>>> --- >>>>>>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $ >>>>>> 56c56,57 >>>>>> < # use overload '""' => \&as_text; >>>>>> --- >>>>>>> use overload '""' => sub { $_[0]->title || ''}; >>>>>>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" }; >>>>>> >>>>>> I?ve reversed this, but no positive result - the error remains... >>>>>> Any hints? 
>>>>>> >>>>>> Thanks in advance, >>>>>> Daniel >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l@portal.open-bio.org >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> >>> >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From mail2doreen at gmx.de Sun Mar 13 06:54:49 2005 From: mail2doreen at gmx.de (mail2doreen@gmx.de) Date: Sun Mar 13 06:50:13 2005 Subject: [Bioperl-l] gff2genbank Message-ID: <29912.1110714889@www58.gmx.net> Hello, i convert a gff file into genbank format using Bio::Tools::GFF Bio::SeqIO but the cds data in the output are not joined. CDS complement(17262..17813) /gene_id="899.t00001" /transcript_id="899.m00029" CDS complement(17879..18174) /gene_id="899.t00001" /transcript_id="899.m00029" This is what i need: CDS complement(join(17262..17813,17879..18174)). How can i solve this problem? Greetings -- SMS bei wichtigen e-mails und Ihre Gedanken sind frei ... 
Alle Infos zur SMS-Benachrichtigung: http://www.gmx.net/de/go/sms From mingyi.liu at gpc-biotech.com Sun Mar 13 10:03:41 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Sun Mar 13 09:59:43 2005 Subject: [Bioperl-l] Porting Entrez Gene parser to Biojava, Biopython, Biophp, even C++ In-Reply-To: References: Message-ID: <4234564D.7010906@gpc-biotech.com> I forgot to mention another advantage of having a purely regex based small parser means very easy porting into any language that supports perl styled regular expressions, like Java, Python, PHP, C++ with PCRE (used by php and python). There could potentially be performance hit to any perl parsers ported into those languages. Mainly because AFAIK there is a lack of full support for all the modifiers for Perl regex, so unless I missed something, we'd have to either code some modifier logic in the program or use string replacement. Nevertheless, the smaller the parser, the less pain in porting. Just some more cents (and advocation) :) From dalke at dalkescientific.com Sun Mar 13 14:07:55 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun Mar 13 14:03:01 2005 Subject: [Bioperl-l] Porting Entrez Gene parser to Biojava, Biopython, Biophp, even C++ In-Reply-To: <4234564D.7010906@gpc-biotech.com> References: <4234564D.7010906@gpc-biotech.com> Message-ID: <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> Mingyi Liu wrote: > I forgot to mention another advantage of having a purely regex based > small parser means very easy porting into any language that supports > perl styled regular expressions, like Java, Python, PHP, C++ with PCRE > (used by php and python). I developed Martel ( http://www.dalkescientific.com/Martel/ ) to do just this sort of thing - describe a typical bioinformatics file format as a set of declarations instead of as a set of code. It works but turns out to be hard to maintain. 
Here's a list of problems I came up with - regexps are hard to write and debug Could be improved with some sort of development/ testing environment - Martel's grammars are hard to edit When a grammar changes it's not possible to say "the new format is the old format but change this one bottom level node". I'm actually considering switching over to a DOM-style description of the tree so I can use XSLT as the editing language. Except that I think XSLT's grammar is clumsy and ugly. - Martel needs everything in memory I implemented a hack to parse a record at a time but it's a hack and fails (except on large memory machines) for people who want to read a chromosome at a time. I would also like it to be feed based instead of pull based. I found that normal regular expressions weren't quite powerful enough to handle the format so needed to implement a new feature for some file formats which include a count of the N, the number of records followed by N repeats of those counts. When I wrote my grammars I did so in strict mode, and reported a bunch of errors to the database providers. The advantage is that wrong formats aren't accidently parsed. The disadvantage is that minor changes break the parser. I don't see any solution to this other than having someone track the file formats over time. > There could potentially be performance hit to any perl parsers > ported into those languages. Mainly because AFAIK there is a > lack of full support for all the modifiers for Perl regex, so > unless I missed something, we'd have to either code some modifier > logic in the program or use string replacement. I looked at the regexps. The ones that Python doesn't support are \G and the compilation flags /cg . They won't be in Python because the start/end positions are available as local variables and not as implicit globals. It uses a different stylism. Years ago I did some timing tests for parsing SWISS-PROT records using a large number of parsers (~20). 
I found a wide range of timings, from 1 minute to 40 minutes. The diversity is because there are many different types of things that might be done with a file. If the task is simple ("how many record are in this file?") then a simple parser is all that's needed. http://biopython.org/pipermail/biopython/2001-January/000472.html http://biopython.org/pipermail/biopython-dev/2001-January/000257.html The first of these lists some tasks that can't be done with your approach, like being able to index all the records in a file by byte position. Parsers can also get better performance by assuming the file format is correct. Eg, your EntrezGene.pm doesn't detect if the file was truncated (I fed it only the first 1000 lines of the human genome file) while the context-free parsers you have will at least generate an error that the parenthesis are unbalanced. One thing I note, investigating a question of Hilmar's, is that your tokenization of strings isn't quite complete. Double-quoted "strings" that contain a double quote are escaped ""with doubled"" double quotes. Your tokenizer doesn't convert the double quotes into single ones. My Martel code has the same problem. It needed another layer to describe how to unescape strings and handle word spilling. > Just some more cents (and advocation) :) This email too is advocation. I like the idea of having one set of format definitions that can be shared across the different code bases. It's proved rather difficult and tedious to implement. I hope that some of my experience will help you or the next person working on the problem. 
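A minimal sketch of the unescaping layer mentioned above, assuming the tokenizer hands over the raw text of one double-quoted token with its outer quotes still attached. The helper name unescape_asn1_string is hypothetical; it is not part of EntrezGene.pm or Martel, and word spilling is not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper: take the raw text of one double-quoted token
# (outer quotes included) and return its value with every doubled
# quote collapsed into a single literal quote.
sub unescape_asn1_string {
    my ($raw) = @_;
    # Strip the surrounding quotes; give up if the token is not quoted.
    return undef unless $raw =~ /\A"(.*)"\z/s;
    my $value = $1;
    $value =~ s/""/"/g;    # "" inside the string stands for one "
    return $value;
}

print unescape_asn1_string('"escaped ""with doubled"" double quotes"'), "\n";
```

Keeping this as a separate step after tokenization means the lexer patterns stay unchanged and only the value handed to the parse tree is normalized.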
Andrew Dalke dalke@dalkescientific.com From mingyi.liu at gpc-biotech.com Sun Mar 13 16:44:57 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Sun Mar 13 16:41:36 2005 Subject: [Bioperl-l] Porting Entrez Gene parser to Biojava, Biopython, Biophp, even C++ In-Reply-To: <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> References: <4234564D.7010906@gpc-biotech.com> <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> Message-ID: <4234B459.7020109@gpc-biotech.com> Andrew Dalke wrote: > When I wrote my grammars I did so in strict mode, and reported > a bunch of errors to the database providers. The advantage > is that wrong formats aren't accidently parsed. The disadvantage > is that minor changes break the parser. > > I don't see any solution to this other than having someone > track the file formats over time. > Sure. If there's arbitrary and drastic changes to file format, there must be someone watching the change . But one of my points was that my parser would likely stay valid even if NCBI changes their data definitions because it's very unlikely that NCBI changes their file structure/format, although they'd change data definitions (recall that I said my parser doesn't care about data content?) > I looked at the regexps. The ones that Python doesn't > support are \G and the compilation flags /cg . They won't > be in Python because the start/end positions are available > as local variables and not as implicit globals. It > uses a different stylism. > You're right. The /cg modifiers are exactly the ones I was talking about. \G is actually supprted by PCRE, so very likely in Python too since Python uses PCRE (please check again). Nonetheless, without /cg, \G means little. That's why I said there's gonna be a performance hit. > The first of these lists some tasks that can't be done > with your approach, like being able to index all the > records in a file by byte position. > Not really. 
If you really want those, my parser code can be easily modified to record the file byte position of each token. > Parsers can also get better performance by assuming the > file format is correct. Eg, your EntrezGene.pm doesn't > detect if the file was truncated (I fed it only the first > 1000 lines of the human genome file) while the context-free > parsers you have will at least generate an error that > the parenthesis are unbalanced. Yeah, my parser does not give much warnings at current stage. I certainly wouldn't mind someone taking my code and add exception handling. But frankly many parsers do not excel in this department. Even some XML parsers only warn when something breaks the parser. > > One thing I note, investigating a question of Hilmar's, > is that your tokenization of strings isn't quite complete. > Double-quoted "strings" that contain a double quote are > escaped ""with doubled"" double quotes. Your tokenizer > doesn't convert the double quotes into single ones. My > Martel code has the same problem. It needed another > layer to describe how to unescape strings and handle > word spilling. > You caught me. I was just being lazy - I noticed this a while ago, but decided to delay a bit since I have 4 different parsers that need to be modified. Then I forgot. (it's probably my fault that actually last night I remembered this too, and I just uploaded the files anyway 'cause it's so simple to fix by anybody anyway). I'd say you're really exaggerating when you said my tokenization of string isn't complete based on this. Not unescaping the "" escape has nothing to do with tokenization (it's a post-processing step after tokenization). It simply take one simple regex to fix it, no other layer needed. Thanks for your suggestions. I think problems specific to Martel might not apply in this case since Entrez Gene file structure/format is really simple, and they are likely to stay very stable. That's why I was proposing sharing this code base across languages. 
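On the byte-position point earlier in this message: a rough sketch of how record start offsets could be collected outside the parser, assuming each record begins with the literal header "Entrezgene ::= {" as in the NCBI download. The function name record_offsets and the in-memory sample data are made up for illustration; nothing here is taken from the released parser.

```perl
use strict;
use warnings;

# Hypothetical indexer: return the byte offset at which each record
# header starts. Reading with $/ set to the header means every chunk
# that ends with the header marks the start of a new record.
sub record_offsets {
    my ($fh) = @_;
    my $marker = 'Entrezgene ::= {';
    local $/ = $marker;
    my @offsets;
    while (defined(my $chunk = <$fh>)) {
        next unless $chunk =~ /\Q$marker\E\z/;   # trailing text, no header
        push @offsets, tell($fh) - length($marker);
    }
    return @offsets;
}

# Demo on an in-memory file holding two tiny made-up records.
my $data = "Entrezgene ::= {\n a 1 }\nEntrezgene ::= {\n b 2 }\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
print join(' ', record_offsets($fh)), "\n";
close($fh);
```

An index built this way lets a caller seek() straight to one record and hand only that record's text to the parser.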
Thanks, Mingyi From mingyi.liu at gpc-biotech.com Sun Mar 13 17:47:08 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Sun Mar 13 17:42:58 2005 Subject: [Bioperl-l] Entrez Gene parsers updated In-Reply-To: <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> References: <4234564D.7010906@gpc-biotech.com> <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> Message-ID: <4234C2EC.1020805@gpc-biotech.com> Hi, Andrew, Thanks to your quick spotting of the unescaping "" issue, I realized unless I fix it, my parsers are gonna be tagged "incomplete" :-) . So I just released version 1.01 that fixed this. I guess I can't get away with the early-Linux/OSS-developer-mentality - "I've done the hard work, so for the small things like documentation or fixing tiny bug in open source software, it's user's responsibility", especially when I myself really dislike the attitude. :-) Thanks again & best, Mingyi From dalke at dalkescientific.com Sun Mar 13 19:34:11 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun Mar 13 19:29:13 2005 Subject: [Bioperl-l] Porting Entrez Gene parser to Biojava, Biopython, Biophp, even C++ In-Reply-To: <4234B459.7020109@gpc-biotech.com> References: <4234564D.7010906@gpc-biotech.com> <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> <4234B459.7020109@gpc-biotech.com> Message-ID: <8a66c16373c43a36b13b98acda3288a3@dalkescientific.com> Mingyi Liu wrote: > Sure. If there's arbitrary and drastic changes to file format, there > must be someone watching the change . But one of my points was that > my parser would likely stay valid even if NCBI changes their data > definitions because it's very unlikely that NCBI changes their file > structure/format, Ah, I was mixing two topics - using this set of regexps to parse this file format and the general topic of using regexps portably to parse a range of file formats. > \G is actually supprted by PCRE, so very likely in Python too since > Python uses PCRE (please check again). 
Nonetheless, without /cg, \G
> means little. That's why I said there's gonna be a performance hit.

Python used to use pcre, but that was replaced with sre some years
back, in part to support Unicode-based regexps.

It looks like Java's java.util.regex does support the \G flag, according to
  http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

Personally I don't like the lack of thread safety, because that
value depends on previous interactions of the pattern. I think
Perl solved it by making those values thread-local, but I'm not
sure.

>> The first of these lists some tasks that can't be done
>> with your approach, like being able to index all the
>> records in a file by byte position.
>>
> Not really. If you really want those, my parser code can be easily
> modified to record the file byte position of each token.

The code I looked at took a string, and there was outer
scaffolding to identify the record locations.

   my $parser = GI::Parser::EntrezGene->new();
   open(IN, "Homo_sapiens") || die "...";
   # Split the input on the record header so each read returns one record.
   $/ = "Entrezgene ::= {";
   while (<IN>) {
     chomp;
     next unless /\S/;
     # Put back the opening brace that the record separator swallowed.
     my $text = (/^\s*Entrezgene ::= (\{.*)/si) ? $1 : "{" . $_;
     my $value = $parser->parse($text, 2);
     # ... do something with $value ...
   }

The actual record extraction was not part of the EntrezGene
library, so I don't see what you could modify. Perhaps add
an "offset" field to the parse method?

If you do get the byte positions of terms in the ASN.1
(e.g. to report "syntax error at line 1234 column 56") then
you would need to use the $` and $' variables, which perlvar
warns are slow, so your timings would change.

> Yeah, my parser does not give many warnings at the current
> stage. I certainly wouldn't mind someone taking my code
> and adding exception handling. But frankly many parsers do
> not excel in this department. Even some XML parsers only
> warn when something breaks the parser.

Sadly the fun part for most people is making the parser
work correctly with correct data.
Few people like making parsing
code correctly handle incorrect data. Hence all the parsers
which "do not excel in this department."

> You caught me. I was just being lazy - I noticed this a while ago,
> but decided to delay a bit since I have 4 different parsers that need
> to be modified.
  ...
> I'd say you're really exaggerating when you said my tokenization of
> string isn't complete based on this.

There are several layers to parsing. One is identifying the
lexical components, which can be done with regular expressions.
The lexer should convert these into tokens that the parser can
use, which may include things like unescaping quotes, concatenating
strings, normalizing different numeric representations
(0xa == 10 == 012 -> the integer 10).

I don't actually know how to distinguish between these two
parts of the lexer. One is the LHS of the pattern definition
and the other is the result of applying the RHS actions to the
matched components. If the actions were a null-op then there
is no difference.

Your parser though doesn't return a token stream, it returns
a parse tree, so you've already passed the step where any
sort of data conversion / normalization should take place.

But if you define that your parse tree returns the raw text
representation then it is complete. My question - which I
haven't been able to resolve for Martel - is how should code
like this, which tries to be cross-platform, handle what
is semantically one item when it's represented as multiple
components in the input format?

Here are two examples to show how tricky that is:

   url "http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&conti
   g=NT_009714.16&gene=A2M&lid=2&from=1979284&to=2027463"

   text "There is a significant genetic association of the 5 bp deletion
   and two novel polymorphisms in alpha-2-macroglobulin alpha-2-macroglobulin
   precursor with AD",

In the first the "\n" should be removed while in the second
it should be replaced with a space.
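The two joining rules above can be sketched as one helper that leaves the choice of separator to the caller, since the flat file itself does not say whether a line break stood for a space. The name join_wrapped and the shortened sample values are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: rejoin a string value that the flat file wrapped
# across several lines, using whatever separator the caller asks for.
sub join_wrapped {
    my ($value, $separator) = @_;
    $value =~ s/\r?\n/$separator/g;
    return $value;
}

# Shortened stand-ins for the two wrapped values quoted above.
my $url  = "http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&conti\ng=NT_009714.16";
my $text = "association of the 5 bp deletion\nand two novel polymorphisms";

print join_wrapped($url,  ''),  "\n";   # URL: drop the line break
print join_wrapped($text, ' '), "\n";   # prose: turn it into a space
```

Which separator applies to which field is exactly the per-field knowledge that a shared, cross-language format description would have to carry.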
It would be nice if this behavior was also the same cross-platform.

> Not unescaping the "" escape has nothing to do with tokenization
> (it's a post-processing step after tokenization). It simply takes one
> simple regex to fix it, no other layer needed.

It's post tokenization and pre parse tree assembly. For this
case it's a simple regexp search/replace, but 1) how is that handled
in a cross-platform manner and 2) for the general problem it's
not as simple as a regexp.

> Thanks for your suggestions. I think problems specific to Martel
> might not apply in this case since Entrez Gene file structure/format
> is really simple, and they are likely to stay very stable. That's why
> I was proposing sharing this code base across languages.

Indeed some of the problems don't apply. But speaking solely for
myself and not for the Biopython project, I would rather use a
validating parser that reported at least imbalanced parens,
roughly equivalent to checking for well-formed XML.

One question I have is that while I know the file format is
stable, given that it's based on ASN.1, what are the chances
of new tags being added which are still valid ASN.1 but which
are not yet present in the existing files?

For example, in reading the ASN.1 spec at
  http://asn1.elibel.tm.fr/en/standards/index.htm#x680
I see that ASN.1 could include a real number, but the
Homo_sapiens file doesn't have one and your parser doesn't
handle it (it looks for [\w-]). Mmm, and there are many
more data types in full ASN.1.

As far as I can tell, if NCBI does add a new data type that
your code doesn't support, then it's very hard to tell that
the code is ignoring problems.

Consider a floating point date value (not legal according to
NCBI but legal ASN.1. ..
I think - just testing the idea)

   track-info {
     geneid 1,
     status live,
     create-date std {
       year 2003.43,
       month 8,
       day 28,
       hour 20,
       minute 30,
       second 0
     },

Your code converts that into

   'track-info' => [
     {
       'geneid' => '1',
       'create-date' => [
         {
           'std' => [
             {
               'year' => [
                 {
                   '2003' => [
                     undef
                   ]
                 }
               ]
             }
           ]
         }
       ],
       'status' => 'live'
     }
   ]

That doesn't seem like the behavior it should have.

BTW, looking at what you do, I don't understand why you handle
the explicit types fields as you do. Why does

   tag id 9606

turn into

   'tag' => [
     {
       'id' => '11'
     }
   ],

As far as I can tell there's only a single data type
there, so what about omitting the list reference?

   'tag' => {
     'id' => '11'
   },

But I don't know enough about ASN.1.

Andrew
dalke@dalkescientific.com

From mingyi.liu at gpc-biotech.com Sun Mar 13 21:44:57 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Sun Mar 13 21:40:53 2005
Subject: [Bioperl-l] Porting Entrez Gene parser to Biojava, Biopython, Biophp, even C++
In-Reply-To: <8a66c16373c43a36b13b98acda3288a3@dalkescientific.com>
References: <4234564D.7010906@gpc-biotech.com> <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> <4234B459.7020109@gpc-biotech.com> <8a66c16373c43a36b13b98acda3288a3@dalkescientific.com>
Message-ID: <4234FAA9.9050102@gpc-biotech.com>

Andrew Dalke wrote:

> Python used to use pcre but that was replaced with sre some years
> back, in part to support Unicode-based regexps.
>
I see. Doesn't matter anyway. I do want to note that this \G /cg is
purely for parser efficiency, so s/// would work just fine, except that
it would be at least an order of magnitude slower with large Entrez Gene
records. So just as I said, porting is fine, but performance will take
a hit. Then again, any parser relying on regex would need \G /cg for
performance, and would be hit when ported over.

> The code I looked at took a string and there was outer
> > The actual record extraction was not part of the EntrezGene > library so I don't see what you could modify. Perhaps add > an "offset" field to the parse method? > Seems what you're looking for in a parser is a do-it-all text processor. It parses, it indexes, and it adapts (read on for my comment on this one). But I strictly said my parser is parser only. Now with that out of the way, let me address your question: Yes, since my parser is parser only, if you want to use it for indexing purpose, then you'd have to keep position in outer scaffolding or custom programs, and make simple changes like calling pos function after token generation to record position of token in input string (a truncated Entrez Gene record). It's all doable, but I just wouldn't put the indexing code into a parser. > If you do get the byte positions of terms in the ASN.1 > (eg to report "syntax error at line 1234 column 56") then > you would need to use the $` and $' fields, which perlvar > warns is slow, so your timings would change. Yeah, I know. If my parser tries to do more, sure it'd get slower. ;-) > There are several layers to parsing. ... > > But if you define that your parse tree returns the raw text > representation then it is complete. My question - which I > haven't been able to resolve for Martel - is how should code > like this, which tries to be cross-platform, handle what > is semantically one item when it's represented as multiple > components in the input format? > > Here are two examples to show how tricky that is > > url "http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&conti > g=NT_009714.16&gene=A2M&lid=2&from=1979284&to=2027463" > > text "There is a significant genetic association of the 5 bp > deletion > and two novel polymorphisms in alpha-2-macroglobulin > alpha-2-macroglobulin > precursor with AD", > > In the first the "\n" should be removed while in the second > it should be replaced with a space. 
> > It would be nice if this behavior was also the same cross-platform. > I think the phrase you were looking for instead of "what is semantically one item when it's represented as multiple components in the input format?" is simply "context-sensitive rules". Context-sensitivity can be cross-platform, but my parser does not need to deal with it (note that how to replace the "\n" really is user's preference and none of parser's business. You might want to replace the 2nd one with space, but another person might want it to be replaced with "
"). Even if you find a better example, I could suggest you look to my Parse::RecDescent based parser, since Parse::RecDescent allows context-senstive grammar. And also one should know that coding context-sensitivity in regex is also not that hard, but you do need to have a well defined set of scenarios and rules. > > It's post tokenization and pre parse tree assembly. For this > case it's a simple regexp search/replace but 1) how is that handled > in a cross platform manner My parser is regex based. Any change in the perl parser could be reflected in other languages (I still prefer language instead of platforms though, since this is really the point. My parsers are already cross-platform, they're supported by any platform that supports Perl). There could be changes that are needed, like unsupported modifiers, but you wouldn't think that porting across languages should not ask developers to do anything, right? What needs to be done should be determined on a case-by-case manner. I can't think of a generic response that is panacea for all porting cases. > and 2) for the general problem it's > not as simple as a regexp. > Exactly. If you read my comments on my parsers, I mentioned that when things get more complex, use those grammar-based tools instead. Right now, for Entrez Gene, regex works and it works best, that's why I mostly talk about this one. But you're very welcome to check other ones out for completeness. > Indeed some of the problems don't apply. But speaking solely for > myself and not for the Biopython project I would rather use a > validating parser that reported at least imbalanced parens, > roughly equivalent to checking for well-formed XML. Of course. I could suggest that such checking can easily be added to my parser, with one variable tracking depth - that's all that's needed since Entrez Gene only has one type of block delimiter. I'll probably do it when I have time next week since it's only 3 lines of code or so. 
But then again, I am starting to realize that you would rather use some
other parser anyway.

>
> For example, in reading the ASN.1 spec at
>   http://asn1.elibel.tm.fr/en/standards/index.htm#x680
> I see that ASN.1 could include a real number but the
> Homo_sapiens file doesn't have one and your parser doesn't
> handle it (it looks for [\w-]). Mmm, and there are many
> more data types in full ASN.1.
>
Mmm, you really tried hard to let me know that my parser cannot do it
all. ;-) Well, read on for my response.

> As far as I can tell, if NCBI does add a new data type that
> your code doesn't support then it's very hard to tell that
> the code is ignoring problems.

Good point. I'll add one line in the _parse function to do catch-all
error reporting.

>
> Consider a floating point date value (not legal according to
> NCBI but legal ASN.1. .. I think - just testing the idea)
> ...
>       year 2003.43,
> ...
>
> Your code converts that into
> ...
>                   '2003' => [
>                     undef
>                   ]
> ...
> That doesn't seem like the behavior it should do.
>
Well, your point that my parser is not a general ASN.1 parser is well
taken, especially since I never claimed it to be one. If you're looking
for an ASN.1 Perl parser, I heard on the mailing list that someone has
already made one, and it could be of help to you.

>
> BTW, looking at what you do, I don't understand why you handle
> the explicit types fields as you do. Why does
>
>   tag id 9606
> turn into
>   'tag' => [
>     {
>       'id' => '11'
>     }
>   ],
>
> As far as I can tell there's only a single data type
> there so what about omitting the list reference?
>
>   'tag' => {
>     'id' => '11'
>   },
>
> But I don't know enough about ASN.1.
>
This has nothing to do with ASN.1. It is all about how uniform the data
structure could be.
In fact, consider when NCBI decides to do { tag id 12345, tag str "whatever" } which is far more likely than the cases you considered in earlier criticisms; then the data structure would need to become: 'tag' => [ { 'id' => '12345', 'str' => 'whatever' } ], With your suggested approach, this would force the user to test what type of reference $hash{'tag'} is before dealing with it either as a hash or an array. With my approach, the user always knows to deal with it as an array. This is also exactly the reason (I guess) why XML::Simple has the option 'ForceArray', if you recall. Now the promised response to the criticism that my parser doesn't do: 1. Indexing of EntrezGene file. 2. Adaptive behavior when new format comes out. 3. (semi-?)Automatic cross-language porting. 4. Full support for ASN.1 parsing. It's really simple - in case you didn't already know - my parser is just an Entrez Gene parser. It is not designed to do those things. You really went out of your way to show me that my parser simply doesn't do everything, but failed to show me why my parser cannot be a reasonable Entrez Gene parser, which is your main point. Also I don't understand why you just dismiss my parser right away as a candidate for porting to other languages when I could address your valid concern next week with a few lines. Why? I can understand that you were possibly offended by my may-seem-naive enthusiasm about the prospect of porting this fast parser to other languages. But I was pretty happy with the parser I made, simply because: 1. There are plenty of people saying that they have a parser working for Entrez Gene, but probably due to various reasons like IP issues or specific projects, no one posted one yet (at least I couldn't find it after plenty of searching). Mine's the first one I could find that's in public domain and in Perl. 2. My parser is so short, and not written in guru-style (since I'm far from a Perl guru), so it's easy to understand. 3.
It's OO with pod and example scripts, so very easy to use. 4. Most importantly, it's freakishly fast without making mistakes with the NCBI Entrez Gene downloads. My enthusiasm is based on the belief that there's not a Perl parser out there that's better than mine overall when points 2-4 are considered. And point 1 is just a trump card. I thought it'd be helpful to many who want to get a GPL-ed Entrez Gene parser. Nonetheless, if you just don't want to use my parser, you can simply say so (or tell me why it doesn't work as a portable Entrez Gene parser). Frankly, reading your emails, initially I was glad that we had a useful discussion on parsers, but the endless picking on progressively absurd tasks for an Entrez Gene parser to do (like it's unable to index, adapt to arbitrary changes, auto-port, parse full ASN.1 specifications) just really changed my opinion, particularly because I doubt anyone using any language would be looking for those in an Entrez Gene parser. Again, FYI, it's only a parser, and I repeatedly said it's only a parser that only constructs a data structure. But I certainly welcome good suggestions, and I'll add some basic error reporting next week. I didn't think it was needed since, again, I already parsed and checked results on human, mouse and rat. But it's still a good idea & thanks for the suggestion! If someday you work out a fast parser and/or one that does it all in either Python or Perl, I'd like to know too. I'm always thrilled to learn useful things. Thanks, Mingyi BTW, I realized that I was a bit overly broad in my last email in my criticism of the early attitude that users have to do work to use their software. I should say it's just some of the early software that gave such an impression; even though it's only a few, the impression could be big. If that's what's thrown you off, I apologize.
From mingyi.liu at gpc-biotech.com Sun Mar 13 22:26:35 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Sun Mar 13 22:22:45 2005 Subject: [Bioperl-l] Porting Entrez Gene parser to Biojava, Biopython, Biophp, even C++ In-Reply-To: <4234FAA9.9050102@gpc-biotech.com> References: <4234564D.7010906@gpc-biotech.com> <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> <4234B459.7020109@gpc-biotech.com> <8a66c16373c43a36b13b98acda3288a3@dalkescientific.com> <4234FAA9.9050102@gpc-biotech.com> Message-ID: <4235046B.7070002@gpc-biotech.com> Mingyi Liu wrote > This has nothing to do with ASN. It is all about how uniform the data > structure could be. In fact, consider when NCBI decides to do > { > tag id 12345, > tag str "whatever" > } oops, I really meant: { tag id 12345, tag str "whatever", tag id 34567 } I switched to str just as an example but forgot that this renders my example incorrect. So now the structure has to become: 'tag' => [ { 'id' => '12345', 'str' => 'whatever' }, { 'id' => '34567' } ] or one that makes more sense 'tag' => [ { 'id' => '12345' }, { 'str' => 'whatever' }, { 'id' => '34567' } ] which is my approach. Again, your approach would require users to test the reference type before dealing with the content, and users would have to design two ways of dealing with the content. In my approach users always deal with it as an array: just one design and no reference testing needed. If you read my comment for the data structure trimming function, you'll see some more consideration of this aspect. It's still not perfect; I hope that's not too surprising and does not become a reason to dismiss my parser altogether.
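The point about always storing values as an array reference can be sketched as below. This is an illustration only, not the parser's code: add_value and tag_ids are hypothetical helpers, and the 'tag'/'id'/'str' keys simply reuse the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the parser under discussion): every value
# stored under a key goes into an array reference, even if it is the only one.
sub add_value {
    my ($node, $key, $value) = @_;
    push @{ $node->{$key} }, $value;
    return $node;
}

# Consumer code: one access pattern, no ref() test needed.
sub tag_ids {
    my ($node) = @_;
    return map { $_->{id} } grep { exists $_->{id} } @{ $node->{tag} || [] };
}

# A record with a single tag entry.
my %single;
add_value(\%single, 'tag', { id => '12345' });

# A record with several tag entries of mixed kinds.
my %multi;
my @tags = ({ id => '12345' }, { str => 'whatever' }, { id => '34567' });
add_value(\%multi, 'tag', $_) for @tags;

print join(',', tag_ids(\%single)), "\n";    # 12345
print join(',', tag_ids(\%multi)),  "\n";    # 12345,34567
```

The same tag_ids routine works for both records, which is the trade-off being argued for: slightly more nesting in exchange for a single code path.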
;-) Regards, Mingyi From ewijaya at singnet.com.sg Sun Mar 13 22:04:40 2005 From: ewijaya at singnet.com.sg (Edward Wijaya) Date: Mon Mar 14 02:50:00 2005 Subject: [Bioperl-l] Getting IC & Consensus with Bio::Matrix::PSM::SiteMatrix In-Reply-To: <956ed650307183b2819321abc990543b@duke.edu> References: <002b01c519bc$bce75370$6600a8c0@GOLHARMOBILE1> <956ed650307183b2819321abc990543b@duke.edu> Message-ID: Hi, Why my code below fails to return the IC values? I thought the method is able to do that. Is there anything I miss here? My second question is about "consensus" method. The consensus is generated by choosing the highest probability OR *N if prob is too low* 1. How do you define when the probability is *too low*? 2. What is the reasoning behind this implementation? e.g. Why my code below gives 'TANGTA' instead of "TATGTA"? I find this particular module is very very useful. I really wish I can make best use of it. Thanks so much for your time. Hope to hear from you again. --- Regards, Edward WIJAYA SINGAPORE

__BEGIN__

#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Matrix::PSM::SiteMatrix;

#Frequency matrix
my @pA = (2,19,3,6,8,10);
my @pT = (7,3,6,2,20,5);
my @pC = (1,2,2,1,1,1);
my @pG = (3,1,1,9,8,7);

my %param =( -pA=>\@pA,-pC=>\@pC,-pG=>\@pG,-pT=>\@pT);
my $site=new Bio::Matrix::PSM::SiteMatrix(%param);

my $consensus = $site->consensus;
my $ic = $site->IC; #Why it fails here?

print Dumper $ic;
print Dumper $consensus;

__END__

From s0460205 at sms.ed.ac.uk Mon Mar 14 03:17:22 2005 From: s0460205 at sms.ed.ac.uk (SG Edwards) Date: Mon Mar 14 03:13:19 2005 Subject: [Bioperl-l] Full uniprot annotation extraction Message-ID: <1110788242.423548925f306@sms.ed.ac.uk> Hi, I am parsing uniprot flat files and I need to extract as many of the lines as possible for insertion into a RDBMS. I use Bio::DB::SwissProt to get the major annotation (e.g. primary accession number) but is there a way to get other annotation also (e.g. date of the last update?)
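As one possible illustration of pulling the date lines out of a flat-file entry, here is a small plain-Perl sketch that does not use BioPerl at all. The helper name dt_lines is hypothetical, the record fragment is made up, and the layout of the DT lines (a DD-MON-YYYY date followed by a remark) is an assumption that should be checked against the release of the flat files actually in use:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: collect the DT (date) lines
# of one flat-file entry. Returns a list of [date, remark] pairs in file order.
sub dt_lines {
    my ($entry) = @_;
    my @dates;
    for my $line (split /\n/, $entry) {
        # Assumed layout: "DT   DD-MON-YYYY" followed by a free-text remark.
        next unless $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4})\W*(.*?)\W*$/;
        push @dates, [ $1, $2 ];
    }
    return @dates;
}

# Made-up record fragment, not real data.
my $entry = <<'END';
ID   EXAMPLE_HUMAN   STANDARD;      PRT;   100 AA.
AC   P00000;
DT   21-JUL-1986 (Rel. 01, Created)
DT   01-MAR-2005 (Rel. 46, Last annotation update)
END

# Print one tab-separated row per date, ready for loading into a table.
printf "%s\t%s\n", @$_ for dt_lines($entry);
```

Each pair could then be inserted as a row keyed by the entry's accession number; whether the last DT line is always the most recent update is something to verify rather than assume.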
From tex at biocompute.net Sun Mar 13 12:35:23 2005 From: tex at biocompute.net (James Thompson) Date: Mon Mar 14 03:39:20 2005 Subject: [Bioperl-l] Getting IC & Consensus with Bio::Matrix::PSM::SiteMatrix In-Reply-To: Message-ID: Edward, 1. There is no code in SiteMatrix (or any of the other Bio::Matrix::PSM modules as far as I know) that calculates information content for you. It's assumed to be provided as a parameter to the constructor rather than calculated by the SiteMatrix object itself. 2. I don't know the exact reasoning behind this implementation for calculating ambiguity, but here's the algorithm to calculate the consensus for an individual position: - Take the frequencies for a given position, multiply them all by ten and divide by the total number of characters at that position. In your example for the third position, we would transform these numbers: { A => 3, T => 6, C => 2, G => 1 } into this set of numbers: { A => 2.5, T => 5, C => 1.667, G => 0.833 } - If none of these numbers are above the threshold (which defaults to 5), then return an N for this position. This algorithm is in the _to_cons method of the Bio::Matrix::PSM::SiteMatrix module if you'd like to take a peek. I'll defer your other questions to Stefan and the rest of the list. :) Cheers, James Thompson On Mon, 14 Mar 2005, Edward Wijaya wrote: > Hi, > > Why my code below fails to return the IC values? > I thought the method is able to do that. > Is there anything I miss here? > > My second question is about"consensus" method. > The consensus is generated by choosing the highest probability OR *N if > prob is too low* > > 1. How do you define when the probability is *too low*? > 2. What is the reasoning behind this implementation? > e.g. Why my code below gives 'TANGTA' instead of "TATGTA"? > > I find this particular module is very very useful. > I really wish I can make best use of it. > > Thanks so much for your time. > Hope to hear from you again.
> > --- > Regards, > Edward WIJAYA > SINGAPORE > > > __BEGIN__ > > #!/usr/bin/perl -w > use strict; > use Data::Dumper; > use Bio::Matrix::PSM::SiteMatrix; > > #Frequency matrix > my @pA = (2,19,3,6,8,10); > my @pT = (7,3,6,2,20,5); > my @pC = (1,2,2,1,1,1); > my @pG = (3,1,1,9,8,7); > > > my %param =( -pA=>\@pA,-pC=>\@pC,-pG=>\@pG,-pT=>\@pT); > my $site=new Bio::Matrix::PSM::SiteMatrix(%param); > > my $consensus = $site->consensus; > my $ic = $site->IC; #Why it fails here? > > > print Dumper $ic; > print Dumper $consensus; From skirov at utk.edu Mon Mar 14 08:15:43 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Mar 14 08:16:35 2005 Subject: [Bioperl-l] Getting IC & Consensus with Bio::Matrix::PSM::SiteMatrix In-Reply-To: References: Message-ID: <42358E7F.7020209@utk.edu> Edward, The rules for too low are: single base probability>0.7; combination of two>0.8 and three>0.9 for IUPAC consensus and >0.5 for simple consensus. Actually you can recalculate the consensus by doing: $matrix->_calculate_consensus(0.45) (which naturally sets the consensus threshold to 0.45). Probably I should document this, though generally speaking this method is for internal use only. However, if you do this and a position has (A=>0.46,C=>0.01,G=>0.48,T=>0.05), then you will get A in the consensus (which is obviously incorrect: A is just the first base to surpass the threshold). I can fix this, but do you really want to get in your consensus a position with probability less than 0.5? If you use IUPAC you will get H (A+T+C). We can easily add an IC-calculating method if you really need it. Please let me know if you have further questions. Stefan James Thompson wrote: >Edward, > >1. There is no code in SiteMatrix (or any of other other Bio::Matrix::PSM modules >as far as I know) that calculates information content for you. It's assumed to >provided as a parameter to the constructor rather than calculated by the >SiteMatrix object itself. > >2.
I don't know the exact reasoning behind this implementation for calculating >ambiguity, but here's the algorithm to calculate the consensus for an individual >position: > > - Take the frequencies for a given position, multiply them all by ten and divide > by the total number of characters at that position. In your example for the third > position, we would transform these numbers: > { A => 3, T => 6, C => 2, G => 1 } > > into this set of numbers: > { A => 2.5, T => 3, C => 1.667, G => 0.833 } > > - If none of these numbers are above the threshold (which defaults to 5), > then return an N for this position. > >This algorithm is in the _to_cons method of the Bio::Matrix::PSM::SiteMatrix module >if you'd like to take a peek. > >I'll defer your other questions to Stefan and the rest of the list. :) > >Cheers, > >James Thompson > >On Mon, 14 Mar 2005, Edward Wijaya wrote: > > > >>Hi, >> >>Why my code below fails to return the IC values? >>I thought the method is able to do that. >>Is there anything I miss here? >> >>My second question is about"consensus" method. >>The consensus is generated by choosing the highest probability OR *N if >>prob is too low* >> >>1. How do you define when the probability is *too low*? >>2. What is the reasoning behind this implementation? >> e.g. Why my code below gives 'TANGTA' instead of "TATGTA"? >> >>I find this particular module is very very useful. >>I really wish I can make best use of it. >> >>Thanks so much for your time. >>Hope to hear from you again. 
>> >>--- >>Regards, >>Edward WIJAYA >>SINGAPORE >> >> >>__BEGIN__ >> >>#!/usr/bin/perl -w >>use strict; >>use Data::Dumper; >>use Bio::Matrix::PSM::SiteMatrix; >> >> #Frequency matrix >> my @pA = (2,19,3,6,8,10); >> my @pT = (7,3,6,2,20,5); >> my @pC = (1,2,2,1,1,1); >> my @pG = (3,1,1,9,8,7); >> >> >>my %param =( -pA=>\@pA,-pC=>\@pC,-pG=>\@pG,-pT=>\@pT); >>my $site=new Bio::Matrix::PSM::SiteMatrix(%param); >> >>my $consensus = $site->consensus; >>my $ic = $site->IC; #Why it fails here? >> >> >>print Dumper $ic; >>print Dumper $consensus; >> >> > > > -- Stefan Kirov, Ph.D. University of Tennessee/Oak Ridge National Laboratory 5700 bldg, PO BOX 2008 MS6164 Oak Ridge TN 37831-6164 USA tel +865 576 5120 fax +865-576-5332 e-mail: skirov@utk.edu sao@ornl.gov "And the wars go on with brainwashed pride For the love of God and our human rights And all these things are swept aside" From brian_osborne at cognia.com Mon Mar 14 08:20:12 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Mon Mar 14 08:16:52 2005 Subject: [Bioperl-l] Full uniprot annotation extraction In-Reply-To: <1110788242.423548925f306@sms.ed.ac.uk> Message-ID: SG, You should take a look at the Feature and Annotation HOWTO (http://bioperl.org/HOWTOs/Feature-Annotation). You might also want to consider using bioperl-db, it has scripts that load sequence into a BioSql database (Oracle, Mysql, Postgres). This package is available at http://bioperl.org/Core/Latest/index.shtml. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG Edwards Sent: Monday, March 14, 2005 3:17 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Full uniprot annotation extraction Hi, I am parsing uniprot flat files and I need to extract as many of the lines as possible for insertion into a RDBMS. I use Bio::DB::SwissProt to get the major annotation (e.g. primary accession number) but is there a way to get other annotation also (e.g. 
date of the last update?) _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From ewijaya at singnet.com.sg Mon Mar 14 04:26:05 2005 From: ewijaya at singnet.com.sg (Edward Wijaya) Date: Mon Mar 14 09:12:37 2005 Subject: [Bioperl-l] Getting IC & Consensus with Bio::Matrix::PSM::SiteMatrix In-Reply-To: <42358E7F.7020209@utk.edu> References: <42358E7F.7020209@utk.edu> Message-ID: Dear Stefan and James, Thanks so much for answering. On Mon, 14 Mar 2005 21:15:43 +0800, Stefan Kirov wrote: > The rules for too low are: [snip] Got it Stef. > I can fix this, but do you really want to get in your consensus a > position with proba less than 0.5? Yes. Don't you think by default it should be that way? Besides, it'll be nice to have an *option* for how the consensus is derived. > We can easily add IC calculating method if you really need it. Yes, we definitely need that. I think naturally we would need the computation, same as e-value. Actually I have a subroutine to compute the IC given the frequency matrices. You can incorporate it into the module if you want, although it isn't a great piece of work. I just thought it may save you time. > Please let me know if you have further questions. I'll save them till next time Stef ;-) -- Edward WIJAYA Singapore From skirov at utk.edu Mon Mar 14 09:47:21 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Mar 14 09:41:56 2005 Subject: [Bioperl-l] Getting IC & Consensus with Bio::Matrix::PSM::SiteMatrix In-Reply-To: References: <42358E7F.7020209@utk.edu> Message-ID: <4235A3F9.6000208@utk.edu> Hi Edward, Edward Wijaya wrote: > Dear Stefan and James, > > Thanks so much for answering > > On Mon, 14 Mar 2005 21:15:43 +0800, Stefan Kirov wrote: > >> The rules for too low are: > > [snip] > Got it Stef. > >> I can fix this, but do you really want to get in your consensus a >> position with proba less than 0.5?
> > > Yes, Don't you think by default it should be that way? > Besides it'll be nice to have an *option* of how are we going to get > the Consensus. > I'll commit the code tomorrow, just update your bioperl-live >> We can easily add IC calculating method if you really need it. > > Yes, we definitely need that. I think naturally we would need the > computation, > same as e-value. You mean you want SiteMatrix to compute the e-value? Hmm... we are getting out a bit out of scope here. Essentially PSM modules were supposed only to provide data structure and parsers. And if IC is generally straightforward and contained in the PFM/PSM, this is not the case with e-val or p-val. Therefore I am reluctant to put it in PSM collection. > > Actually I have the subroutine to compute the IC given the frequency > matrices. > You can use it to incorporate it to the module if you want, although > it isn't a great piece of work. > I just thought it may save you time. > Sure, that would be great. Just send it and I will optimize it if I can and put it in. But maybe it should go to Bio::Tools... Any thoughts from anyone else? >> Please let me know if you have further questions. > > I'll save them till next time Stef ;-) > > Stefan From amtd9 at umr.edu Mon Mar 14 08:29:30 2005 From: amtd9 at umr.edu (Mane, Ajay (UMR-Student)) Date: Mon Mar 14 10:10:12 2005 Subject: [Bioperl-l] query Message-ID: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu> Hi, I am Ajay from University of Missouri - Rolla, doing research in bioinformatics. The bl2seq tool takes 2 sequences to align. I am interested in a list of sequences and want to compare them. Instead of putting 2 at a time, I have a large list of pairs to be analysed. How do I automate the process. Everytime running the tool and manually looking for the point where the coding of proteins start is time consuming. Can I write a perl file to automate the process. Can I get any help on this. 
I have gone through the bioperl modules, but could not find on bl2seq. Thanks, Ajay From lstein at cshl.edu Thu Mar 10 16:49:05 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon Mar 14 10:10:42 2005 Subject: [Bioperl-l] Aggressive aggregation? In-Reply-To: References: <3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu> Message-ID: <200503101649.11352.lstein@cshl.edu> The problem is tied up with the need for better handling of GFF3 by Bio::DB::GFF. In GFF3 you can separate the Name of a thing and its parentage: ID=match0001;Target=cdna0123 12 462 ID=match0001;Target=cdna0123 463 963 ID=match0001;Target=cdna0123 964 2964 ID=match0002;Target=cdna0123 1 129 ID=match0002;Target=cdna0123 463 960 This is what the alignment GFF emitter should produce. Unfortunately, when you load this into Bio::DB::GFF, the distinction between the ID and the Target is lost and all the lines get aggregated together again on the target name cdna0123. I've got lots of notes on a better Bio::DB::GFF and a sample schema and queries. If someone wants to work on this, I'll hand it over to them. ...Alternatively, perhaps this can be fixed by a much less invasive change to the Bio::DB::GFF module. Perhaps the Target should simply be converted into an alias so that it can be identified. Lincoln On Thursday 10 March 2005 12:21 pm, Chad Matsalla wrote: > On Wed, 9 Mar 2005, Aaron J. Mackey wrote: > > > chr1 aafcest HSP 200 275 . - . Target > > > "Sequence:chad1" 200 275 > > > chr1 aafcest HSP 300 450 . - . Target > > > "Sequence:chad1" 300 450 > > > chr1 aafcest match 200 450 . - . Target > > > "Sequence:chad1" 200 450 > > > > These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2" > > or some such. This also means that if you're saving the ESTs in > > the database (for sequence alignment display), you'll have to > > save them redundantly under chad1-1, chad1-2, etc. > > This is horrible. I want to fix this. 
> > > Now, you could write a custom aggregator that de-aggregated > > multiple chad1 "match" features, assigning the contained HSPs to > > each, but there is no such "default" behavior. Let me know if > > there's general interest for this ... > > I think there is, and I volunteer to write it. I'm new to the > Bio::DB subsystem but I'm eager to dive in. Can you help me by > providing a general flowchart on what you'd do to create this? What > should the Aggregator be called? Hmm. > Bio::DB::GFF::Aggregator::manymatch ? > > Chad Matsalla > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050310/0d70e8a1/attachment.bin From elia at tigem.it Sat Mar 12 05:08:08 2005 From: elia at tigem.it (Elia Stupka) Date: Mon Mar 14 10:10:49 2005 Subject: [Bioperl-l] strange error after changing to RC1.5 In-Reply-To: References: Message-ID: <59ff1e0691dce94b58f4bc0a0432ca4a@tigem.it> >> *Every single bit* of those changes need to be rolled back from the >> release and if nobody else has done it by then I will do so in two >> weeks. > > Fine for the 1.5.1 branch, although I don't agree that this should be > done > on the main trunk. I couldn't agree more with Hilmar. I am writing this comment almost as an outsider considering my minor development involvement in bioperl since 1.4 was rolled out. As an external observer I can assure you that the 1.5 changes are causing a lot of trouble in the real world, many of which you don't get on the mailing list. 
Quite a few people are keeping 1.4 for their day to day work and using 1.5 only when it is required (e.g. gbrowse). Bioperl, because of its wide usage by a non-developer crowd has most definitely become the sort of project where code elegance and efficiency and conceptual issues are much less of a priority than stability and usability. Elia --- Telethon Institute of Genetics and Medicine Via Pietro Castellino, 111 80131 Napoli Tel. +39 081 6132 335 Fax. +39 081 560 98 77 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 1115 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050312/b5b47162/attachment.bin From jason.stajich at duke.edu Mon Mar 14 10:13:27 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Mar 14 10:10:52 2005 Subject: [Bioperl-l] 1.6 release In-Reply-To: <4231628B.4010007@ed.ac.uk> References: <4231628B.4010007@ed.ac.uk> Message-ID: <7ce85432cbc83b38f79a3aa5320bfeea@duke.edu> [using this post to also advocate for volunteers even though you were just trying to read on when your module changes needed to go in] At least from my POV there isn't really a plan for a 1.6 release date. I was hoping it could released before BOSC this summer. We still need a release-master to do 1.6 and a lot of recently added stuff needs to be cleaned up and re-tested before we will think about doing a stable release. I don't know when we will start a 1.6 branch in preparation for the release. I think this time around we will branch and make the stable release off the branch instead of our normal releasing off the main trunk. This gives us the flexibility to prune modules which are too new or add ports to support backwards compatibility. It was decided that the new Feature/Annotation stuff won't be part of the stable release 1.6 but would be considered for 1.8 once it is proved to be stable. 
If backwards compatible patches can be made so the API established in Bioperl 1.4 is still respected (and no additional XML or Graph modules are needed for the core Feature and Annotation objects to work) we can consider some compromises. [Scott] I realize that GMOD/Gbrowse has begun relying on this so a plan will need to be discussed, outlining exactly what new functionality is expected. We will need a volunteer to be the release master/pumpkin and several people to help in the testing and bug fixing prior to the release. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 11, 2005, at 4:19 AM, Richard Adams wrote: > Hello, > Is there any schedule for the 1.6 release? > just to know by when I have to get by modules working..... > > Richard > > -- > Dr Richard Adams > Psychiatric Genetics Group, > Medical Genetics, > Molecular Medicine Centre, > Western General Hospital, > Crewe Rd West, > Edinburgh UK > EH4 2XU > > Tel: 44 131 651 1084 > richard.adams@ed.ac.uk > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050314/69b51667/PGP.bin From ak at ebi.ac.uk Mon Mar 14 11:02:23 2005 From: ak at ebi.ac.uk (Andreas Kahari) Date: Mon Mar 14 10:56:53 2005 Subject: [Bioperl-l] query In-Reply-To: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu> References: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu> Message-ID: <20050314160223.GB1160@ebi.ac.uk> I might be a bit naïve, but wouldn't this be solved by putting all sequences in one file and then blasting it against itself?
I didn't quite get the part where you mention the coding of proteins, but maybe someone else knows exactly what you mean... Andreas On Mon, Mar 14, 2005 at 07:29:30AM -0600, Mane, Ajay (UMR-Student) wrote: > > Hi, > > I am Ajay from University of Missouri - Rolla, doing research in > bioinformatics. The bl2seq tool takes 2 sequences to align. I am > interested in a list of sequences and want to compare them. Instead of > putting 2 at a time, I have a large list of pairs to be analysed. How do > I automate the process. Everytime running the tool and manually looking > for the point where the coding of proteins start is time consuming. Can > I write a perl file to automate the process. Can I get any help on this. > I have gone through the bioperl modules, but could not find on bl2seq. -- Andreas Kähäri EMBL-EBI/ensembl 1024D/C2E163CB From razi at genet.sickkids.on.ca Mon Mar 14 11:06:10 2005 From: razi at genet.sickkids.on.ca (Razi Khaja) Date: Mon Mar 14 11:00:40 2005 Subject: [Bioperl-l] query In-Reply-To: 6667 Message-ID: <20050314160611.51919.qmail@web51605.mail.yahoo.com> There is documentation available for this at http://doc.bioperl.org/releases/bioperl-1.4/Bio/AlignIO/bl2seq.html Razi "Mane, Ajay (UMR-Student)" wrote: Hi, I am Ajay from University of Missouri - Rolla, doing research in bioinformatics. The bl2seq tool takes 2 sequences to align. I am interested in a list of sequences and want to compare them. Instead of putting 2 at a time, I have a large list of pairs to be analysed. How do I automate the process. Everytime running the tool and manually looking for the point where the coding of proteins start is time consuming. Can I write a perl file to automate the process. Can I get any help on this. I have gone through the bioperl modules, but could not find on bl2seq.
Thanks, Ajay _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l /** * Razi Khaja, Bioinformatics Analyst * The Hospital for Sick Children, Toronto * The Centre for Applied Genomics, www.tcag.ca * Tel 416-813-7032, Fax 416-813-8319 */ From palmeida at igc.gulbenkian.pt Mon Mar 14 11:25:17 2005 From: palmeida at igc.gulbenkian.pt (Paulo Almeida) Date: Mon Mar 14 11:20:33 2005 Subject: [Bioperl-l] query In-Reply-To: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu> References: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu> Message-ID: <20050314162517.GA3026@bioinf.igc.gulbenkian.pt> Hi, I have never used bl2seq on perl, but this page may help you: http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/PiseApplication/bl2seq.html (you can get bioperl-run from: http://bioperl.org/Core/Latest/index.shtml ) -Paulo On Mon, Mar 14, 2005 at 07:29:30AM -0600, Mane, Ajay (UMR-Student) wrote: > > Hi, > > I am Ajay from University of Missouri - Rolla, doing research in > bioinformatics. The bl2seq tool takes 2 sequences to align. I am > interested in a list of sequences and want to compare them. Instead of > putting 2 at a time, I have a large list of pairs to be analysed. How do > I automate the process. Everytime running the tool and manually looking > for the point where the coding of proteins start is time consuming. Can > I write a perl file to automate the process. Can I get any help on this. > I have gone through the bioperl modules, but could not find on bl2seq. 
> > Thanks, > Ajay -- Paulo Almeida Instituto Gulbenkian de Ciencia Apartado 14, 2781-901, Oeiras, PORTUGAL tel +351 21 446 46 35 fax +351 21 440 79 70 http://www.igc.gulbenkian.pt From skirov at utk.edu Mon Mar 14 11:28:40 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Mar 14 11:24:30 2005 Subject: [Bioperl-l] Entrez Gene ASN In-Reply-To: References: Message-ID: <4235BBB8.2060900@utk.edu> Hilmar, Hilmar Lapp wrote: > > On Friday, March 11, 2005, at 11:02 AM, Stefan Kirov wrote: > >> >> >> Hilmar Lapp wrote: >> >>> Gene shouldn't be fundamentally different from LocusLink, and >>> LocusLink was represented as an annotated SeqI within bioperl. >> >> >> It is not, you are right. >> >>> >>> If at all possible I'd still like it to remain that way for Gene in >>> order to allow for a smooth transition from LL to Gene for code >>> that's been using the former. >>> >> hmmmm, back compatibility is good thing, but sometimes it may be hard >> to achieve. > > > Well, now you contradict yourself. Above you agree that Gene and > LocusLink are fundamentally the same, and here you say representing > them in a compatible fashion may be hard to achieve ... Not really. They are fairly similar, but not completely and moreover, I believe LocusLink parser wouldn't deal with hierarchies.... It just puts everything in Annotation objects, thus loosing the relationships (correct me if I am wrong here). Same with homologs. > > There are problems indeed though, read on ... > >> >>> If you want to emphasize the fact that it's a container for >>> sequences, then that sounds like a ClusterI to me, which can be >>> richly annotated too. >> >> >> Let me disagree here. Cluster is designed for independent sequences, >> where Gene should deal with sequences, that have hierarchical >> relationship among themselves. > > > Two notes here. First, ClusterI is not designed for independent > sequences. It is just meant as a container for sequences, be those > related to each other or not.
OK, I meant independent as in "I don't know what is your relationship". My point is it is not fit to describe the hierarchy here. > > Second, the ability to represent hierarchical relationships between > sequences is basically absent from bioperl, not just from ClusterI > (aside from ClusterI representing a relationship between the > containing seq and the contained seqs). > > We should think seriously before we add that capability. Most of the > people and effort in the field towards hierarchical relationships > between biological entities with sequence takes place in the domain of > feature hierarchies, *not* sequence hierarchies. See GFF3, SO, > GBrowse, Chado, and related efforts. I belive it is reasonable to have this functionality. Anyway I see sequence vs sequence feature hierarchy more as a philosophical question with a little practical value (unless I am missing something important). By the ways isn't GBrowse mysql based? > > The only place I know where sequence heirarchies are extensively used > is in our local adaptation of Biosql, and we do all of this in SQL (as > bioperl and therefore bioperl-db has zero support for it). > > It's possible but I'm not sure also wise to duplicate the support for > feature hierarchies to sequences ... Wouldn't it in the end benefit > more people if you were able to tie in Gene into the Unflattener that > Chris wrote? > >> This is one of the issues I think Seq object is not designed to >> deal with. What we need is: >> genome--(Bio::Seq)- >> |--transcript(Bio::Seq) >> |--protein(Bio::Seq) >> |--transcript(Bio::Seq) >> |--protein(Bio::Seq) > > > Well, yeah, if you replace Bio::Seq with Bio::SeqFeatureI you are > pretty close to GFF3 and a growing wealth of support for it. > >> >> Another significant concern I have is that if we store everything as >> SeqFeature or the overhead may become huge (some records have >> hundreds of different features) > > > Have you talked to Lincoln about this? 
I believe GBrowse is dealing > pretty well with this huge overhead but I may be missing something here. > No, I have not, I guess I should... > >> [...] and any user of the parser will have to do quite of a data >> mining to find the relevant feature. One approach would be to add >> more Bio::Annotation:: objects (for example Bio::Annotation::STS, >> Bio::Annotation::GRIF, etc). > > > Possibly. Bio::Annotation objects was in fact what I was primarily > referring to when I spoke about annotation. > So do we agree that Bio::Annotation needs some expansion? What other people think? >> We may decide to create a simplified (Bio::Seq, no relationships) or >> more complex object (Gene), based on the user request. > > > Just as an aside, I guess you know that there is a Gene object > already, but it's feature based. Yes, but actually Bio::LiveSeq::Gene (vs Bio::SeqFeature::Gene) is more like what I had in mind (it lacks documentation and relationships I think, but is a good start), but still what about phylogeny? > > -hilmar From amackey at pcbi.upenn.edu Mon Mar 14 11:57:23 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Mar 14 11:53:54 2005 Subject: [Bioperl-l] strange error after changing to RC1.5 In-Reply-To: <59ff1e0691dce94b58f4bc0a0432ca4a@tigem.it> References: <59ff1e0691dce94b58f4bc0a0432ca4a@tigem.it> Message-ID: <4235C273.7090305@pcbi.upenn.edu> Elia Stupka wrote: > As an external observer I can assure you that > the 1.5 changes are causing a lot of trouble in the real world, many of > which you don't get on the mailing list. Quite a few people are keeping > 1.4 for their day to day work and using 1.5 only when it is required > (e.g. gbrowse). So how can we possibly address these issues if we don't know about them? 1.5 is a developer's, not stable release. 
It wouldn't surprise me that "critical" code bases are not ready to use 1.5 > Bioperl, because of its wide usage by a non-developer > crowd has most definitely become the sort of project where code elegance > and efficiency and conceptual issues are much less of a priority than > stability and usability. So is BioPerl a stable project, or a dead project? BioPerl has hardly ever been (greatly) concerned with usability ... -Aaron From tembe at bioanalysis.org Mon Mar 14 12:19:59 2005 From: tembe at bioanalysis.org (Waibhav Tembe) Date: Mon Mar 14 12:25:19 2005 Subject: [Bioperl-l] Help with String::Approx In-Reply-To: <422208AA.1000709@cenix-bioscience.com> References: <421F79EE.2080503@bioanalysis.org> <422208AA.1000709@cenix-bioscience.com> Message-ID: <4235C7BF.8090100@bioanalysis.org> Hello, Thanks for the advice to use String::Approx. I installed String::Approx and it seems to be functional. Just as a check, I am trying to run the different utilities such as adist, aindex etc. by following the examples on CPAN's String::Approx page, and can't seem to run the aslice utility. For the following code, adist and aindex seem to work fine, but aslice outputs something unexpected.

$F = "xxxx";
$S = "zzzxxyxyyy";
print "Edit = ", adist($F, $S), "\n";
$index = aindex($F, $S);
print "Matches at ", $index, "\n";
($index, $size) = aslice($F, $S);
print "Matches at ", $index, "\tSize is ", $size, "\n";
($index, $size, $d) = aslice($F, $S);
print "Matches at ", $index, "\tSize is ", $size, "Distance is ", $d, "\n";

output:
Edit = 1
Matches at 3
Matches at ARRAY(0x9cc9d98) Size is
Matches at ARRAY(0x9ddfbc0) Size is Distance is

Any help to fix this and to use the Approx utility for:
1. Extracting the approximate match from $S
2. At least finding the length of the match and correct index in $S
will be appreciated. Thanks.
Tembe Andrew Walsh wrote: > Hello, > > The following cpan module may be of interest: > > String::Approx > > Cheers, > > Andrew > > > Waibhav Tembe wrote: > >> Hello, >> >> I was wondering if there is any Perl implementation for >> "k-differences" string matching algorithm using dynamic programming. >> More precisely, given two string s1 and s2, the program finds an >> alignment, if one exists, that has less than or equal to k (a >> parameter) no. of differences. The differences include mismatches and >> indels. >> >> Any pointers will be welcome. >> >> Thanks. >> >> Tembe >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > From allenday at ucla.edu Mon Mar 14 12:48:08 2005 From: allenday at ucla.edu (Allen Day) Date: Mon Mar 14 12:42:38 2005 Subject: [Bioperl-l] 1.6 release In-Reply-To: <7ce85432cbc83b38f79a3aa5320bfeea@duke.edu> References: <4231628B.4010007@ed.ac.uk> <7ce85432cbc83b38f79a3aa5320bfeea@duke.edu> Message-ID: On Mon, 14 Mar 2005, Jason Stajich wrote: > [using this post to also advocate for volunteers even though you were > just trying to read on when your module changes needed to go in] > > At least from my POV there isn't really a plan for a 1.6 release date. > I was hoping it could released before BOSC this summer. > > We still need a release-master to do 1.6 and a lot of recently added > stuff needs to be cleaned up and re-tested before we will think about > doing a stable release. I don't know when we will start a 1.6 branch > in preparation for the release. I think this time around we will > branch and make the stable release off the branch instead of our normal > releasing off the main trunk. This gives us the flexibility to prune > modules which are too new or add ports to support backwards > compatibility. 
> > > It was decided that the new Feature/Annotation stuff won't be part of > the stable release 1.6 but would be considered for 1.8 once it is > proved to be stable. If backwards compatible patches can be made so > the API established in Bioperl 1.4 is still respected (and no > additional XML or Graph modules are needed for the core Feature and > Annotation objects to work) we can consider some compromises. [Scott] No problem. I can remove these dependencies. > I realize that GMOD/Gbrowse has begun relying on this so a plan will > need to be discussed, outlining exactly what new functionality is > expected. > > We will need a volunteer to be the release master/pumpkin and several > people to help in the testing and bug fixing prior to the release. > > -jason > -- > Jason Stajich > jason.stajich at duke.edu > http://www.duke.edu/~jes12/ > > On Mar 11, 2005, at 4:19 AM, Richard Adams wrote: > > > Hello, > > Is there any schedule for the 1.6 release? > > just to know by when I have to get by modules working..... > > > > Richard > > > > -- > > Dr Richard Adams > > Psychiatric Genetics Group, > > Medical Genetics, > > Molecular Medicine Centre, > > Western General Hospital, > > Crewe Rd West, > > Edinburgh UK > > EH4 2XU > > > > Tel: 44 131 651 1084 > > richard.adams@ed.ac.uk > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > From pstogios at uhnres.utoronto.ca Mon Mar 14 14:54:35 2005 From: pstogios at uhnres.utoronto.ca (Peter J Stogios) Date: Mon Mar 14 14:49:02 2005 Subject: [Bioperl-l] Refseq and Splice Variants Message-ID: Hi, I am wondering if there is a way of easily identifying Refseq sequences that are splice variants of the same gene. 
If a gene has multiple splice products that are supported by experimental evidence, they get their own Refseq identifier, but there is no explicit reference to the underlying gene they came from (outside of the identifier line). What I am trying to do is group sets of Refseq sequences in FASTA format into sets of splice variants of the same gene. Does anyone know of a way, using Bioperl, that I can accomplish this? Thanks, ~ Peter J Stogios Ph.D. candidate, Privé Lab Dept. of Medical Biophysics, University of Toronto Ontario Cancer Institute, Princess Margaret Hospital e: pstogios@uhnres.utoronto.ca w: http://xtal.uhnres.utoronto.ca/prive p: (416) 946-4501x3280 From yanfeng at csit.fsu.edu Mon Mar 14 15:29:03 2005 From: yanfeng at csit.fsu.edu (yanfeng) Date: Mon Mar 14 15:23:14 2005 Subject: [Bioperl-l] How to use trnascan.pm Message-ID: <4235F40F.7020808@csit.fsu.edu> Hi, Does anyone know how to use trnascan.pm? I want to use it to locate the tRNAs in my lab sequences. Thanks. Fisher From skirov at utk.edu Mon Mar 14 15:34:32 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Mar 14 15:29:07 2005 Subject: [Bioperl-l] Refseq and Splice Variants In-Reply-To: References: Message-ID: <4235F558.1060602@utk.edu> What is your initial id: refseq or gene? Do you want all of them or just some? In any case LL_tmpl (the locuslink file) has this data and there is a parser for it (hopefully an Entrez gene parser will be there soon). Also you can get the gene2refseq file from here ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/. It is tab delimited and pretty easy to use. You need columns 1 and 4 I think. Stefan Peter J Stogios wrote: > Hi, > > I am wondering if there is a way of easily identifying Refseq > sequences that are splice variants of the same gene.
If a gene has > multiple splice products that are supported by experimental evidence, > they get their own Refseq identifier, but there is no explicit > reference to the underlying gene they came from (outside of the > identifier line). > > What I am trying to do is group sets of Refseq sequences in FASTA > format into sets of splice variants of the same gene. Does anyone > know of a way, using Bioperl, that I can accomplish this? > > Thanks, > > ~ > Peter J Stogios > Ph.D. candidate, Priv? Lab > Dept. of Medical Biophysics, University of Toronto > Ontario Cancer Institute, Princess Margaret Hospital > e: pstogios@uhnres.utoronto.ca > w: http://xtal.uhnres.utoronto.ca/prive > p: (416) 946-4501x3280 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From mingyi.liu at gpc-biotech.com Mon Mar 14 16:15:32 2005 From: mingyi.liu at gpc-biotech.com (Mingyi Liu) Date: Mon Mar 14 16:11:34 2005 Subject: [Bioperl-l] Error reporting/Validation implemented In-Reply-To: <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> References: <4234564D.7010906@gpc-biotech.com> <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> Message-ID: <4235FEF4.2070901@gpc-biotech.com> Hi, there, I just implemented basic error reporting and validation functionalities in my Entrez Gene parser in Perl (the regex version). The validation will catch all non-conforming data, while error reporting reports line number, error type, and the first 20 (customizable) characters of the offending data (but the line number could be incorrect if the format resulted in an exception, which is hard to deal with for ASN.1-formatted data, although easy for XML parsers). The speed for the parser of course slowed down, but I'd say it'd still beat most parsers hands down. The full human genome now takes a bit over 12 minutes instead of 11 minutes to process on one Intel Xeon 2.4 GHz CPU. 
So I don't think my parser's speed has much to do with performing validation or not. I had also communicated with Stefan Kirov and turns out the dead entries and 0-sized (should be 1-sized) arrays were simply related to data trimming options. So far, so good. If anyone is interested, check it out at http://www.sourceforge.net/projects/egparser/. Regards, Mingyi From iluminati at earthlink.net Mon Mar 14 16:22:40 2005 From: iluminati at earthlink.net (iluminati@earthlink.net) Date: Mon Mar 14 16:14:57 2005 Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq Message-ID: <423600A0.2080109@earthlink.net> I'm having this unuusal problem with loading this particular module. I need b/c I'm working with chromosome-sized sequence files as a part of my project, but yet it seems to not want to load properly even when it's loaded using the following statement: use Bio::Seq::LargePrimarySeq; I checked my modules, and the necessary module is there. It seems to just not want to load. Can anyone be of service? From garrettsorensen at gmail.com Mon Mar 14 19:03:49 2005 From: garrettsorensen at gmail.com (Garrett Sorensen) Date: Mon Mar 14 18:58:27 2005 Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq In-Reply-To: <423600A0.2080109@earthlink.net> References: <423600A0.2080109@earthlink.net> Message-ID: I've had the same issue... I ended up breaking down the sequences into manageable fragments but would really like to get the largePrimarySeq working. When I tried loading a chrom size sequence I just sat back and watched my RAM get used up (2 gigs), then the swap, then the crash.... So if anyone can help it'd benefit both of us! Thanks for any help, Garrett On Mon, 14 Mar 2005 16:22:40 -0500, iluminati@earthlink.net wrote: > I'm having this unuusal problem with loading this particular module. 
I > need b/c I'm working with chromosome-sized sequence files as a part of > my project, but yet it seems to not want to load properly even when it's > loaded using the following statement: > > use Bio::Seq::LargePrimarySeq; > > I checked my modules, and the necessary module is there. It seems to > just not want to load. Can anyone be of service? > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From skirov at utk.edu Mon Mar 14 19:50:04 2005 From: skirov at utk.edu (Stefan Kirov) Date: Mon Mar 14 19:44:52 2005 Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq] Message-ID: <4236313C.4010006@utk.edu> First you have to answer a few questions: how do you get the object? use Bio::Seq::LargePrimarySeq does not create an object, it merely makes the code available. If you post your code here it will be much easier to answer your questions. How do you access the sequence? (I hope you have read the documentation, which states that it is not generally a good idea to call $seq->seq.) How big is your /tmp? What are you trying to accomplish and why do you need the whole seq in memory? Stefan I've had the same issue... I ended up breaking down the sequences into manageable fragments but would really like to get the largePrimarySeq working. When I tried loading a chrom size sequence I just sat back and watched my RAM get used up (2 gigs), then the swap, then the crash.... So if anyone can help it'd benefit both of us! Thanks for any help, Garrett On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net wrote: > I'm having this unuusal problem with loading this particular module. I > need b/c I'm working with chromosome-sized sequence files as a part of > my project, but yet it seems to not want to load properly even when it's > loaded using the following statement: > > use Bio::Seq::LargePrimarySeq; > > I checked my modules, and the necessary module is there. It seems to > just not want to load. Can anyone be of service? > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From iluminati at earthlink.net Mon Mar 14 21:39:43 2005 From: iluminati at earthlink.net (iluminati@earthlink.net) Date: Mon Mar 14 21:32:14 2005 Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq] In-Reply-To: <4236313C.4010006@utk.edu> References: <4236313C.4010006@utk.edu> Message-ID: <42364AEF.6050106@earthlink.net> Thanks for asking the questions! In hindsight, I realized that I glossed over the problem in my frustration. Anyway, here's the drill. I created a seq object from a chromosome-sized fasta file like so...

my $seqio = new Bio::SeqIO('-format' => 'largefasta',
                           '-file'   => Bio::Root::IO->catfile("/Thesis Stuff/Chr$Chromosome/chr$Chromosome.fa"));
#Create the seq object
my $seq = $seqio->next_seq();

From there, I want to manipulate the sequence and use the functions generally available to a seq object. Now, in order to build the seq object, I have to use the Bio::SeqIO::largefasta module. The reason I need the Bio::Seq::LargePrimarySeq module is so that I can manipulate the sequence and get to the necessary functions. However, I get this error running the script despite including the Bio::Seq::LargePrimarySeq module:

Can't locate object method "add_SeqFeature" via package "Bio::Seq::LargePrimarySeq" (perhaps you forgot to load "Bio::Seq::LargePrimarySeq"?) at ThesisFrontEndScript.pl line 94, line 33294.

I can send you the code in question if you want to get a better look-see.
Now, the reason I need the whole sequence is two-fold. For one, I need to be able to calculate CG% of genes as an experimental control of my project. The other part is that I need to be able to scan the genome for polyA sites with respect to their orientation to L1 sites, and there's no simple way to do that other than flat-out scanning the code. I'll definitely look into tweaking the /$tmp directory if that helps, but other than that, I have to at least try and make it work. Stefan Kirov wrote: > First you have to answer few questions: how do you get the object?/ > use Bio::Seq::LargePrimarySeq does not create an object it merely > makes the code available. > /If you post you code here it will be much easier to answer your > questions. How do you access the sequence (I hope you have read the > documentation, which states that it is not generally a good idea to > call $seq->seq). > How big is you /tmp? What are trying to accomplish and why you need > the whole seq in memory? > Stefan > > I've had the same issue... I ended up breaking down the sequences into > manageable fragments but would really like to get the largePrimarySeq > working. When I tried loading a chrom size sequence I just sat back > and watched my RAM get used up (2 gigs), then the swap, then the > crash.... So if anyone can help it'd benefit both of us! > > Thanks for any help, > Garrett > > > On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net > > > wrote: > >> / I'm having this unuusal problem with loading this particular >> module. I > > />/ need b/c I'm working with chromosome-sized sequence files as a > part of > />/ my project, but yet it seems to not want to load properly even > when it's > />/ loaded using the following statement: > />/ />/ use Bio::Seq::LargePrimarySeq; > />/ />/ I checked my modules, and the necessary module is there. It > seems to > />/ just not want to load. Can anyone be of service? 
> />/ />/ _______________________________________________ > />/ Bioperl-l mailing list > />/ Bioperl-l at portal.open-bio.org > > />/ http://portal.open-bio.org/mailman/listinfo/bioperl-l > /> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From amackey at pcbi.upenn.edu Mon Mar 14 12:39:26 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Mar 14 21:43:29 2005 Subject: [Bioperl-l] Aggressive aggregation? In-Reply-To: <200503101649.11352.lstein@cshl.edu> References: <3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu> <200503101649.11352.lstein@cshl.edu> Message-ID: <4235CC4E.4060000@pcbi.upenn.edu> In the "FWIW" category: This is what I did to break the "aggressive aggregation" (attached patch); it relies on the fact that when aggregation occurs, the base feature's range always (at least in my use cases so far) contains (or at least overlaps) the subfeature's ranges. So in the code below, when more than one base feature is detected, then range checking kicks in. This won't help you if, for instance, you're saving separate HSP linking information as different hits (because the hits will still overlap), but it does solve the more common case of one protein/EST matching in multiple, distinct locations on the genome. -Aaron -------------- next part -------------- diff -u -r1.30 Aggregator.pm --- Aggregator.pm 3 Aug 2004 09:17:23 -0000 1.30 +++ Aggregator.pm 14 Mar 2005 17:45:35 -0000 @@ -303,7 +303,7 @@ ? 
       join ($;,$feature->group,$feature->refseq,$feature->source)
       : join ($;,$feature->group,$feature->refseq);
     if ($main_method && lc $feature->method eq lc $main_method) {
-      $aggregates{$key}{base} ||= $feature->clone;
+      push @{$aggregates{$key}{base}}, $feature->clone;
     } else {
       push @{$aggregates{$key}{subparts}},$feature;
     }
@@ -321,18 +321,29 @@
     if ($require_whole_object && $self->components) {
       next unless $aggregates{$_}{base}; # && $aggregates{$_}{subparts};
     }
-    my $base = $aggregates{$_}{base};
+
+    my $base = shift @{$aggregates{$_}{base} || []};
     unless ($base) { # no base, so create one
       my $first = $aggregates{$_}{subparts}[0];
       $base = $first->clone; # to inherit parent coordinate system, etc
       $base->score(undef);
       $base->phase(undef);
     }
-    $base->method($pseudo_method);
-    $base->add_subfeature($_) foreach @{$aggregates{$_}{subparts}};
-    $base->adjust_bounds;
-    $base->compound(1); # set the compound flag
-    push @result,$base;
+    while ($base) {
+      $base->method($pseudo_method);
+      if (@{$aggregates{$_}{base} || []}) {
+        # only capture those subfeatures that overlap the base
+        for my $part (@{$aggregates{$_}{subparts}}) {
+          $base->add_subfeature($part) if $part->overlaps($base, "strong");
+        }
+      } else {
+        $base->add_subfeature($_) foreach @{$aggregates{$_}{subparts}};
+      }
+      $base->adjust_bounds;
+      $base->compound(1); # set the compound flag
+      push @result,$base;
+      $base = shift @{$aggregates{$_}{base} || []}
+    }
   }
   @$features = @result;
 }
From jason.stajich at duke.edu Mon Mar 14 21:48:40 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Mar 14 21:43:34 2005 Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq] In-Reply-To: <42364AEF.6050106@earthlink.net> References: <4236313C.4010006@utk.edu> <42364AEF.6050106@earthlink.net> Message-ID: <5e75fad6beff228d45e99a6da3129418@duke.edu> -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 14, 2005, at 9:39 PM, iluminati@earthlink.net wrote: > Thanks for asking the
questions! In hindsight, I realized that I > glossed over the problem in my frustration. > Anyway, here's the drill. I created a seq object from a > chromosome-sized fasta file like so... > > my $seqio = new Bio::SeqIO('-format'=>'largefasta', > '-file' =>Bio::Root::IO->catfile("/Thesis > Stuff/Chr$Chromosome/chr$Chromosome.fa")); > #Create the seq object > my $seq = $seqio->next_seq(); > > From there, I want to manipulate the sequence and use the functions > generally available to a seq object. Now, in order to the build the > seq object, I have to use the Bio::Seq::largefasta module. The reason > I need the Bio::Seq::LargePrimarySeq module is so that I can > manipulate the sequence and get to the necessary functions. However, > I get this error running the script despite including the > Bio::Seq:::LargePrimarySeq module: > > Can't locate object method "add_SeqFeature" via package > "Bio::Seq::LargePrimaryS > eq" (perhaps you forgot to load "Bio::Seq::LargePrimarySeq"?) at > ThesisFrontEndS > cript.pl line 94, line 33294. > > > I can send you the code in question if you want to get a better > look-see. > Now, the reason I need the whole sequence is two-fold. For one, I > need to be able to calculate CG% of genes as an experimental control > of my project. The other part is that I need to be able to scan the > genome for polyA sites with respect to their orientation to L1 sites, > and there's no simple way to do that other than flat-out scanning the > code. I'll definitely look into tweaking the /$tmp directory if that > helps, but other than that, I have to at least try and make it work. > You are still going to need to chunk it into pieces to do the scanning anyways - if you call $seq->seq() you will pull the entire chromosome into memory. You should consider doing things with Bio::DB::Fasta which implements an efficient indexed version of getting the sequences. 
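[A rough illustration of the chunking point above. This is a sketch only: it is not the Bio::DB::Fasta API and not code from this thread. It assumes plain Perl with no BioPerl dependency, and tallies GC content one FASTA line at a time so that memory use does not grow with chromosome size. The helper name gc_percent_streaming and the in-memory demo data are hypothetical. Per-gene GC% would also need gene coordinates, which this does not handle.]

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): stream a FASTA file one line
# at a time and tally G+C, so only a single line is held in memory.
sub gc_percent_streaming {
    my ($fh) = @_;
    my ($gc, $acgt) = (0, 0);
    while (my $line = <$fh>) {
        next if $line =~ /^>/;              # skip FASTA header lines
        chomp $line;
        $gc   += ($line =~ tr/GCgc//);      # tr/// returns the match count
        $acgt += ($line =~ tr/ACGTacgt//);  # N and other ambiguity codes are ignored
    }
    return $acgt ? 100 * $gc / $acgt : 0;   # avoid dividing by zero on empty input
}

# Demo on an in-memory "file"; a real run would open the chromosome file instead.
my $fasta = ">chrDemo\nACGTNNNN\nGGCCaatt\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
printf "GC%% = %.1f\n", gc_percent_streaming($fh);
close $fh;
```

[The same line-by-line loop could feed a sliding window for motif scanning, provided the window keeps enough trailing bases from the previous line to catch sites that straddle a line break.]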
If you want to add annotation consider Bio::DB::GFF system for doing all of this it is really more efficient. -jason > Stefan Kirov wrote: > >> First you have to answer few questions: how do you get the object?/ >> use Bio::Seq::LargePrimarySeq does not create an object it merely >> makes the code available. >> /If you post you code here it will be much easier to answer your >> questions. How do you access the sequence (I hope you have read the >> documentation, which states that it is not generally a good idea to >> call $seq->seq). >> How big is you /tmp? What are trying to accomplish and why you need >> the whole seq in memory? >> Stefan >> >> I've had the same issue... I ended up breaking down the sequences into >> manageable fragments but would really like to get the largePrimarySeq >> working. When I tried loading a chrom size sequence I just sat back >> and watched my RAM get used up (2 gigs), then the swap, then the >> crash.... So if anyone can help it'd benefit both of us! >> >> Thanks for any help, >> Garrett >> >> >> On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net >> >> > > wrote: >> >>> / I'm having this unuusal problem with loading this particular >>> module. I >> >> />/ need b/c I'm working with chromosome-sized sequence files as a >> part of >> />/ my project, but yet it seems to not want to load properly even >> when it's >> />/ loaded using the following statement: >> />/ />/ use Bio::Seq::LargePrimarySeq; >> />/ />/ I checked my modules, and the necessary module is there. It >> seems to >> />/ just not want to load. Can anyone be of service? 
>> />/ />/ _______________________________________________ >> />/ Bioperl-l mailing list >> />/ Bioperl-l at portal.open-bio.org >> >> />/ http://portal.open-bio.org/mailman/listinfo/bioperl-l >> /> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 186 bytes Desc: This is a digitally signed message part Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050314/3406d818/PGP.bin From hlapp at gmx.net Tue Mar 15 03:34:37 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Mar 15 03:29:15 2005 Subject: [Bioperl-l] Full uniprot annotation extraction In-Reply-To: <1110788242.423548925f306@sms.ed.ac.uk> Message-ID: <10B58B33-952D-11D9-BBA7-000A959EB4C4@gmx.net> On Monday, March 14, 2005, at 12:17 AM, SG Edwards wrote: > I use Bio::DB::SwissProt to get the major annotation (e.g. primary > accession > number) but is there a way to get other annotation also (e.g. date of > the last > update?) > Swissprot (and uniprot) entries are parsed by the Bio::SeqIO::swiss parser which returns a Bio::Seq::RichSeqI object. Check out it's POD for some shortcut methods to get at specific annotation. $seq->get_dates() will return an array of dates as present in the entry; the date of last update will be the last element. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From birney at ebi.ac.uk Tue Mar 15 03:52:22 2005 From: birney at ebi.ac.uk (Ewan Birney) Date: Tue Mar 15 04:21:50 2005 Subject: [Bioperl-l] Refseq and Splice Variants In-Reply-To: References: Message-ID: <4236A246.9080506@ebi.ac.uk> Peter J Stogios wrote: > Hi, > > I am wondering if there is a way of easily identifying Refseq sequences > that are splice variants of the same gene. If a gene has multiple > splice products that are supported by experimental evidence, they get > their own Refseq identifier, but there is no explicit reference to the > underlying gene they came from (outside of the identifier line). > > What I am trying to do is group sets of Refseq sequences in FASTA format > into sets of splice variants of the same gene. Does anyone know of a > way, using Bioperl, that I can accomplish this? > One way to handle this is to use Ensembl's genes/transcript links, and each transcript is linked to its RefSeq if it has one. The easiest way to do this is via Mart. Go to Ensembl, Click on Mart, Click on Human and Ensembl Genes, in filter make sure you don't have a genome filter on, optional select "Genes with RefSeq IDs" if you are only interested in the RefSeq subset, then click on next, and in Output, select Ensembl Gene-ID, Ensembl Transcript-ID, RefSeq-ID this will give you the 3 way table to use (you can get this as tab delimited). On route, you can note in filter how many different constraints you can make on this :) > Thanks, > > ~ > Peter J Stogios > Ph.D. candidate, Priv? Lab > Dept. 
of Medical Biophysics, University of Toronto > Ontario Cancer Institute, Princess Margaret Hospital > e: pstogios@uhnres.utoronto.ca > w: http://xtal.uhnres.utoronto.ca/prive > p: (416) 946-4501x3280 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From skirov at utk.edu Tue Mar 15 08:51:02 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Mar 15 08:46:16 2005 Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq] In-Reply-To: <42364AEF.6050106@earthlink.net> References: <4236313C.4010006@utk.edu> <42364AEF.6050106@earthlink.net> Message-ID: <4236E846.4020202@utk.edu> Your first problem is that you cannot access SeqFeature methods (Annotation as well) as LargePrimarySeq inherits (unlike LargeSeq) only from PrimarySeq. Therefore you don't have these methods available. Two approaches: create an empty Seq object to hold the annotation, or read the file into a new LargeSeq object: my $id=<>; while (<>) { chomp; $largeseq->add_sequence_as_string; } I don't know what the performance will be with the second object when you add features. First one is far safer I think. Next problem performance screening the sequence. Unless you have something BIG it is likely you will have to split the sequence in at least several chunks (then you can see if this disrupted a signal, site..etc.) or get few gigs more for your RAM (best would be some shared memory and a grid if you want to kill a fly with a tank :-) ). Let me know if you have further questions. Hope this helps and good luck. Stefan iluminati@earthlink.net wrote: > Thanks for asking the questions! In hindsight, I realized that I > glossed over the problem in my frustration. > Anyway, here's the drill. I created a seq object from a > chromosome-sized fasta file like so... 
> > my $seqio = new Bio::SeqIO('-format'=>'largefasta', > '-file' =>Bio::Root::IO->catfile("/Thesis > Stuff/Chr$Chromosome/chr$Chromosome.fa")); > #Create the seq object > my $seq = $seqio->next_seq(); > > From there, I want to manipulate the sequence and use the functions > generally available to a seq object. Now, in order to the build the > seq object, I have to use the Bio::Seq::largefasta module. The reason > I need the Bio::Seq::LargePrimarySeq module is so that I can > manipulate the sequence and get to the necessary functions. However, > I get this error running the script despite including the > Bio::Seq:::LargePrimarySeq module: > > Can't locate object method "add_SeqFeature" via package > "Bio::Seq::LargePrimaryS > eq" (perhaps you forgot to load "Bio::Seq::LargePrimarySeq"?) at > ThesisFrontEndS > cript.pl line 94, line 33294. > > > I can send you the code in question if you want to get a better look-see. > Now, the reason I need the whole sequence is two-fold. For one, I > need to be able to calculate CG% of genes as an experimental control > of my project. The other part is that I need to be able to scan the > genome for polyA sites with respect to their orientation to L1 sites, > and there's no simple way to do that other than flat-out scanning the > code. I'll definitely look into tweaking the /$tmp directory if that > helps, but other than that, I have to at least try and make it work. > > Stefan Kirov wrote: > >> First you have to answer few questions: how do you get the object?/ >> use Bio::Seq::LargePrimarySeq does not create an object it merely >> makes the code available. >> /If you post you code here it will be much easier to answer your >> questions. How do you access the sequence (I hope you have read the >> documentation, which states that it is not generally a good idea to >> call $seq->seq). >> How big is you /tmp? What are trying to accomplish and why you need >> the whole seq in memory? >> Stefan >> >> I've had the same issue... 
I ended up breaking down the sequences into >> manageable fragments but would really like to get the largePrimarySeq >> working. When I tried loading a chrom size sequence I just sat back >> and watched my RAM get used up (2 gigs), then the swap, then the >> crash.... So if anyone can help it'd benefit both of us! >> >> Thanks for any help, >> Garrett >> >> >> On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net >> >> > > wrote: >> >>> / I'm having this unuusal problem with loading this particular >>> module. I >> >> >> />/ need b/c I'm working with chromosome-sized sequence files as a >> part of >> />/ my project, but yet it seems to not want to load properly even >> when it's >> />/ loaded using the following statement: >> />/ />/ use Bio::Seq::LargePrimarySeq; >> />/ />/ I checked my modules, and the necessary module is there. It >> seems to >> />/ just not want to load. Can anyone be of service? >> />/ />/ _______________________________________________ >> />/ Bioperl-l mailing list >> />/ Bioperl-l at portal.open-bio.org >> >> />/ http://portal.open-bio.org/mailman/listinfo/bioperl-l >> /> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > From iluminati at earthlink.net Tue Mar 15 09:27:05 2005 From: iluminati at earthlink.net (iluminati@earthlink.net) Date: Tue Mar 15 09:22:52 2005 Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq] In-Reply-To: <4236E846.4020202@utk.edu> References: <4236313C.4010006@utk.edu> <42364AEF.6050106@earthlink.net> <4236E846.4020202@utk.edu> Message-ID: <4236F0B9.2040509@earthlink.net> I see your point with regard to splitting it up. I was debating how exactly to do the split myself, and I'll work on that later as I look at the code. One quick question about the code snipet you posted, though. Are you trying to create a separate stream from the extant SeqIO? If so, why? 
Wouldn't it be redundant? Stefan Kirov wrote: > Your first problem is that you cannot access SeqFeature methods > (Annotation as well) as LargePrimarySeq inherits (unlike LargeSeq) > only from PrimarySeq. Therefore you don't have these methods > available. Two approaches: create an empty Seq object to hold the > annotation, or read the file into a new LargeSeq object: > my $id=<>; > while (<>) { > chomp; > $largeseq->add_sequence_as_string; > } > I don't know what the performance will be with the second object when > you add features. First one is far safer I think. > Next problem performance screening the sequence. Unless you have > something BIG it is likely you will have to split the sequence in at > least several chunks (then you can see if this disrupted a signal, > site..etc.) or get few gigs more for your RAM (best would be some > shared memory and a grid if you want to kill a fly with a tank :-) ). > Let me know if you have further questions. > Hope this helps and good luck. > Stefan > > iluminati@earthlink.net wrote: > >> Thanks for asking the questions! In hindsight, I realized that I >> glossed over the problem in my frustration. >> Anyway, here's the drill. I created a seq object from a >> chromosome-sized fasta file like so... >> >> my $seqio = new Bio::SeqIO('-format'=>'largefasta', >> '-file' =>Bio::Root::IO->catfile("/Thesis >> Stuff/Chr$Chromosome/chr$Chromosome.fa")); >> #Create the seq object >> my $seq = $seqio->next_seq(); >> >> From there, I want to manipulate the sequence and use the functions >> generally available to a seq object. Now, in order to the build the >> seq object, I have to use the Bio::Seq::largefasta module. The >> reason I need the Bio::Seq::LargePrimarySeq module is so that I can >> manipulate the sequence and get to the necessary functions. 
However, >> I get this error running the script despite including the >> Bio::Seq:::LargePrimarySeq module: >> >> Can't locate object method "add_SeqFeature" via package >> "Bio::Seq::LargePrimaryS >> eq" (perhaps you forgot to load "Bio::Seq::LargePrimarySeq"?) at >> ThesisFrontEndS >> cript.pl line 94, line 33294. >> >> >> I can send you the code in question if you want to get a better >> look-see. >> Now, the reason I need the whole sequence is two-fold. For one, I >> need to be able to calculate CG% of genes as an experimental control >> of my project. The other part is that I need to be able to scan the >> genome for polyA sites with respect to their orientation to L1 sites, >> and there's no simple way to do that other than flat-out scanning the >> code. I'll definitely look into tweaking the /$tmp directory if that >> helps, but other than that, I have to at least try and make it work. >> >> Stefan Kirov wrote: >> >>> First you have to answer few questions: how do you get the object?/ >>> use Bio::Seq::LargePrimarySeq does not create an object it merely >>> makes the code available. >>> /If you post you code here it will be much easier to answer your >>> questions. How do you access the sequence (I hope you have read the >>> documentation, which states that it is not generally a good idea to >>> call $seq->seq). >>> How big is you /tmp? What are trying to accomplish and why you need >>> the whole seq in memory? >>> Stefan >>> >>> I've had the same issue... I ended up breaking down the sequences into >>> manageable fragments but would really like to get the largePrimarySeq >>> working. When I tried loading a chrom size sequence I just sat back >>> and watched my RAM get used up (2 gigs), then the swap, then the >>> crash.... So if anyone can help it'd benefit both of us! 
>>> >>> Thanks for any help, >>> Garrett >>> >>> >>> On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net >>> >>> >> > wrote: >>> >>>> / I'm having this unuusal problem with loading this particular >>>> module. I >>> >>> >>> >>> />/ need b/c I'm working with chromosome-sized sequence files as a >>> part of >>> />/ my project, but yet it seems to not want to load properly even >>> when it's >>> />/ loaded using the following statement: >>> />/ />/ use Bio::Seq::LargePrimarySeq; >>> />/ />/ I checked my modules, and the necessary module is there. It >>> seems to >>> />/ just not want to load. Can anyone be of service? >>> />/ />/ _______________________________________________ >>> />/ Bioperl-l mailing list >>> />/ Bioperl-l at portal.open-bio.org >>> >>> />/ http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> /> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From skirov at utk.edu Tue Mar 15 09:53:43 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Mar 15 09:48:14 2005 Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq] In-Reply-To: <4236F0B9.2040509@earthlink.net> References: <4236313C.4010006@utk.edu> <42364AEF.6050106@earthlink.net> <4236E846.4020202@utk.edu> <4236F0B9.2040509@earthlink.net> Message-ID: <4236F6F7.9070103@utk.edu> iluminati@earthlink.net wrote: > I see your point with regard to splitting it up. I was debating how > exactly to do the split myself, and I'll work on that later as I look > at the code. One quick question about the code snipet you posted, > though. Are you trying to create a separate stream from the extant > SeqIO? If so, why? Wouldn't it be redundant? 
> No, this just reads standard file from STDIN (not through bioperl SeqIO) and puts it in a LargeSeq instead of LargePrimarySeq. I don't know why largefasta has been implemented to create LargePrimarySeq, not LargeSeq object, but this is the current state. Stefan > Stefan Kirov wrote: > >> Your first problem is that you cannot access SeqFeature methods >> (Annotation as well) as LargePrimarySeq inherits (unlike LargeSeq) >> only from PrimarySeq. Therefore you don't have these methods >> available. Two approaches: create an empty Seq object to hold the >> annotation, or read the file into a new LargeSeq object: >> my $id=<>; >> while (<>) { >> chomp; >> $largeseq->add_sequence_as_string; >> } >> I don't know what the performance will be with the second object when >> you add features. First one is far safer I think. >> Next problem performance screening the sequence. Unless you have >> something BIG it is likely you will have to split the sequence in at >> least several chunks (then you can see if this disrupted a signal, >> site..etc.) or get few gigs more for your RAM (best would be some >> shared memory and a grid if you want to kill a fly with a tank :-) ). >> Let me know if you have further questions. >> Hope this helps and good luck. >> Stefan >> >> iluminati@earthlink.net wrote: >> >>> Thanks for asking the questions! In hindsight, I realized that I >>> glossed over the problem in my frustration. >>> Anyway, here's the drill. I created a seq object from a >>> chromosome-sized fasta file like so... >>> >>> my $seqio = new Bio::SeqIO('-format'=>'largefasta', >>> '-file' >>> =>Bio::Root::IO->catfile("/Thesis >>> Stuff/Chr$Chromosome/chr$Chromosome.fa")); >>> #Create the seq object >>> my $seq = $seqio->next_seq(); >>> >>> From there, I want to manipulate the sequence and use the functions >>> generally available to a seq object. Now, in order to the build the >>> seq object, I have to use the Bio::Seq::largefasta module. 
The >>> reason I need the Bio::Seq::LargePrimarySeq module is so that I can >>> manipulate the sequence and get to the necessary functions. >>> However, I get this error running the script despite including the >>> Bio::Seq:::LargePrimarySeq module: >>> >>> Can't locate object method "add_SeqFeature" via package >>> "Bio::Seq::LargePrimaryS >>> eq" (perhaps you forgot to load "Bio::Seq::LargePrimarySeq"?) at >>> ThesisFrontEndS >>> cript.pl line 94, line 33294. >>> >>> >>> I can send you the code in question if you want to get a better >>> look-see. >>> Now, the reason I need the whole sequence is two-fold. For one, I >>> need to be able to calculate CG% of genes as an experimental control >>> of my project. The other part is that I need to be able to scan the >>> genome for polyA sites with respect to their orientation to L1 >>> sites, and there's no simple way to do that other than flat-out >>> scanning the code. I'll definitely look into tweaking the /$tmp >>> directory if that helps, but other than that, I have to at least try >>> and make it work. >>> >>> Stefan Kirov wrote: >>> >>>> First you have to answer few questions: how do you get the object?/ >>>> use Bio::Seq::LargePrimarySeq does not create an object it merely >>>> makes the code available. >>>> /If you post you code here it will be much easier to answer your >>>> questions. How do you access the sequence (I hope you have read the >>>> documentation, which states that it is not generally a good idea to >>>> call $seq->seq). >>>> How big is you /tmp? What are trying to accomplish and why you need >>>> the whole seq in memory? >>>> Stefan >>>> >>>> I've had the same issue... I ended up breaking down the sequences into >>>> manageable fragments but would really like to get the largePrimarySeq >>>> working. When I tried loading a chrom size sequence I just sat back >>>> and watched my RAM get used up (2 gigs), then the swap, then the >>>> crash.... So if anyone can help it'd benefit both of us! 
>>>> >>>> Thanks for any help, >>>> Garrett >>>> >>>> >>>> On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net >>>> >>>> >>> > wrote: >>>> >>>>> / I'm having this unuusal problem with loading this particular >>>>> module. I >>>> >>>> >>>> >>>> >>>> />/ need b/c I'm working with chromosome-sized sequence files as a >>>> part of >>>> />/ my project, but yet it seems to not want to load properly even >>>> when it's >>>> />/ loaded using the following statement: >>>> />/ />/ use Bio::Seq::LargePrimarySeq; >>>> />/ />/ I checked my modules, and the necessary module is there. >>>> It seems to >>>> />/ just not want to load. Can anyone be of service? >>>> />/ />/ _______________________________________________ >>>> />/ Bioperl-l mailing list >>>> />/ Bioperl-l at portal.open-bio.org >>>> >>>> />/ http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> /> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > From sasalacolla at libero.it Tue Mar 15 10:18:17 2005 From: sasalacolla at libero.it (sasalacolla@libero.it) Date: Tue Mar 15 10:14:05 2005 Subject: [Bioperl-l] help me with "blastall call crashed:-1" Message-ID: Hi, please help me. 
I tried to use psort, but i only got this message: Fatal error: ------------- EXCEPTION ------------- MSG: blastall call crashed: -1 /usr/bin/blastall -p blastp -d /usr/local/psort//conf/analysis/sclblast/gramneg/sclblast -i /tmp/IBcvdGTw4w -e 1e-09 -o /tmp/NaRiG0fdH8 -F F STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:732 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:680 STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:536 STACK Bio::Tools::Run::SCLBlast::blast /usr/local/share/perl/5.8.4/Bio/Tools/Run/SCLBlast.pm:134 STACK Bio::Tools::PSort::Module::SCLBlast::run /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Module/SCLBlast.pm:72 STACK Bio::Tools::PSort::Pathway::__ANON__ /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:194 STACK Bio::Tools::PSort::Pathway::traverse /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:157 STACK Bio::Tools::PSort::classify /usr/local/share/perl/5.8.4/Bio/Tools/PSort.pm:160 STACK (eval) /usr/local/bin/psort:318 STACK toplevel /usr/local/bin/psort:318 -------------------------------------- From sanges at biogem.it Tue Mar 15 11:13:41 2005 From: sanges at biogem.it (Remo Sanges) Date: Tue Mar 15 11:10:02 2005 Subject: [Bioperl-l] help me with "blastall call crashed:-1" In-Reply-To: References: Message-ID: <667061348983a652590b02efabc6637d@biogem.it> On Mar 15, 2005, at 4:18 PM, sasalacolla@libero.it wrote: > Hi, please help me. 
I tried to use psort, but i only got this message: > > Fatal error: > ------------- EXCEPTION ------------- > MSG: blastall call crashed: -1 /usr/bin/blastall -p blastp > -d /usr/local/psort//conf/analysis/sclblast/gramneg/sclblast > -i /tmp/IBcvdGTw4w -e 1e-09 -o /tmp/NaRiG0fdH8 -F F > > STACK > Bio::Tools::Run::StandAloneBlast::_runblast > /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:732 > STACK > Bio::Tools::Run::StandAloneBlast::_generic_local_blast > /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:680STACK > Bio::Tools::Run::StandAloneBlast::blastall > /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:536 > STACK > Bio::Tools::Run::SCLBlast::blast > /usr/local/share/perl/5.8.4/Bio/Tools/Run/SCLBlast.pm:134 > STACK > Bio::Tools::PSort::Module::SCLBlast::run > /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Module/SCLBlast.pm:72 > STACK > Bio::Tools::PSort::Pathway::__ANON__ > /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:194 > STACK > Bio::Tools::PSort::Pathway::traverse > /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:157 > STACK > Bio::Tools::PSort::classify > /usr/local/share/perl/5.8.4/Bio/Tools/PSort.pm:160 > STACK (eval) /usr/local/bin/psort:318 > STACK toplevel /usr/local/bin/psort:318 Please, when you ask for help, consider sending the code that is failing; otherwise we don't have a good starting point to help you, and many people will trash your message... BTW, I have never used PSort.pm, but your problem comes from a blast call, so here are my two cents based on StandAloneBlast.pm: from your error message, it seems to me that you at least have a wrong definition of the local database directory, see here: > -d /usr/local/psort//conf/analysis/sclblast/gramneg/sclblast this probably means that: 1 your BLASTDATADIR is defined to be /usr/local/psort/ Even if this is the right location, you should avoid the final '/'. But your database is probably in the /conf/analysis/sclblast/gramneg folder, right? 
2 you have passed the database with the full path into your params: /conf/analysis/sclblast/gramneg/sclblast when you needed to simply use 'sclblast'. This is a cut from the code of the module: If local BLAST databases are not stored in the standard /data directory, the variable BLASTDATADIR will need to be set explicitly. You need to enable Blast to find the directory containing the databases. This can be done in (at least) two different ways: 1. define an environmental variable BLASTDATADIR: export BLASTDATADIR=/conf/analysis/sclblast/gramneg or 2. include a definition of an environmental variable BLASTDATADIR in every script that will use StandAloneBlast.pm. BEGIN { $ENV{BLASTDATADIR} = '/conf/analysis/sclblast/gramneg'; } HTH Remo From chad at dieselwurks.com Tue Mar 15 11:31:21 2005 From: chad at dieselwurks.com (Chad Matsalla) Date: Tue Mar 15 11:25:53 2005 Subject: [Bioperl-l] naive question about Bio::Tools::Primer3 Message-ID: Greetings! > How do I best get the individual result lines from the primer3 output > file, using Bio::Tools::Primer3? > > I'll include the script I tried and the file it parsed. > When I run it, I get "HASH(0xccfc)". > > #!/usr/bin/perl -w > use lib "/Users/Ned/Documents/Perl/bioperl_source/bioperl-1.4"; > use Bio::AlignIO; > use Bio::Tools::Primer3;# read a primer3 output file > my $primer3=Bio::Tools::Primer3->new(-file=>"p3test1.out"); > #put the left- and right-primer stuff into hashes. > my $primer=$primer3->next_primer; > print "The right primer in the stream is ", > $primer->get_primer('-right_primer')->seq->seq, "\n"; > # to return results > print $primer3->primer_results(0,'PRIMER_LEFT_INPUT'); ^^^^^^^^^^^^^^ I added a couple of examples into t/primer3.t on how this can be done. Everything is fine in your script until the primer_results line. The answer to your question is to *not* use the method primer_results() because that method does not actually create Bio::Seq::PrimedSeq objects. 
You should be accessing primers from the stream by the next_primer method. Further to that I think that the method primer_results should be renamed to _primer_results to indicate that it is a private method. Does anybody object? Chad Matsalla From s0460205 at sms.ed.ac.uk Tue Mar 15 13:37:36 2005 From: s0460205 at sms.ed.ac.uk (SG Edwards) Date: Tue Mar 15 13:32:00 2005 Subject: [Bioperl-l] Swissprot query - Help! Message-ID: <1110911856.42372b7016563@sms.ed.ac.uk> Hi, sorry for the obvious question but I'm really new to Perl/BioPerl!! I want to run a script that sends a query to swissprot and returns the list of sequences as Seq objects. I have tried the following code which throws an exception 'MSG: Must speciy a value for uids to query'. use Bio::DB::SwissProt; $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]"; $sp_obj = Bio::DB::SwissProt->new; $stream_obj = $sp_obj->get_Stream_by_query($query); while ($seq_obj = $stream_obj->next_seq) { #print out the id print $seq_obj->display_id, "\n"; } exit; Any help is greatly appreciated! From j1gregor at biomail.ucsd.edu Tue Mar 15 16:07:46 2005 From: j1gregor at biomail.ucsd.edu (James Gregory) Date: Tue Mar 15 16:02:27 2005 Subject: [Bioperl-l] cannot find path to blastall Message-ID: I'm trying to set up a standalone blast and I'm getting an error message that says MSG: cannot find path to blastall code: my @params = (program => 'blastp', database => 'db.psq'); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $path = '/path/to/blastall/'; $path = $factory->program_path($path); #BLAST my $blast_report = $factory->blastall($blast_seq); any help would be appreciated. James Gregory University of California, San Diego Department of Biological Sciences From brian_osborne at cognia.com Tue Mar 15 16:16:59 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Mar 15 16:12:58 2005 Subject: [Bioperl-l] cannot find path to blastall In-Reply-To: Message-ID: James, On Unix? Windows? Cygwin? 
Something else? Also is "/path/to/blastall" the actual location of blastall? On a Linux machine this might be something like "/usr/local/bin/blastall". We need to know a bit more. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of James Gregory Sent: Tuesday, March 15, 2005 4:08 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] cannot find path to blastall I'm trying to set up a standalone blast and I'm getting an error message that says MSG: cannot find path to blastall code: my @params = (program => 'blastp', database => 'db.psq'); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); my $path = '/path/to/blastall/'; $path = $factory->program_path($path); #BLAST my $blast_report = $factory->blastall($blast_seq); any help would be appreciated. James Gregory University of California, San Diego Department of Biological Sciences _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From j1gregor at biomail.ucsd.edu Tue Mar 15 16:30:25 2005 From: j1gregor at biomail.ucsd.edu (James Gregory) Date: Tue Mar 15 16:24:50 2005 Subject: [Bioperl-l] cannot find path to blastall In-Reply-To: References: Message-ID: fixed it, i thought you could manually set the path within the perl script. I just added blastall to usr/bin/local (on a unix machine) and it works, except now I'm having other troubles. i ran formatdb -i SLR16.1_prot.txt -o T -n subtilis where SLR16.1_prot.txt is a fasta formatted file. I get these files from formatdb. 
subtilis.phr subtilis.pin subtilis.psd subtilis.psi subtilis.psq from my understanding the .psq file is the one you want so i have #create seq object for blast my $blast_seq = Bio::Seq->new( '-id' => "$seq_name", '-seq' => "$prot_seq"); #set BLAST params my @params = (program => 'blastp', database => '/path/to/file/subtilis.psq'); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); #BLAST my $blast_report = $factory->blastall($blast_seq); but i'm getting this error: Could not find index files for database /home/j1gregor/transposon/subtilis.psq ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: blastall call crashed: 256 /usr/local/bin/blastall -p blastp -d "/home/j1gregor/transposon/subtilis.psq" -i /tmp/onOzNhelp8 -o /tmp/HlsmLpIFrT STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.3/Bio/Root/Root.pm:328 STACK: Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:732 STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:680 STACK: Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:536 STACK: GFP_find.pl:85 thanks again, James On Tue, 15 Mar 2005, Brian Osborne wrote: > James, > > On Unix? Windows? Cygwin? Something else? Also is "/path/to/blastall" the > actual location of blastall? On a Linux machine this might be something like > "/usr/local/bin/blastall". We need to know a bit more. > > Brian O. 
> > > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of James Gregory > Sent: Tuesday, March 15, 2005 4:08 PM > To: bioperl-l@bioperl.org > Subject: [Bioperl-l] cannot find path to blastall > > > > I'm trying to set up a standalone blast and I'm getting an error message > that says > > MSG: cannot find path to blastall > > code: > > my @params = (program => 'blastp', > database => 'db.psq'); > my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > my $path = '/path/to/blastall/'; > $path = $factory->program_path($path); > > #BLAST > my $blast_report = $factory->blastall($blast_seq); > > any help would be appreciated. > > James Gregory > University of California, San Diego > Department of Biological Sciences > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > From brian_osborne at cognia.com Tue Mar 15 16:43:26 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Mar 15 16:37:49 2005 Subject: [Bioperl-l] Swissprot query - Help! In-Reply-To: <1110911856.42372b7016563@sms.ed.ac.uk> Message-ID: SG, Well, I think my example code got you into this, I think I should help you out! You can't actually query Swissprot this way, using those text values and field names, you can only do these text queries using Genbank currently. You'd use Bio::DB::Query::GenBank for this, not Bio::DB::GenBank. If you want to query Swissprot you're limited to ids and accession numbers. I will clarify the HOWTO, it's a bit unclear on this point. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG Edwards Sent: Tuesday, March 15, 2005 1:38 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Swissprot query - Help! Hi, sorry for the obvious question but I'm really new to Perl/BioPerl!! 
I want to run a script that sends a query to swissprot and returns the list of sequences as Seq objects. I have tried the following code which throws an exception 'MSG: Must speciy a value for uids to query'. use Bio::DB::SwissProt; $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]"; $sp_obj = Bio::DB::SwissProt->new; $stream_obj = $sp_obj->get_Stream_by_query($query); while ($seq_obj = $stream_obj->next_seq) { #print out the id print $seq_obj->display_id, "\n"; } exit; Any help is greatly appreciated! _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From skirov at utk.edu Tue Mar 15 16:46:13 2005 From: skirov at utk.edu (Stefan Kirov) Date: Tue Mar 15 16:40:49 2005 Subject: [Bioperl-l] Error reporting/Validation implemented In-Reply-To: <4235FEF4.2070901@gpc-biotech.com> References: <4234564D.7010906@gpc-biotech.com> <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com> <4235FEF4.2070901@gpc-biotech.com> Message-ID: <423757A5.8050405@utk.edu> Mingyi, A few things: I used your parser to produce Bioperl objects based on some of the high-level features and compared it to what I have. Your parser is considerably faster (about twice as fast), but it is still hard to tell, as I am descending further into the hierarchy with mine. At the same time I don't think the difference will vanish, so I will start building on top of your parser to produce Bioperl objects. I am not sure exactly how I am going to deal with the relationships that are necessary, but I'll deal with that when I finish everything else. By the way, it took 9 minutes on a 64-bit Xeon 3.4GHz, even with Bioperl object construction, on the whole Homo_sapiens ASN file. The data that went inside the objects was: general description of the genes (symbol, name, summary, etc.) and organism description, but none of the truly big parts. Unfortunately, I am leaving tomorrow for a conference, so I will have more next week at the earliest. 
Thanks for sharing the code! Stefan Mingyi Liu wrote: > Hi, there, > > I just implemented basic error reporting and validation > functionalities in my Entrez Gene parser in Perl (the regex version). > The validation will catch all non-conforming data, while error > reporting reports line number, error type, and the first 20 > (customizable) characters of the offending data (but the line number > could be incorrect if the format resulted in an exception, which is > hard to deal with for ASN.1-formatted data, although easy for XML > parsers). > The speed for the parser of course slowed down, but I'd say it'd still > beat most parsers hands down. The full human genome now takes a bit > over 12 minutes instead of 11 minutes to process on one Intel Xeon 2.4 > GHz CPU. So I don't think my parser's speed has much to do with > performing validation or not. > > I had also communicated with Stefan Kirov and turns out the dead > entries and 0-sized (should be 1-sized) arrays were simply related to > data trimming options. So far, so good. > > If anyone is interested, check it out at > http://www.sourceforge.net/projects/egparser/. > > Regards, > > Mingyi > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Tue Mar 15 16:46:03 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Tue Mar 15 16:40:55 2005 Subject: [Bioperl-l] cannot find path to blastall In-Reply-To: Message-ID: James, Does: my @params = (program => 'blastp', database => '/home/j1gregor/transposon/subtilis'); work? Brian O. 
-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of James Gregory
Sent: Tuesday, March 15, 2005 4:30 PM
To: Brian Osborne
Cc: bioperl-l@bioperl.org
Subject: RE: [Bioperl-l] cannot find path to blastall


fixed it, i thought you could manually set the path within the perl
script. I just added blastall to /usr/local/bin (on a unix machine) and it
works, except now I'm having other troubles.

i ran
formatdb -i SLR16.1_prot.txt -o T -n subtilis

where SLR16.1_prot.txt is a fasta formatted file. I get these files from
formatdb.

subtilis.phr  subtilis.pin  subtilis.psd  subtilis.psi  subtilis.psq

from my understanding the .psq file is the one you want so i have

#create seq object for blast
my $blast_seq = Bio::Seq->new( '-id'  => "$seq_name",
                               '-seq' => "$prot_seq");

#set BLAST params
my @params = (program => 'blastp',
              database => '/path/to/file/subtilis.psq');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

#BLAST
my $blast_report = $factory->blastall($blast_seq);

but i'm getting this error:

Could not find index files for database
/home/j1gregor/transposon/subtilis.psq

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: 256 /usr/local/bin/blastall -p blastp -d
"/home/j1gregor/transposon/subtilis.psq" -i /tmp/onOzNhelp8 -o
/tmp/HlsmLpIFrT

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.3/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:732
STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:680
STACK: Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:536
STACK: GFP_find.pl:85

thanks again,
James

On Tue, 15 Mar 2005, Brian Osborne wrote:

> James,
>
> On Unix? Windows? Cygwin? Something else? Also is "/path/to/blastall" the
> actual location of blastall? On a Linux machine this might be something like
> "/usr/local/bin/blastall". We need to know a bit more.
>
> Brian O.
>
>
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of James Gregory
> Sent: Tuesday, March 15, 2005 4:08 PM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] cannot find path to blastall
>
>
>
> I'm trying to set up a standalone blast and I'm getting an error message
> that says
>
> MSG: cannot find path to blastall
>
> code:
>
> my @params = (program => 'blastp',
>               database => 'db.psq');
> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> my $path = '/path/to/blastall/';
> $path = $factory->program_path($path);
>
> #BLAST
> my $blast_report = $factory->blastall($blast_seq);
>
> any help would be appreciated.
>
> James Gregory
> University of California, San Diego
> Department of Biological Sciences

From j1gregor at biomail.ucsd.edu Tue Mar 15 17:00:18 2005
From: j1gregor at biomail.ucsd.edu (James Gregory)
Date: Tue Mar 15 16:54:42 2005
Subject: [Bioperl-l] cannot find path to blastall
In-Reply-To:
References:
Message-ID:

i think that solved the problem. but more problems.

[blastall] WARNING: [000.000] >BA-124-EB41_B01.ab1.Seq: Unable to open BLOSUM62
[blastall] WARNING: [000.000] >BA-124-EB41_B01.ab1.Seq: BlastScoreBlkMatFill returned non-zero status
[blastall] WARNING: [000.000] >BA-124-EB41_B01.ab1.Seq: SetUpBlastSearch failed.

do i need to put the entire ./ncbi_toolbox/ncbi/bin into my path
(/usr/local/bin)? although I don't think BLOSUM62 is there.. couldn't
find it in the ncbi toolbox.

James

From mingyi.liu at gpc-biotech.com Tue Mar 15 17:01:36 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Tue Mar 15 16:56:07 2005
Subject: [Bioperl-l] Error reporting/Validation implemented
In-Reply-To: <423757A5.8050405@utk.edu>
References: <4234564D.7010906@gpc-biotech.com>
	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
	<4235FEF4.2070901@gpc-biotech.com> <423757A5.8050405@utk.edu>
Message-ID: <42375B40.4070207@gpc-biotech.com>

Stefan Kirov wrote:

> Mingyi,
> Few things:
> I used your parser to produce Bioperl objects based on some of the
> high level features and compared it to what I have. Your parser is
> considerably faster (about twice), but it is still hard to tell as I
> am descending further in the hierarchy with mine. At the same time I
> don't think the difference will vanish, so I will start building over
> your parser to produce bioperl objects. I am not sure exactly how I am
> going to deal with the relationships that are necessary, but I'll deal
> with it when I finish everything else.

Hi, Stefan,

Thanks for the comparison result! That was fast! Please let me know if
you need some help using the data structure of my parser. I'll try to
provide a skeleton code tonight for you (or maybe in the next couple of
days since you're away anyway) that comes from my code that extracts all
data (as far as I can tell) from Entrez Gene.
This way, although it still does not construct objects for you, at least
it's going to be easier to find the stuff you want for object
construction, which is definitely the toughest step of creating a
bioperl parser for Entrez Gene.

BTW, I just released version 1.04 with some simple improvements such as
attempts (only on *NIX) to open files over 2 GB even if the perl version
used does not support it (so that the file 'All_Data' works for me
without recompiling my Perl), a 'file' option in the 'new' method, etc.
It's more convenient to use (check the "regex_parser_test.pl" in V1.04
for a usage example), somewhat like SeqIO's usage (send in 'file' in
new() and call next_seq to get the next record).

>
> By the way it took 9 minutes on a 64 bit Xeon 3.4GHz even with
> Bioperl objects construction on the whole Homo_sapiens ASN file.

Thanks for sharing the benchmark! It's definitely faster than my Xeon
2.4 GHz. I just ran my parser V1.04 on the file All_Data that contains
all Entrez Gene genomes (about 7.4 GB) and it took the parser 98 minutes
to finish with no error found.

> The data that went inside the objects was: general desc of the genes
> (symbol, name, summary, etc.), organism descr. but none of the truly
> big parts. Unfortunately, I am leaving tomorrow for a conference, so I
> will have some more next week earliest. Thanks for sharing the code!
> Stefan

Glad to be of help!

Best,

Mingyi

From amtd9 at umr.edu Tue Mar 15 17:04:07 2005
From: amtd9 at umr.edu (Mane, Ajay (UMR-Student))
Date: Tue Mar 15 17:01:57 2005
Subject: [Bioperl-l] bl2seq of NCBI
Message-ID: <58AF0CF509606A49B1770AB5DFF811CE110813@UMR-CMAIL1.umr.edu>

Hi,

I am new to bioperl. I want to use the bl2seq tool of NCBI, giving the
input query sequences in a perl script. I have gone through the
documentation, but it is not clear to me how to start. Can anyone send a
sample perl script which uses the bl2seq tool? What needs to be
installed?
Thanks,
Ajay

From diriano at rz.uni-potsdam.de Tue Mar 15 17:52:31 2005
From: diriano at rz.uni-potsdam.de (Diego Mauricio Riaño Pachón)
Date: Tue Mar 15 17:48:18 2005
Subject: [Bioperl-l] cannot find path to blastall
References:
Message-ID: <002601c529b1$ac87d360$ac4bfea9@diegoriano>

Hi James

BLOSUM and PAM matrices should be in the data subdir of the main blast
directory. I would say that the best is to keep all the blast
executables together with your data dir in one directory, something like:

blast_dir
   data

In blast_dir you leave the execs, and in data the matrices, then add
blast_dir to your path. In unix using bash you do it like this:

export PATH=$PATH:/path/to/blast_dir

I hope this helps

Diego
_______________________________________
Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
Potsdam University
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/

From charlesh at admin.stedwards.edu Tue Mar 15 18:44:16 2005
From: charlesh at admin.stedwards.edu (chauser)
Date: Tue Mar 15 18:39:32 2005
Subject: [Bioperl-l] SeqIO - masked seqs
Message-ID: <20aed013cebdc496f11e367760508075@admin.stedwards.edu>

All,

I ran into a glitch when reading sets of EST reads where some reads are
masked in their entirety - i.e. all bases are X's. Is there a way to
either modify the alphabet to accept X or some other solution?

thanks,
chuck

------------- EXCEPTION -------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
STACK Bio::PrimarySeq::_guess_alphabet /usr/local/src/bioperl/core/Bio/PrimarySeq.pm:837
STACK Bio::Seq::SeqFastaSpeedFactory::create /usr/local/src/bioperl/core/Bio/Seq/SeqFastaSpeedFactory.pm:137
STACK Bio::SeqIO::fasta::next_seq /usr/local/src/bioperl/core/Bio/SeqIO/fasta.pm:143
STACK main::RAW ESTcleanup.pl:81
STACK toplevel ESTcleanup.pl:49

From lzhtom at hotmail.com Wed Mar 16 01:49:03 2005
From: lzhtom at hotmail.com (zhihua li)
Date: Wed Mar 16 01:43:42 2005
Subject: [Bioperl-l] help on getting annotations
Message-ID:

Hi netter!

I have a series of GenBank Accession Numbers (GB78091, GB90876,...) and
wanna get as much information as possible about each item. I want to
know their UniGene ID so that I can tell if there are redundancies among
them; I want to get their gene descriptions or GO annotations so as to
group them into functional groups; I want to know their KEGG pathway IDs
so that I can tell which of them are in the same biological pathway,
etc.

Of course I could submit the series of accession numbers to each
different database (GenBank, GO, KEGG...) and get their annotations
respectively. But as the series contains a large number of items, I
think it's better to write a perl script (or use an existing bioperl
function) to have it done automatically.
Could anyone give me a hint about how i can write the script or use the
corresponding bioperl function? I'm new to both perl and bioperl.

Thanks a lot!

From Marc.Logghe at devgen.com Wed Mar 16 02:53:14 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Mar 16 02:48:18 2005
Subject: [Bioperl-l] cannot find path to blastall
Message-ID:

> executables together with your data dir in one directory,
> something like:
> blast_dir
>    data
> In blast_dir you leave the execs, and in data the matrices,
> then add blast_dir to your path, in unix using bash you do it
> like this:
> export PATH=$PATH:/path/to/blast_dir

You can tell blastall where to find the data files by setting the
environment variable BLASTMAT. If you are not sure what that should
be, do a search for BLOSUM62. In my case it is in /usr/share/ncbi/data/.
Then you do 'export BLASTMAT=/usr/share/ncbi/data/'
Or you set it in your Perl script
$ENV{'BLASTMAT'} = '/usr/share/ncbi/data/';
HTH,
Marc

From Marc.Logghe at devgen.com Wed Mar 16 03:16:09 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Mar 16 03:11:56 2005
Subject: [Bioperl-l] SeqIO - masked seqs
Message-ID:

> All,
>
> I ran into a glitch when reading sets of EST reads where some
> reads are masked in their entirety - i.e. all bases are X's.
> Is there a way to either modify the alphabet to accept X or
> some other solution?

I was not able to trace the actual fix. But there was a thread in
december/january about that.
In one of the last messages Nathan was about to fix this:
http://bioperl.org/pipermail/bioperl-l/2005-January/017829.html

Brian added a comment on this alphabet() issue.
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO.pm?cvsroot=bioperl
Have you tried bioperl release 1.5.0 or bioperl-release-1-5-0-rc2 ?
Guess it should be fixed there.
Is bioperl-release-1-5-0-rc2 behaving better than 1.5.0 related to the
Bio::SeqFeatureI architecture ?
Marc

From sdavis2 at mail.nih.gov Wed Mar 16 06:32:56 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Mar 16 06:27:50 2005
Subject: [Bioperl-l] help on getting annotations
In-Reply-To:
References:
Message-ID: <75f3e4f27a029fd291be8551157eecfc@mail.nih.gov>

Try http://source.stanford.edu/cgi-bin/source/sourceSearch if you are
using human, mouse, or rat. If not, then this will be a multi-step
process (there isn't a bioperl function to do this--you will have to
write some code).

Sean

From sdavis2 at mail.nih.gov Wed Mar 16 06:38:34 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Mar 16 06:33:02 2005
Subject: [Bioperl-l] help on getting annotations
In-Reply-To:
References:
Message-ID: <7a5ae3bc874b1fe959f80d01d60a4fbf@mail.nih.gov>

A bit of an off-topic reply coming!

I just noticed that you are also posting to the BioConductor list about
some microarray-related issues. The BioConductor project has a package
called AnnBuilder that will build a HUGE annotation package for you
based on your genbank accession numbers. Then, you can have your
annotation AND microarray data available via R without having to read
various text files, etc. There are also R functions to perform all the
usual statistical analyses (enriched ontology categories, KEGG pathways,
etc.). If you are using R/Bioconductor to do your analyses, you should
really look at AnnBuilder and annotate packages (and related GOStats).

Sean

From kenji at vettatech.com Wed Mar 16 14:16:26 2005
From: kenji at vettatech.com (Leonardo Kenji Shikida)
Date: Wed Mar 16 14:13:00 2005
Subject: [Bioperl-l] how to parse the GenPept sequence object to get the
	'DBSOURCE' field
Message-ID: <4238860A.8050705@vettatech.com>

does anyone know how to parse the GenPept sequence object to get the
'DBSOURCE' field?

e.g. human.protein.gpff

LOCUS       NP_000358     245 aa    linear   PRI 31-OCT-2000
DEFINITION  thiopurine S-methyltransferase [Homo sapiens].
ACCESSION   NP_000358
VERSION     NP_000358.1  GI:4507653
DBSOURCE    REFSEQ: accession NM_000367.1    <<==
KEYWORDS    .
SOURCE      Homo sapiens (human)

I found no answer reading the docs, and there is the same unanswered
question in this list archives at

http://bioperl.org/pipermail/bioperl-l/2003-June/012438.html

thanks in advance

K.

From khoueiry at lgpd.univ-mrs.fr Tue Mar 15 04:23:25 2005
From: khoueiry at lgpd.univ-mrs.fr (khoueiry)
Date: Wed Mar 16 17:25:06 2005
Subject: [Bioperl-l] Xmfa2GFF
Message-ID: <1110878606.888.3.camel@DavidLinux>

Hello everybody,

I want to know if there is a bioperl script that converts the xmfa file
format into GFF format for use with Gbrowse. The idea is to create
the GFF file to browse alignments with Gbrowse.

Thanks...

From charlesh at admin.stedwards.edu Wed Mar 16 08:48:46 2005
From: charlesh at admin.stedwards.edu (chauser)
Date: Wed Mar 16 17:25:09 2005
Subject: [Bioperl-l] SeqIO - masked seqs
In-Reply-To:
References:
Message-ID:

Hi Marc,

I updated to the current CVS and get the same error.
If I tack on a single valid base to the offending clone (below) SeqIO
reads it.

# $Id: README,v 1.37 2005/03/01 16:56:02 amackey Exp $

o Version

 This is Bioperl version 1.5 from CVS HEAD

>1115008E10.y1 CHROMAT_FILE: 1115008E10.y1 PHD_FILE: 1115008E10.y1.phd.1 CHEM: term
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXX

------------- EXCEPTION -------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
STACK Bio::PrimarySeq::_guess_alphabet /usr/local/src/bioperl/core/Bio/PrimarySeq.pm:837
STACK Bio::Seq::SeqFastaSpeedFactory::create /usr/local/src/bioperl/core/Bio/Seq/SeqFastaSpeedFactory.pm:137
STACK Bio::SeqIO::fasta::next_seq /usr/local/src/bioperl/core/Bio/SeqIO/fasta.pm:143
STACK main::RAW ESTcount.pl:81
STACK toplevel ESTcount.pl:49

Chuck

On Mar 16, 2005, at 2:16 AM, Marc Logghe wrote:

>
>> All,
>>
>> I ran into a glitch when reading sets of EST reads where some
>> reads are masked in their entirety - i.e. all bases are X's.
>> Is there a way to either modify the alphabet to accept X or
>> some other solution?
>
> I was not able to trace the actual fix. But there was a thread in
> december/january about that.
> In one of the last messages Nathan was about the fix this: > http://bioperl.org/pipermail/bioperl-l/2005-January/017829.html > > Brian added a comment on this alphabet() issue. > http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ > SeqI > O.pm?cvsroot=bioperl > Have you tried bioperl release 1.5.0 or bioperl-release-1-5-0-rc2 ? > Guess it should be fixed there. > Is bioperl-release-1-5-0-rc2 behaving better than 1.5.0 related to the > Bio::SeqFeatureI architecture ? > Marc > > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 2538 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050316/975dbc52/attachment.bin From Marc.Logghe at devgen.com Thu Mar 17 03:29:25 2005 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Mar 17 03:24:10 2005 Subject: [Bioperl-l] SeqIO - masked seqs Message-ID: Hi chuck, It seems to be fixed after all. The original problem was actually when you explicitely set the alphabet yourself, bioperl tries to guess the alphabet anyhow. Meaning, when you set the alphabet now, it will work. I tested it like this: #!/usr/bin/perl use strict; use Bio::SeqIO; my $in = Bio::SeqIO->new(-fh => \*DATA, -format => 'fasta'); $in->alphabet('dna'); # it fails when you comment out this line !!! 
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta'); my $seq = $in->next_seq; $out->write_seq($seq); __DATA__ >1115008E10.y1 CHROMAT_FILE: 1115008E10.y1 PHD_FILE: 1115008E10.y1.phd.1 CHEM: term XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXX HTH Marc ________________________________ From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of chauser Sent: Wednesday, March 16, 2005 2:49 PM To: Marc Logghe Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] SeqIO - masked seqs Hi Marc, I updated to the current CVS and get the same error. If I tack on a single valid base to the offending clone(below) SeqIO reads it. 
# $Id: README,v 1.37 2005/03/01 16:56:02 amackey Exp $ o Version This is Bioperl version 1.5 from CVS HEAD >1115008E10.y1 CHROMAT_FILE: 1115008E10.y1 PHD_FILE: 1115008E10.y1.phd.1 CHEM: term XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXX ------------- EXCEPTION ------------- MSG: Got a sequence with no letters in it cannot guess alphabet [] STACK Bio::PrimarySeq::_guess_alphabet /usr/local/src/bioperl/core/Bio/PrimarySeq.pm:837 STACK Bio::Seq::SeqFastaSpeedFactory::create /usr/local/src/bioperl/core/Bio/Seq/SeqFastaSpeedFactory.pm:137 STACK Bio::SeqIO::fasta::next_seq /usr/local/src/bioperl/core/Bio/SeqIO/fasta.pm:143 STACK main::RAW ESTcount.pl:81 STACK toplevel ESTcount.pl:49 Chuck On Mar 16, 2005, at 2:16 AM, Marc Logghe wrote: All, I ran into a glitch when reading sets of EST reads where some reads are masked in their entirety - i.e. all bases are X's. Is there a way to either modify the alphabet to accept X or some other solution? I was not able to trace the actual fix. But there was a thread in december/january about that. In one of the last messages Nathan was about the fix this: http://bioperl.org/pipermail/bioperl-l/2005-January/017829.html Brian added a comment on this alphabet() issue. 
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqI O.pm?cvsroot=bioperl Have you tried bioperl release 1.5.0 or bioperl-release-1-5-0-rc2 ? Guess it should be fixed there. Is bioperl-release-1-5-0-rc2 behaving better than 1.5.0 related to the Bio::SeqFeatureI architecture ? Marc From nathanhaigh at ukonline.co.uk Thu Mar 17 04:06:28 2005 From: nathanhaigh at ukonline.co.uk (Nathan Haigh) Date: Thu Mar 17 04:01:33 2005 Subject: [Bioperl-l] SeqIO - masked seqs In-Reply-To: References: Message-ID: <42394894.80506@ukonline.co.uk> Without going back and double checking, i think this is how things stand with the current CVS (and probably the 1.5 release). There was a modification in the module that trys to guess the alphabet of the sequence in question (X was added to the set of characters that were removed from the sequence prior to attempting to guess the alphabet) this resulted in the error shown when you have a fully masked sequence. I think the fix i implemented was in Bio::SeqIO::fasta which allowed you to do set the alphabet manually thus not allowing Bioperl to guess the alphabet. soemthing like this should curcumvent this problem: $in = Bio::SeqIO->new(-file => "inputfilename" , -format => 'Fasta', -alphabet => 'dna'); Let us know how you get on Nathan chauser wrote: > Hi Marc, > > I updated to the current CVS and get the same error. If I tack on a > single valid base to the offending clone(below) SeqIO reads it. 
> > # $Id: README,v 1.37 2005/03/01 16:56:02 amackey Exp $ > > o Version > > This is Bioperl version 1.5 from CVS HEAD > > > > >1115008E10.y1 CHROMAT_FILE: 1115008E10.y1 PHD_FILE: > 1115008E10.y1.phd.1 CHEM: term > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > XXXXXXXXXXX > > > > ------------- EXCEPTION ------------- > MSG: Got a sequence with no letters in it cannot guess alphabet [] > STACK Bio::PrimarySeq::_guess_alphabet > /usr/local/src/bioperl/core/Bio/PrimarySeq.pm:837 > STACK Bio::Seq::SeqFastaSpeedFactory::create > /usr/local/src/bioperl/core/Bio/Seq/SeqFastaSpeedFactory.pm:137 > STACK Bio::SeqIO::fasta::next_seq > /usr/local/src/bioperl/core/Bio/SeqIO/fasta.pm:143 > STACK main::RAW ESTcount.pl:81 > STACK toplevel ESTcount.pl:49 > > > Chuck > > > > On Mar 16, 2005, at 2:16 AM, Marc Logghe wrote: > > > All, > > I ran into a glitch when reading sets of EST reads where some > reads are masked in their entirety - i.e. all bases are X's. > Is there a way to either modify the alphabet to accept X or > some other solution? > > > I was not able to trace the actual fix. But there was a thread in > december/january about that. 
> In one of the last messages Nathan was about to fix this: > http://bioperl.org/pipermail/bioperl-l/2005-January/017829.html > > Brian added a comment on this alphabet() issue. > http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO.pm?cvsroot=bioperl > Have you tried bioperl release 1.5.0 or bioperl-release-1-5-0-rc2 ? > Guess it should be fixed there. > Is bioperl-release-1-5-0-rc2 behaving better than 1.5.0 related to > the > Bio::SeqFeatureI architecture ? > Marc > > >------------------------------------------------------------------------ > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Anthony.Underwood at hpa.org.uk Thu Mar 17 05:17:47 2005 From: Anthony.Underwood at hpa.org.uk (SRMD, Col - Underwood, Anthony) Date: Thu Mar 17 05:28:58 2005 Subject: [Bioperl-l] Non-implemented methods and bioperl documentation/ContigAnalysis Message-ID: Hi Bioperlers, The ContigAnalysis method "single_strand" is not implemented even though it is described in the documentation. Should methods that are not implemented not be flagged as such in the documentation, so that people do not write code that relies on the method only to find that it throws an error saying that it isn't implemented but that it's not your fault? It would have been handier to know this at an earlier stage! Any thoughts? Anthony Dr Anthony Underwood Bioinformatics Group | Genomics, Proteomics and Bioinformatics Unit Centre for Infections Health Protection Agency 61 Colindale Avenue London NW9 5HT t: 0208 3276466 f: 0208 3276738 e:anthony.underwood@hpa.org.uk
From brian_osborne at cognia.com Thu Mar 17 09:06:49 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Thu Mar 17 09:04:08 2005 Subject: [Bioperl-l] how to parse the GenPept sequence object to get the'DBSOURCE' field In-Reply-To: <4238860A.8050705@vettatech.com> Message-ID: K, I've added some code to SeqIO/genbank.pm that appears to work but I can't commit it until I ask the Bioperl designers a question. Namely, it appears that this DBSOURCE field is specific to Genbank Protein, so the work of creating the Annotation::SimpleValue should be in genbank.pm, not RichSeq.pm, right? Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Leonardo Kenji Shikida Sent: Wednesday, March 16, 2005 2:16 PM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] how to parse the GenPept sequence object to get the'DBSOURCE' field Does anyone know how to parse the GenPept sequence object to get the 'DBSOURCE' field? e.g. human.protein.gpff LOCUS NP_000358 245 aa linear PRI 31-OCT-2000 DEFINITION thiopurine S-methyltransferase [Homo sapiens]. ACCESSION NP_000358 VERSION NP_000358.1 GI:4507653 DBSOURCE REFSEQ: accession NM_000367.1 <<== KEYWORDS . SOURCE Homo sapiens (human) I found no answer reading the docs, and the same question sits unanswered in this list's archives at http://bioperl.org/pipermail/bioperl-l/2003-June/012438.html Thanks in advance, K.
_______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From sasalacolla at libero.it Thu Mar 17 09:24:11 2005 From: sasalacolla at libero.it (sasalacolla@libero.it) Date: Thu Mar 17 09:19:09 2005 Subject: [Bioperl-l] help me with Bio-Tools-PSort-2.0.4 Message-ID: Hi Bioperler, please rescue me. I tried to use a local PSORTb program in $PSORT_ROOT/bin contained in the Psortb module: Bio-Tools-PSort-2.0.4 (available at http://www.psort.org/downloads/index.html), but I only got this error message: Fatal error: ------------- EXCEPTION ------------- MSG: blastall call crashed: -1 /usr/bin/blastall -p blastp -d /usr/local/psort/conf/analysis/sclblast/gramneg/sclblast -i /tmp/OsUPf63CDs -e 1e-09 -o /tmp/8JSpO7hLfz -F F STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:732 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:680 STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:536 STACK Bio::Tools::Run::SCLBlast::blast /usr/local/share/perl/5.8.4/Bio/Tools/Run/SCLBlast.pm:134 STACK Bio::Tools::PSort::Module::SCLBlast::run /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Module/SCLBlast.pm:72 STACK Bio::Tools::PSort::Pathway::__ANON__ /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:194 STACK Bio::Tools::PSort::Pathway::traverse /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:157 STACK Bio::Tools::PSort::classify /usr/local/share/perl/5.8.4/Bio/Tools/PSort.pm:160 STACK (eval) /usr/local/bin/psort:318 STACK toplevel /usr/local/bin/psort:318 -------------------------------------- Reading the configuration instructions of this module, I set the following environment variables in bashrc: export PSORT_ROOT='/usr/local/psort' export PSORT_HMMTOP='/home/sandro/Inst_tools_Rev_Vacc/PSORTb/hmmtop2.1' export
PSORT_PFTOOLS='/usr/bin' export BLASTDIR='/usr/bin' since blastall is located on my machine at /usr/bin/blastall. I tried to solve the problem by setting in bashrc: export BLASTDATADIR='/conf/analysis/sclblast/gramneg'. Anyway, nothing changed in the error message, and nothing changes even when setting absurd paths for BLASTDATADIR! Any suggestions? Thank you all, guys. From sanges at biogem.it Thu Mar 17 09:49:58 2005 From: sanges at biogem.it (Remo Sanges) Date: Thu Mar 17 09:45:09 2005 Subject: [Bioperl-l] help me with Bio-Tools-PSort-2.0.4 In-Reply-To: References: Message-ID: <43c88d82c20dcbfe6c4ddbe3810fec9b@biogem.it> On Mar 17, 2005, at 3:24 PM, sasalacolla@libero.it wrote: > Reading the configuration > instructions of this module i set the following environmental > variables in > bashrc: > > export PSORT_ROOT='/usr/local/psort' > export PSORT_HMMTOP='/home/sandro/Inst_tools_Rev_Vacc/PSORTb/hmmtop2.1' > export PSORT_PFTOOLS='/usr/bin' > export BLASTDIR='/usr/bin' > > since blastall is located in my machine in /usr/bin/blastall. > I tried to solve the problem setting in bashrc: > export BLASTDATADIR='/conf/analysis/sclblast/gramneg'. > Anyway nothing changed in the error message, and nothing changes even > setting > absurd addresses for BLASTDATADIR! > > Any suggestion? thank you all,guys Sorry, my fault... Our coffee machine is broken... ;-) BLASTDATADIR should point to the 'data' directory of your BLAST installation, the one that holds the matrices used by the BLAST binaries.
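The advice above can be checked with a few lines of shell. This is only a sketch: the function name check_blast_data_dir is hypothetical (it is not part of BLAST, BioPerl, or PSORTb), and using BLOSUM62 as the marker for "a directory that holds the matrices" is an assumption about what a typical 'data' directory contains.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory looks like a BLAST 'data'
# directory. Assumption: such a directory holds a matrix file named BLOSUM62.
# With no argument, the value of BLASTDATADIR is checked.
check_blast_data_dir() {
    dir="${1:-${BLASTDATADIR:-}}"
    if [ -z "$dir" ]; then
        echo "BLASTDATADIR is not set"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    if [ -f "$dir/BLOSUM62" ]; then
        echo "ok: $dir"
        return 0
    fi
    echo "no BLOSUM62 matrix in: $dir"
    return 1
}
```

Running check_blast_data_dir with no argument after sourcing bashrc would show whether the exported value points at a real directory at all, which is a reasonable first thing to rule out.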
HTH Remo From ewijaya at singnet.com.sg Wed Mar 16 15:47:43 2005 From: ewijaya at singnet.com.sg (Edward Wijaya) Date: Thu Mar 17 10:09:45 2005 Subject: [Bioperl-l] Getting IC & Consensus with Bio::Matrix::PSM::SiteMatrix - The Code In-Reply-To: <4235A3F9.6000208@utk.edu> References: <4235A3F9.6000208@utk.edu> <42358E7F.7020209@utk.edu> Message-ID: On Mon, 14 Mar 2005 22:47:21 +0800, Stefan Kirov wrote: >> > Sure, that would be great. Just send it and I will optimize it if I can > and put it in. But maybe it should go to Bio::Tools... Any thoughts from > anyone else? Stef, Sorry for the delay. Attached is the code that computes the PWM and IC, given an array of strings. Hope it may be useful. -- Edward WIJAYA Singapore -------------- next part -------------- A non-text attachment was scrubbed... Name: compute_ic_pwm.pl Type: application/octet-stream Size: 7075 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050317/de22092c/compute_ic_pwm.obj From faga at cshl.org Thu Mar 17 13:42:19 2005 From: faga at cshl.org (Ben Faga) Date: Thu Mar 17 16:25:14 2005 Subject: [Bioperl-l] Symlink on install Message-ID: <1111084939.6085.163.camel@ricotta> Hello everyone, I've replaced bp_bulk_load_gff.pl with a script that takes the place of both itself (mysql version) and bp_pg_bulk_load_gff.pl (postgres version). Upon install of bioperl, I want to create a symbolic link from the postgres version's name to bp_bulk_load_gff.pl so that this change will be transparent to people who have been using the postgres version. I have a working solution but I wouldn't mind hearing suggestions and critiques. It works as follows: on make, an external script symlink_scripts.pl gets created with all the necessary path info. In the postamble of Makefile.PL, I inserted a line to call the symlink_scripts.pl file. Then on install, symlink_scripts.pl is run and creates the symbolic link. I used the Perl symlink function to create the link.
On systems where symlink doesn't work, it catches the error and prints a note to the user. That is untested though since I have only tested it on a fedora box. If all of this sounds good, I have a question about where I should place the symlink_scripts.PLS file. It has been suggested that I might put it in the maintenance directory. Any thoughts. Ben From lopaki at gmail.com Thu Mar 17 17:19:28 2005 From: lopaki at gmail.com (Scott Lambdin) Date: Thu Mar 17 17:21:09 2005 Subject: [Bioperl-l] Does BioPerl like mpiBlast? Message-ID: <529e768305031714193ab15b9d@mail.gmail.com> Help please. The scientists have found a blast job that eats all the user memory (~4Gigabytes) on the little 32-bit blast server I set up for them. I was looking at giving them mpiBLAST so that they can spread the database over some processes, but a requirement is to have the BLAST program usable by the BioPerl. Would it be hard for them to use mpiBLAST in BioPerl? That is, harder than using regular NCBI BLAST? --Scott From hlapp at gmx.net Thu Mar 17 19:17:39 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Mar 17 19:13:26 2005 Subject: [Bioperl-l] how to parse the GenPept sequence object to get the'DBSOURCE' field In-Reply-To: Message-ID: <233951CE-9743-11D9-8711-000A959EB4C4@gmx.net> Isn't this a dbxref?
So, yes the work should be in genbank.pm but it should create a Bio::Annotation::DBLink object instead of a SimpleValue. DBLink will also properly represent version, accession, and database, instead of just a flat string. -hilmar On Thursday, March 17, 2005, at 06:06 AM, Brian Osborne wrote: > K, > > I've added some code to SeqIO/genbank.pm that appears to work but I > can't > commit it until I ask the Bioperl designers a question. Namely, it > appears > that this DBSOURCE field is specific to Genbank Protein, so the work of > creating the Annotation::SimpleValue should be in genbank.pm, not > RichSeq.pm, right? > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Leonardo > Kenji Shikida > Sent: Wednesday, March 16, 2005 2:16 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] how to parse the GenPept sequence object to get > the'DBSOURCE' field > > > does anyone know how to parse the GenPept sequence object to get the > 'DBSOURCE' field? > > e.g. human.protein.gpff > > LOCUS NP_000358 245 aa linear PRI > 31-OCT-2000 > DEFINITION thiopurine S-methyltransferase [Homo sapiens]. > ACCESSION NP_000358 > VERSION NP_000358.1 GI:4507653 > DBSOURCE REFSEQ: accession NM_000367.1 <<== > KEYWORDS . > SOURCE Homo sapiens (human) > > I found no answer reading the docs, and there is the same unanswered > question in this list archives at > > http://bioperl.org/pipermail/bioperl-l/2003-June/012438.html > > thanks in advance > > K. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From s0460205 at sms.ed.ac.uk Fri Mar 18 06:44:56 2005 From: s0460205 at sms.ed.ac.uk (SG Edwards) Date: Fri Mar 18 06:39:51 2005 Subject: [Bioperl-l] Loading taxonomy data into BioSQL Message-ID: <1111146296.423abf385d6e9@sms.ed.ac.uk> Hi, Can you please help me with an error message? I have just installed a BioSQL database and am trying to run the load_ncbi_taxonomy.pl script to get taxonomy data into my database before I start to load sequences in. The database has been created and is empty, however, I get the following error message: Cannot open Local file taxdata/taxdump.tar.gz: No such file or directory at load_ncbi_taxonom.pl line 628 gunzip: taxdata/taxdump.tar.gz: No such file or directory sh: line 1: cd: taxdata: No such file or directory tar: taxdump.tar: cannot open: No such file or directory tar: error is not recoverable: exiting now loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp Couldn't open data file taxdata/nodes.dmp: No such file or directory rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line 818. Use of uninitialized value in concatenation (.) or string at load_ncbi_taxonomy.pl line 820. 
rollback failed From brian_osborne at cognia.com Fri Mar 18 07:33:00 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Mar 18 07:31:29 2005 Subject: [Bioperl-l] how to parse the GenPept sequence object to get the'DBSOURCE' field In-Reply-To: <233951CE-9743-11D9-8711-000A959EB4C4@gmx.net> Message-ID: Hilmar, Excellent. OK, I need some suggestions as to values, this is an annotation that I've never constructed. Here's an example: DATABASE GenBank PRIMARY_ID AAC12345 OPTIONAL_ID AAC12345.2 COMMENT: ? TAGNAME: dblink NAMESPACE: ? AUTHORITY: ? VERSION: 2 Brian O. -----Original Message----- From: Hilmar Lapp [mailto:hlapp@gmx.net] Sent: Thursday, March 17, 2005 7:18 PM To: Brian Osborne Cc: Leonardo Kenji Shikida; bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] how to parse the GenPept sequence object to get the'DBSOURCE' field Isn't this a dbxref? So, yes the work should be in genbank.pm but it should create a Bio::Annotation::DBLink object instead of a SimpleValue. DBLink will also properly represent version, accession, and database, instead of just a flat string. -hilmar On Thursday, March 17, 2005, at 06:06 AM, Brian Osborne wrote: > K, > > I've added some code to SeqIO/genbank.pm that appears to work but I > can't > commit it until I ask the Bioperl designers a question. Namely, it > appears > that this DBSOURCE field is specific to Genbank Protein, so the work of > creating the Annotation::SimpleValue should be in genbank.pm, not > RichSeq.pm, right? > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Leonardo > Kenji Shikida > Sent: Wednesday, March 16, 2005 2:16 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] how to parse the GenPept sequence object to get > the'DBSOURCE' field > > > does anyone know how to parse the GenPept sequence object to get the > 'DBSOURCE' field? > > e.g. 
human.protein.gpff > > LOCUS NP_000358 245 aa linear PRI > 31-OCT-2000 > DEFINITION thiopurine S-methyltransferase [Homo sapiens]. > ACCESSION NP_000358 > VERSION NP_000358.1 GI:4507653 > DBSOURCE REFSEQ: accession NM_000367.1 <<== > KEYWORDS . > SOURCE Homo sapiens (human) > > I found no answer reading the docs, and there is the same unanswered > question in this list archives at > > http://bioperl.org/pipermail/bioperl-l/2003-June/012438.html > > thanks in advance > > K. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From brian_osborne at cognia.com Fri Mar 18 08:18:31 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Mar 18 08:14:23 2005 Subject: [Bioperl-l] Loading taxonomy data into BioSQL In-Reply-To: <1111146296.423abf385d6e9@sms.ed.ac.uk> Message-ID: SG, =head1 DESCRIPTION This script loads or updates a biosql schema with the NCBI Taxon Database. There are a number of options to do with where the biosql database is (i.e., database name, hostname, user for database, password, database name). This script may download the NCBI Taxon Database from the NCBI FTP server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise it expects the files to be downloaded already. Brian O. 
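The "downloaded already" case described above can be prepared by hand. A minimal sketch, assuming the archive taxdump.tar.gz (the file name that appears in the error output in this thread) has already been fetched from the FTP directory mentioned; unpack_taxdump is a hypothetical helper and not part of load_ncbi_taxonomy.pl.

```shell
#!/bin/sh
# Hypothetical helper: unpack an already-downloaded taxdump.tar.gz into a
# directory and confirm that nodes.dmp, the file the loader reports reading,
# is present afterwards.
unpack_taxdump() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # Decompress to stdout and let tar extract into the destination directory.
    gunzip -c "$archive" | tar -xf - -C "$dest" || return 1
    if [ ! -f "$dest/nodes.dmp" ]; then
        echo "missing nodes.dmp in $dest"
        return 1
    fi
    echo "unpacked into $dest"
}
```

The destination directory would then be the one handed to the script through the -directory option used in the commands quoted elsewhere in this thread.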
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG Edwards Sent: Friday, March 18, 2005 6:45 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Loading taxonomy data into BioSQL Hi, Can you please help me with an error message? I have just installed a BioSQL database and am trying to run the load_ncbi_taxonomy.pl script to get taxonomy data into my database before I start to load sequences in. The database has been created and is empty, however, I get the following error message: Cannot open Local file taxdata/taxdump.tar.gz: No such file or directory at load_ncbi_taxonom.pl line 628 gunzip: taxdata/taxdump.tar.gz: No such file or directory sh: line 1: cd: taxdata: No such file or directory tar: taxdump.tar: cannot open: No such file or directory tar: error is not recoverable: exiting now loading NCBI taxon database in taxdata: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp Couldn't open data file taxdata/nodes.dmp: No such file or directory rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line 818. Use of uninitialized value in concatenation (.) or string at load_ncbi_taxonomy.pl line 820. rollback failed _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From s0460205 at sms.ed.ac.uk Fri Mar 18 08:45:52 2005 From: s0460205 at sms.ed.ac.uk (SG Edwards) Date: Fri Mar 18 08:40:23 2005 Subject: [Bioperl-l] Loading taxonomy data into BioSQL In-Reply-To: References: Message-ID: <1111153552.423adb90e3e0d@sms.ed.ac.uk> I have been trying: perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass password -download and this gave me the error message below. 
If I download the NCBI taxonomy data manually and direct the Perl script to it using: perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass password -directory /home/s0460205/ This seems to get a bit further but still results in an error: loading NCBI taxon database in /home/s0460205: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp Couldn't open data file taxdata/nodes.dmp: No such file or directory rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line 818. Use of uninitialized value in concatenation (.) or string at load_ncbi_taxonomy.pl line 820. rollback failed It seems to be choking on finding nodes.dmp, but I'm not sure why. Quoting Brian Osborne : > SG, > > =head1 DESCRIPTION > > This script loads or updates a biosql schema with the NCBI Taxon > Database. There are a number of options to do with where the biosql > database is (i.e., database name, hostname, user for database, > password, database name). > > This script may download the NCBI Taxon Database from the NCBI FTP > server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise it > expects the files to be downloaded already. > > > > Brian O. > > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG Edwards > Sent: Friday, March 18, 2005 6:45 AM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] Loading taxonomy data into BioSQL > > > > > Hi, > > Can you please help me with an error message? I have just installed a BioSQL > database and am trying to run the load_ncbi_taxonomy.pl script to get > taxonomy > data into my database before I start to load sequences in.
The database has > been created and is empty, however, I get the following error message: > > > Cannot open Local file taxdata/taxdump.tar.gz: No such file or directory at > load_ncbi_taxonom.pl line 628 > gunzip: taxdata/taxdump.tar.gz: No such file or directory > sh: line 1: cd: taxdata: No such file or directory > tar: taxdump.tar: cannot open: No such file or directory > tar: error is not recoverable: exiting now > loading NCBI taxon database in taxdata: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > Couldn't open data file taxdata/nodes.dmp: No such file or directory > rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line > 818. > Use of uninitialized value in concatenation (.) or string at > load_ncbi_taxonomy.pl line 820. > rollback failed > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > From s0460205 at sms.ed.ac.uk Fri Mar 18 09:05:18 2005 From: s0460205 at sms.ed.ac.uk (SG Edwards) Date: Fri Mar 18 08:59:39 2005 Subject: [Bioperl-l] Loading taxonomy data into BioSQL In-Reply-To: <1111153552.423adb90e3e0d@sms.ed.ac.uk> References: <1111153552.423adb90e3e0d@sms.ed.ac.uk> Message-ID: <1111154718.423ae01e25896@sms.ed.ac.uk> I find that if I manually gunzip and untar the download from NCBI, the script finds the file nodes.dmp (N.B. I'm not sure whether this is a fault with load_ncbi_taxonomy.pl or something on my system). The script then tries to load the data into the taxon table, but the column "taxon_id" is of type INTEGER whereas the script thinks it is varchar. So I either need to change the database column to varchar or change the Perl script to use INTEGER. Has anyone had this problem? Quoting s0460205@sms.ed.ac.uk: > I have been trying: > > perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass > password -download > > and this gave me the error message below.
> If I download the ncbi_taxonomy data manually it and direct the perl script > to > this using: > > perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass > password -directory /home/s0460205/ > > This seems to get a bit further but still results in error, > > "loading NCBI taxon database in /home/s0460205: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > Couldn't open data file taxdata/nodes.dmp: No such file or directory > rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line > 818. > Use of uninitialized value in concatenation (.) or string at > load_ncbi_taxonomy.pl line 820. > rollback failed > > It seems to be choking on finding the nodes.dmp but I'm not sure why?! > > > Quoting Brian Osborne : > > > SG, > > > > =head1 DESCRIPTION > > > > This script loads or updates a biosql schema with the NCBI Taxon > > Database. There are a number of options to do with where the biosql > > database is (i.e., database name, hostname, user for database, > > password, database name). > > > > This script may download the NCBI Taxon Database from the NCBI FTP > > server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise it > > expects the files to be downloaded already. > > > > > > > > Brian O. > > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG Edwards > > Sent: Friday, March 18, 2005 6:45 AM > > To: bioperl-l@portal.open-bio.org > > Subject: [Bioperl-l] Loading taxonomy data into BioSQL > > > > > > > > > > Hi, > > > > Can you please help me with an error message? I have just installed a > BioSQL > > database and am trying to run the load_ncbi_taxonomy.pl script to get > > taxonomy > > data into my database before I start to load sequences in. 
The database has > > been created and is empty, however, I get the following error message: > > > > > > Cannot open Local file taxdata/taxdump.tar.gz: No such file or directory at > > load_ncbi_taxonom.pl line 628 > > gunzip: taxdata/taxdump.tar.gz: No such file or directory > > sh: line 1: cd: taxdata: No such file or directory > > tar: taxdump.tar: cannot open: No such file or directory > > tar: error is not recoverable: exiting now > > loading NCBI taxon database in taxdata: > > ... retrieving all taxon nodes in the database > > ... reading in taxon nodes from nodes.dmp > > Couldn't open data file taxdata/nodes.dmp: No such file or directory > > rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line > > 818. > > Use of uninitialized value in concatenation (.) or string at > > load_ncbi_taxonomy.pl line 820. > > rollback failed > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > From hlapp at gmx.net Fri Mar 18 09:17:34 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Mar 18 09:12:18 2005 Subject: [Bioperl-l] Loading taxonomy data into BioSQL In-Reply-To: <1111154718.423ae01e25896@sms.ed.ac.uk> Message-ID: <7920B327-97B8-11D9-BAA9-000A959EB4C4@gmx.net> Why do you believe the script thinks that taxon_id is a varchar? It doesn't AFAIK. Also, not sure why your Pg (you are using PostgreSQL, right?) is in auto-commit mode. That doesn't sound right. -hilmar On Friday, March 18, 2005, at 06:05 AM, SG Edwards wrote: > I find that if I manually gunzip and tar the download from ncbi then > the script > finds the file nodes.dmp (N.B not sure if this is a fault with > load_ncbi_taxonomy.pl or something with my system?!) > > The script then tries to load the data into the taxon table but the > column > "taxon_id" type is INTEGER but the script thinks it is varchar. 
So > either need > to change the database column to varchar or change the perl script to > INTEGER. > > Has anyone had this problem?! > > > Quoting s0460205@sms.ed.ac.uk: > >> I have been trying: >> >> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 >> -dbpass >> password -download >> >> and this gave me the error message below. >> If I download the ncbi_taxonomy data manually it and direct the perl >> script >> to >> this using: >> >> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 >> -dbpass >> password -directory /home/s0460205/ >> >> This seems to get a bit further but still results in error, >> >> "loading NCBI taxon database in /home/s0460205: >> ... retrieving all taxon nodes in the database >> ... reading in taxon nodes from nodes.dmp >> Couldn't open data file taxdata/nodes.dmp: No such file or directory >> rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl >> line >> 818. >> Use of uninitialized value in concatenation (.) or string at >> load_ncbi_taxonomy.pl line 820. >> rollback failed >> >> It seems to be choking on finding the nodes.dmp but I'm not sure why?! >> >> >> Quoting Brian Osborne : >> >>> SG, >>> >>> =head1 DESCRIPTION >>> >>> This script loads or updates a biosql schema with the NCBI Taxon >>> Database. There are a number of options to do with where the biosql >>> database is (i.e., database name, hostname, user for database, >>> password, database name). >>> >>> This script may download the NCBI Taxon Database from the NCBI FTP >>> server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise >>> it >>> expects the files to be downloaded already. >>> >>> >>> >>> Brian O. 
>>> >>> -----Original Message----- >>> From: bioperl-l-bounces@portal.open-bio.org >>> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG Edwards >>> Sent: Friday, March 18, 2005 6:45 AM >>> To: bioperl-l@portal.open-bio.org >>> Subject: [Bioperl-l] Loading taxonomy data into BioSQL >>> >>> >>> >>> >>> Hi, >>> >>> Can you please help me with an error message? I have just installed a >> BioSQL >>> database and am trying to run the load_ncbi_taxonomy.pl script to get >>> taxonomy >>> data into my database before I start to load sequences in. The >>> database has >>> been created and is empty, however, I get the following error >>> message: >>> >>> >>> Cannot open Local file taxdata/taxdump.tar.gz: No such file or >>> directory at >>> load_ncbi_taxonom.pl line 628 >>> gunzip: taxdata/taxdump.tar.gz: No such file or directory >>> sh: line 1: cd: taxdata: No such file or directory >>> tar: taxdump.tar: cannot open: No such file or directory >>> tar: error is not recoverable: exiting now >>> loading NCBI taxon database in taxdata: >>> ... retrieving all taxon nodes in the database >>> ... reading in taxon nodes from nodes.dmp >>> Couldn't open data file taxdata/nodes.dmp: No such file or directory >>> rollback ineffective with AutoCommit enabled at >>> load_ncbi_taxonomy.pl line >>> 818. >>> Use of uninitialized value in concatenation (.) or string at >>> load_ncbi_taxonomy.pl line 820. >>> rollback failed >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From s0460205 at sms.ed.ac.uk Fri Mar 18 09:25:24 2005 From: s0460205 at sms.ed.ac.uk (SG Edwards) Date: Fri Mar 18 09:20:49 2005 Subject: [Bioperl-l] Loading taxonomy data into BioSQL In-Reply-To: <7920B327-97B8-11D9-BAA9-000A959EB4C4@gmx.net> References: <7920B327-97B8-11D9-BAA9-000A959EB4C4@gmx.net> Message-ID: <1111155924.423ae4d4c422f@sms.ed.ac.uk> Thanks Hilmar, Yeah I am using Postgres, should I take it out of auto-commit mode? I thought the script deals with this but maybe not? If I run it with: perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass password -directory /home/s0460205/ I get the error message: loading NCBI taxon database in /home/s0460205: ... retrieving all taxon nodes in the database ... reading in taxon nodes from nodes.dmp ... insert/update/delete taxon nodes failed to insert node (1;1;1;no rank;1;0): ERROR: column "taxon_id" is of type integer but expression is of type character varying HINT: You will need to rewrite or cast the expression Quoting Hilmar Lapp : > Why do you believe the script thinks that taxon_id is a varchar? It > doesn't AFAIK. > > Also, not sure why your Pg (you are using PostgreSQL, right?) is in > auto-commit mode. That doesn't sound right. > > -hilmar > > On Friday, March 18, 2005, at 06:05 AM, SG Edwards wrote: > > > I find that if I manually gunzip and tar the download from ncbi then > > the script > > finds the file nodes.dmp (N.B not sure if this is a fault with > > load_ncbi_taxonomy.pl or something with my system?!) > > > > The script then tries to load the data into the taxon table but the > > column > > "taxon_id" type is INTEGER but the script thinks it is varchar. So > > either need > > to change the database column to varchar or change the perl script to > > INTEGER. > > > > Has anyone had this problem?! 
> > > > > > Quoting s0460205@sms.ed.ac.uk: > > > >> I have been trying: > >> > >> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 > >> -dbpass > >> password -download > >> > >> and this gave me the error message below. > >> If I download the ncbi_taxonomy data manually it and direct the perl > >> script > >> to > >> this using: > >> > >> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 > >> -dbpass > >> password -directory /home/s0460205/ > >> > >> This seems to get a bit further but still results in error, > >> > >> "loading NCBI taxon database in /home/s0460205: > >> ... retrieving all taxon nodes in the database > >> ... reading in taxon nodes from nodes.dmp > >> Couldn't open data file taxdata/nodes.dmp: No such file or directory > >> rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl > >> line > >> 818. > >> Use of uninitialized value in concatenation (.) or string at > >> load_ncbi_taxonomy.pl line 820. > >> rollback failed > >> > >> It seems to be choking on finding the nodes.dmp but I'm not sure why?! > >> > >> > >> Quoting Brian Osborne : > >> > >>> SG, > >>> > >>> =head1 DESCRIPTION > >>> > >>> This script loads or updates a biosql schema with the NCBI Taxon > >>> Database. There are a number of options to do with where the biosql > >>> database is (i.e., database name, hostname, user for database, > >>> password, database name). > >>> > >>> This script may download the NCBI Taxon Database from the NCBI FTP > >>> server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise > >>> it > >>> expects the files to be downloaded already. > >>> > >>> > >>> > >>> Brian O. 
> >>> > >>> -----Original Message----- > >>> From: bioperl-l-bounces@portal.open-bio.org > >>> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG Edwards > >>> Sent: Friday, March 18, 2005 6:45 AM > >>> To: bioperl-l@portal.open-bio.org > >>> Subject: [Bioperl-l] Loading taxonomy data into BioSQL > >>> > >>> > >>> > >>> > >>> Hi, > >>> > >>> Can you please help me with an error message? I have just installed a > >> BioSQL > >>> database and am trying to run the load_ncbi_taxonomy.pl script to get > >>> taxonomy > >>> data into my database before I start to load sequences in. The > >>> database has > >>> been created and is empty, however, I get the following error > >>> message: > >>> > >>> > >>> Cannot open Local file taxdata/taxdump.tar.gz: No such file or > >>> directory at > >>> load_ncbi_taxonom.pl line 628 > >>> gunzip: taxdata/taxdump.tar.gz: No such file or directory > >>> sh: line 1: cd: taxdata: No such file or directory > >>> tar: taxdump.tar: cannot open: No such file or directory > >>> tar: error is not recoverable: exiting now > >>> loading NCBI taxon database in taxdata: > >>> ... retrieving all taxon nodes in the database > >>> ... reading in taxon nodes from nodes.dmp > >>> Couldn't open data file taxdata/nodes.dmp: No such file or directory > >>> rollback ineffective with AutoCommit enabled at > >>> load_ncbi_taxonomy.pl line > >>> 818. > >>> Use of uninitialized value in concatenation (.) or string at > >>> load_ncbi_taxonomy.pl line 820. 
> >>> rollback failed > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l@portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >>> > >>> > >> > >> > >> > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > From vaughn at cshl.org Fri Mar 18 06:55:20 2005 From: vaughn at cshl.org (Matthew Vaughn) Date: Fri Mar 18 13:32:57 2005 Subject: [Bioperl-l] How to express 'histogram' data in GFF3 Message-ID: <9A6B282E-97A4-11D9-A08F-000A95A26D06@cshl.org> OK, I've bashed my head against this and have come up short, so now I'm asking for help. Recently, I decided to upgrade my development system to BioPerl 1.5 and bring all my code up to GFF3 compliance. This of course, includes code that generates GFF files for loading into our local Generic Genome Browser (1.62). The problem comes when I try to express histogram data. In the past, rows like this worked fine as GFF2 "ChrII rev1 poly1 1591004 1591068 464.835 - . poly1 ChrII:rev1" but this is invalid for GFF3. As far as I can figure from interpreting the GFF3 spec, the same record should look something like this "ChrII rev1 poly1 1591004 1591068 464.835 - . ID=poly1%3AChrII%3Arev1" But this violates the GFF3 spec in that ID is now non-unique. Rows formatted thusly also fail to display any histogram data in my browser. I've considered loading the array data as GFF2 and my annotation data as GFF3, but that seems, well, inelegant (plus I don't even know if that will work) Any input will be very much appreciated! Matt -- Matthew W. Vaughn, Ph.D. 
Cold Spring Harbor Laboratory
Delbruck Laboratory / Martienssen Group
1 Bungtown Road
Cold Spring Harbor, NY 11724

phone: (516) 367-8469
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2359 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050318/3934a693/smime-0001.bin

From hlapp at gmx.net Fri Mar 18 23:14:24 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 18 23:11:14 2005
Subject: [Bioperl-l] how to parse the GenPept sequence object to get the'DBSOURCE' field
In-Reply-To:
Message-ID: <60CF6110-982D-11D9-AB95-000A959EB4C4@gmx.net>

On Friday, March 18, 2005, at 04:33 AM, Brian Osborne wrote:

> Hilmar,
>
> Excellent. OK, I need some suggestions as to values, this is an
> annotation
> that I've never constructed. Here's an example:
>
> DATABASE GenBank
>
> PRIMARY_ID AAC12345
>
> OPTIONAL_ID AAC12345.2

No, leave blank - it is meant for cases where it is really different from the primary_id.

>
> COMMENT: ?

right, undef

>
> TAGNAME: dblink

Correct.

>
> NAMESPACE: ?

Ignore. I believe it defaults to database automagically.

>
> AUTHORITY: ?

right, undef

>
> VERSION: 2

right.

Cheers,

-hilmar

>
>
> Brian O.
>
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net]
> Sent: Thursday, March 17, 2005 7:18 PM
> To: Brian Osborne
> Cc: Leonardo Kenji Shikida; bioperl-l@portal.open-bio.org
> Subject: Re: [Bioperl-l] how to parse the GenPept sequence object to
> get
> the'DBSOURCE' field
>
>
> Isn't this a dbxref? So, yes the work should be in genbank.pm but it
> should create a Bio::Annotation::DBLink object instead of a
> SimpleValue. DBLink will also properly represent version, accession,
> and database, instead of just a flat string.
> > -hilmar > > On Thursday, March 17, 2005, at 06:06 AM, Brian Osborne wrote: > >> K, >> >> I've added some code to SeqIO/genbank.pm that appears to work but I >> can't >> commit it until I ask the Bioperl designers a question. Namely, it >> appears >> that this DBSOURCE field is specific to Genbank Protein, so the work >> of >> creating the Annotation::SimpleValue should be in genbank.pm, not >> RichSeq.pm, right? >> >> Brian O. >> >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org >> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Leonardo >> Kenji Shikida >> Sent: Wednesday, March 16, 2005 2:16 PM >> To: bioperl-l@portal.open-bio.org >> Subject: [Bioperl-l] how to parse the GenPept sequence object to get >> the'DBSOURCE' field >> >> >> does anyone know how to parse the GenPept sequence object to get the >> 'DBSOURCE' field? >> >> e.g. human.protein.gpff >> >> LOCUS NP_000358 245 aa linear PRI >> 31-OCT-2000 >> DEFINITION thiopurine S-methyltransferase [Homo sapiens]. >> ACCESSION NP_000358 >> VERSION NP_000358.1 GI:4507653 >> DBSOURCE REFSEQ: accession NM_000367.1 <<== >> KEYWORDS . >> SOURCE Homo sapiens (human) >> >> I found no answer reading the docs, and there is the same unanswered >> question in this list archives at >> >> http://bioperl.org/pipermail/bioperl-l/2003-June/012438.html >> >> thanks in advance >> >> K. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From brian_osborne at cognia.com Fri Mar 18 23:24:40 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Mar 18 23:21:27 2005 Subject: [Bioperl-l] how to parse the GenPept sequence object to get the'DBSOURCE' field In-Reply-To: <60CF6110-982D-11D9-AB95-000A959EB4C4@gmx.net> Message-ID: Hilmar and K, OK, it seems to read and write properly, I'll commit. Give it a try K. Brian O. -----Original Message----- From: Hilmar Lapp [mailto:hlapp@gmx.net] Sent: Friday, March 18, 2005 11:14 PM To: Brian Osborne Cc: Leonardo Kenji Shikida; bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] how to parse the GenPept sequence object to get the'DBSOURCE' field On Friday, March 18, 2005, at 04:33 AM, Brian Osborne wrote: > Hilmar, > > Excellent. OK, I need some suggestions as to values, this is an > annotation > that I've never constructed. Here's an example: > > DATABASE GenBank > > PRIMARY_ID AAC12345 > > OPTIONAL_ID AAC12345.2 No, leave blank - it is meant for cases where it is really different from the primary_id. > > COMMENT: ? right, undef > > TAGNAME: dblink Correct. > > NAMESPACE: ? Ignore. I believe it defaults to database automagically. > > AUTHORITY: ? right, undef > > VERSION: 2 right. Cheers, -hilmar > > > Brian O. > > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp@gmx.net] > Sent: Thursday, March 17, 2005 7:18 PM > To: Brian Osborne > Cc: Leonardo Kenji Shikida; bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] how to parse the GenPept sequence object to > get > the'DBSOURCE' field > > > Isn't this a dbxref? 
So, yes the work should be in genbank.pm but it > should create a Bio::Annotation::DBLink object instead of a > SimpleValue. DBLink will also properly represent version, accession, > and database, instead of just a flat string. > > -hilmar > > On Thursday, March 17, 2005, at 06:06 AM, Brian Osborne wrote: > >> K, >> >> I've added some code to SeqIO/genbank.pm that appears to work but I >> can't >> commit it until I ask the Bioperl designers a question. Namely, it >> appears >> that this DBSOURCE field is specific to Genbank Protein, so the work >> of >> creating the Annotation::SimpleValue should be in genbank.pm, not >> RichSeq.pm, right? >> >> Brian O. >> >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org >> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Leonardo >> Kenji Shikida >> Sent: Wednesday, March 16, 2005 2:16 PM >> To: bioperl-l@portal.open-bio.org >> Subject: [Bioperl-l] how to parse the GenPept sequence object to get >> the'DBSOURCE' field >> >> >> does anyone know how to parse the GenPept sequence object to get the >> 'DBSOURCE' field? >> >> e.g. human.protein.gpff >> >> LOCUS NP_000358 245 aa linear PRI >> 31-OCT-2000 >> DEFINITION thiopurine S-methyltransferase [Homo sapiens]. >> ACCESSION NP_000358 >> VERSION NP_000358.1 GI:4507653 >> DBSOURCE REFSEQ: accession NM_000367.1 <<== >> KEYWORDS . >> SOURCE Homo sapiens (human) >> >> I found no answer reading the docs, and there is the same unanswered >> question in this list archives at >> >> http://bioperl.org/pipermail/bioperl-l/2003-June/012438.html >> >> thanks in advance >> >> K. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gmx.net Fri Mar 18 23:40:33 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Mar 18 23:35:32 2005 Subject: [Bioperl-l] Loading taxonomy data into BioSQL In-Reply-To: <1111155924.423ae4d4c422f@sms.ed.ac.uk> Message-ID: <07F7F769-9831-11D9-AB95-000A959EB4C4@gmx.net> On Friday, March 18, 2005, at 06:25 AM, SG Edwards wrote: > Thanks Hilmar, > > Yeah I am using Postgres, should I take it out of auto-commit mode? I > thought > the script deals with this but maybe not? Why would you ever want to run a database in auto-commit mode unless that's the only option you have like with mysql? If you run this in auto-commit mode the users will see a totally inconsistent state for possibly more than half an hour. The script goes to great lengths not to leave the transction unless it really doesn't know any better. > > If I run it with: > > perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 > -dbpass > password -directory /home/s0460205/ > > I get the error message: > > loading NCBI taxon database in /home/s0460205: > ... retrieving all taxon nodes in the database > ... reading in taxon nodes from nodes.dmp > ... 
insert/update/delete taxon nodes > failed to insert node (1;1;1;no rank;1;0): ERROR: column "taxon_id" is > of type > integer but expression is of type character varying > HINT: You will need to rewrite or cast the expression OK this is the piece that reveals it. It's a bug in DBD::Pg 1.40 against 8.0x PostgreSQL servers. Check here for the thread, a fix is in preparation but apparently doesn't fully catch it yet. http://gborg.postgresql.org/pipermail/dbdpg-general/2005-March/ 001514.html Maybe 1.41 is out already? Or you can downgrade Pg to 7.4.x? Or wait until the DBD::Pg people fixed it? In any event, beyond our control. -hilmar > > > Quoting Hilmar Lapp : > >> Why do you believe the script thinks that taxon_id is a varchar? It >> doesn't AFAIK. >> >> Also, not sure why your Pg (you are using PostgreSQL, right?) is in >> auto-commit mode. That doesn't sound right. >> >> -hilmar >> >> On Friday, March 18, 2005, at 06:05 AM, SG Edwards wrote: >> >>> I find that if I manually gunzip and tar the download from ncbi then >>> the script >>> finds the file nodes.dmp (N.B not sure if this is a fault with >>> load_ncbi_taxonomy.pl or something with my system?!) >>> >>> The script then tries to load the data into the taxon table but the >>> column >>> "taxon_id" type is INTEGER but the script thinks it is varchar. So >>> either need >>> to change the database column to varchar or change the perl script to >>> INTEGER. >>> >>> Has anyone had this problem?! >>> >>> >>> Quoting s0460205@sms.ed.ac.uk: >>> >>>> I have been trying: >>>> >>>> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 >>>> -dbpass >>>> password -download >>>> >>>> and this gave me the error message below. 
>>>> If I download the ncbi_taxonomy data manually it and direct the perl >>>> script >>>> to >>>> this using: >>>> >>>> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 >>>> -dbpass >>>> password -directory /home/s0460205/ >>>> >>>> This seems to get a bit further but still results in error, >>>> >>>> "loading NCBI taxon database in /home/s0460205: >>>> ... retrieving all taxon nodes in the database >>>> ... reading in taxon nodes from nodes.dmp >>>> Couldn't open data file taxdata/nodes.dmp: No such file or directory >>>> rollback ineffective with AutoCommit enabled at >>>> load_ncbi_taxonomy.pl >>>> line >>>> 818. >>>> Use of uninitialized value in concatenation (.) or string at >>>> load_ncbi_taxonomy.pl line 820. >>>> rollback failed >>>> >>>> It seems to be choking on finding the nodes.dmp but I'm not sure >>>> why?! >>>> >>>> >>>> Quoting Brian Osborne : >>>> >>>>> SG, >>>>> >>>>> =head1 DESCRIPTION >>>>> >>>>> This script loads or updates a biosql schema with the NCBI Taxon >>>>> Database. There are a number of options to do with where the biosql >>>>> database is (i.e., database name, hostname, user for database, >>>>> password, database name). >>>>> >>>>> This script may download the NCBI Taxon Database from the NCBI FTP >>>>> server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise >>>>> it >>>>> expects the files to be downloaded already. >>>>> >>>>> >>>>> >>>>> Brian O. >>>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces@portal.open-bio.org >>>>> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG >>>>> Edwards >>>>> Sent: Friday, March 18, 2005 6:45 AM >>>>> To: bioperl-l@portal.open-bio.org >>>>> Subject: [Bioperl-l] Loading taxonomy data into BioSQL >>>>> >>>>> >>>>> >>>>> >>>>> Hi, >>>>> >>>>> Can you please help me with an error message? 
I have just >>>>> installed a >>>> BioSQL >>>>> database and am trying to run the load_ncbi_taxonomy.pl script to >>>>> get >>>>> taxonomy >>>>> data into my database before I start to load sequences in. The >>>>> database has >>>>> been created and is empty, however, I get the following error >>>>> message: >>>>> >>>>> >>>>> Cannot open Local file taxdata/taxdump.tar.gz: No such file or >>>>> directory at >>>>> load_ncbi_taxonom.pl line 628 >>>>> gunzip: taxdata/taxdump.tar.gz: No such file or directory >>>>> sh: line 1: cd: taxdata: No such file or directory >>>>> tar: taxdump.tar: cannot open: No such file or directory >>>>> tar: error is not recoverable: exiting now >>>>> loading NCBI taxon database in taxdata: >>>>> ... retrieving all taxon nodes in the database >>>>> ... reading in taxon nodes from nodes.dmp >>>>> Couldn't open data file taxdata/nodes.dmp: No such file or >>>>> directory >>>>> rollback ineffective with AutoCommit enabled at >>>>> load_ncbi_taxonomy.pl line >>>>> 818. >>>>> Use of uninitialized value in concatenation (.) or string at >>>>> load_ncbi_taxonomy.pl line 820. >>>>> rollback failed >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 >> ------------------------------------------------------------- >> >> >> > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
-------------------------------------------------------------

From cain at cshl.edu Sat Mar 19 09:23:54 2005
From: cain at cshl.edu (Scott Cain)
Date: Sat Mar 19 09:18:52 2005
Subject: [Bioperl-l] How to express 'histogram' data in GFF3
In-Reply-To: <9A6B282E-97A4-11D9-A08F-000A95A26D06@cshl.org>
References: <9A6B282E-97A4-11D9-A08F-000A95A26D06@cshl.org>
Message-ID: <1111242234.3557.0.camel@localhost.localdomain>

Matt,

First, let me start by saying there are some "unexplored" areas of GFF3 and GBrowse (at least, they are unexplored for me). While I haven't tested this, it should work fine. What you can do is create one parent feature that encapsulates the entire range, and then have the data points be child lines of the parent:

ChrII rev1 region 1 2000000 . . . ID=poly1%3AChrII%3Arev1
ChrII rev1 poly1 1591004 1591068 464.835 - . Parent=poly1%3AChrII%3Arev1

Now whether this will work in a GFF database with GBrowse currently is an open question (like I said, I haven't tested it); I know it would work in a chado database and GBrowse. You might need a custom aggregator to make it work in a GFF database.

On the other hand, I'm not convinced that having all the lines with the same ID violates the GFF3 spec, as you could probably view this as one big feature of the whole range, and therefore the ID applies to that one feature, not to the individual pieces that make up the lines of GFF.

If you want, you can send me a small sample set of data and I'll see what I can do.

Scott

On Fri, 2005-03-18 at 06:55 -0500, Matthew Vaughn wrote:
> OK, I've bashed my head against this and have come up short, so now I'm
> asking for help. Recently, I decided to upgrade my development system
> to BioPerl 1.5 and bring all my code up to GFF3 compliance. This of
> course, includes code that generates GFF files for loading into our
> local Generic Genome Browser (1.62).
>
> The problem comes when I try to express histogram data.
In the past, > rows like this worked fine as GFF2 > > "ChrII rev1 poly1 1591004 1591068 464.835 - . poly1 ChrII:rev1" > > but this is invalid for GFF3. As far as I can figure from interpreting > the GFF3 spec, the same record should look something like this > > "ChrII rev1 poly1 1591004 1591068 464.835 - . ID=poly1%3AChrII%3Arev1" > > But this violates the GFF3 spec in that ID is now non-unique. Rows > formatted thusly also fail to display any histogram data in my browser. > > I've considered loading the array data as GFF2 and my annotation data > as GFF3, but that seems, well, inelegant (plus I don't even know if > that will work) > > Any input will be very much appreciated! > > Matt > > -- > Matthew W. Vaughn, Ph.D. > Cold Spring Harbor Laboratory > Delbruck Laboratory / Martienssen Group > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > > phone: (516) 367-8469 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From brian_osborne at cognia.com Sat Mar 19 10:29:33 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Sat Mar 19 10:39:23 2005 Subject: [Bioperl-l] Symlink on install In-Reply-To: <1111084939.6085.163.camel@ricotta> Message-ID: Ben, Yes, maintenance/ sounds reasonable. What a surprise! ;-) Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Ben Faga Sent: Thursday, March 17, 2005 1:42 PM To: bioperl-l@bioperl.org Subject: [Bioperl-l] Symlink on install Hello everyone, I've replaced the bp_bulk_load_gff.pl with a script that takes the place of both itself (mysql version) and bp_pg_bulk_load_gff.pl (postgres version). 
Upon install of bioperl, I want to create a symbolic link from postgres version to bp_bulk_load_gff.pl so that this change will be transparent to people who have been using the postgres version. I have a working solution but I wouldn't mind hearing suggestions and critiques. The way that it works is on make, an external script symlink_scripts.pl gets created with all the necessary path info. In the postamble of Makefile.PL, I inserted a line to call the symlink_scripts.pl file. Then on install, symlink_scripts.pl is run and creates the symbolic link. I used the Perl symlink function to create the link. On systems where symlink doesn't work, it catches the error and prints a note to the user. That is untested though since I have only tested it on a fedora box. If all of this sounds good, I have a question about where I should place the symlink_scripts.PLS file. It has been suggested that I might put it in the maintenance directory. Any thoughts. Ben _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From glim at mycybernet.net Sat Mar 19 15:05:00 2005 From: glim at mycybernet.net (glim@mycybernet.net) Date: Sat Mar 19 15:27:21 2005 Subject: [Bioperl-l] Yet Another Perl Conference, North America, 2005 Registration now open Message-ID: ----------> Yet Another Perl Conference, North America, 2005 Registration now open. Conference dates: Monday - Wednesday 27 - 29 June 2005 Location: 89 Chestnut Street http://89chestnut.com/ University of Toronto Toronto, Ontario, Canada Info at: http://yapc.org/America Direct registration: http://donate.perlfoundation.org/index.pl?node=registrant%20info&conference_id=423 Full registration fee $85 (USD) Book now for great deals on accommodations and ensure a space for yourself. Speaking slots are still open. 
If you would like to present at YAPC::NA 2005, see: http://yapc.org/America/cfp-2005.shtml Details of this announcement: http://yapc.org/America/registration-announcement-2005.txt <---------- More Details ============ Registration for YAPC::NA (Yet Another Perl Conference, North America) 2005 in Toronto, Ontario, Canada is now open. The conference registration price is USD$85. This price includes admission to all aspects of the conference, respectable amounts of catering, several activities and a few conference goodies. The YAPC North America 2005 conference features... * Fantastic speakers + most are the core creators of the technology on which they present + many are professional IT authors, trainers and conference speakers * An excellent learning opportunity * A chance to meet Perl professionals from all over North America and the world + YAPC attendees tend to be very involved in Perl and so are another great way to learn more about what the language has to offer beyond just what the speakers have to say * Extra-curricular / after hours activities * A great location in downtown Toronto All this, and the price is more than an order of magnitude cheaper than what commercial conferences can offer. This is because YAPC is a 100% volunteer effort, both from its organizers and its speakers. Quality is *not* sacrificed to achieve this stunning level of affordability. YAPC provides the best value-for-dollar in IT conferences. And it's a ton of fun, too. The dates of the conference are Monday - Wednesday 27-29 June 2005. The location is 89 Chestnut Street in downtown Toronto, Ontario, Canada. (Note that a different date block was previously announced; we moved the conference date to accommodate venue availability.) http://89chestnut.com/ -- a facility within the University of Toronto If you are at all interested in attending the conference... Book now! Book now! Book now! 
We have room for about 400 attendees and we hope to sell out well in advance of the late June conference date. However, the critical matter is that of hotels. The YAPC::NA 2005 organizers have made group arrangements with several facilities around the city to provide _excellent_ quality accommodations in _very_ convenient locations at _terrific_ prices for the _full_ capacity of conference attendees (around 400 people). (Finding, booking and paying accommodations is the responsibility of the attendees, but we will provide you with a list of the hotels and university dorms to try first based on our group arrangement with them when you register for the conference. Also, see the web site at http://yapc.org/America/accommodations-2005.shtml. More details will be up shortly. The dorm option will be approx. C$55/night, the hotel options will be more like C$90/night, and for slightly different prices there will be options for putting more than 1 person in a room. Exact details and how to book will be emailed directly to people who have registered for the conference as soon as they become available.) *The catch is -- book now!!* The group reservations will expire in early May, at which point in time the group rates will mostly still apply, but the rooms will be given out on an "availability basis". Which means that someone else outside of the YAPC group can book the rooms as well. Make no mistake -- the rooms *will* be sold. Toronto is a very active conference city in the summer and there will be _no_ guarantee of vacancies either at the facilities we made arrangements with or anywhere else in the city if you leave it to within 6 weeks of the conference date. So, if you want to save yourself the likely-fruitless headache of scrambling around looking for accommodations at the last minute, Book now! Book now! Book now! Have any questions? Email na-help@yapc.org for more details. 
Additionally, we are still welcoming submissions for proposals via: http://yapc.org/America/cfp-2005.shtml The close of the call-for-papers is April 18, 2005 at 11:59 pm (Toronto time). If you have any questions regarding the call-for-papers or speaking at YAPC::NA 2005 please email na-author@yapc.org We would love to hear from potential sponsors. Please contact the organizers at na-sponsor@yapc.org to learn about the benefits of sponsorship. From Nathan.Johnson at astrazeneca.com Mon Mar 21 11:01:10 2005 From: Nathan.Johnson at astrazeneca.com (Johnson, Nathan) Date: Mon Mar 21 11:34:38 2005 Subject: [Bioperl-l] cigarline conversion Message-ID: Hi bioperlers Does anyone know of a module which handles the conversion of multiple alignment cigar line format(multiple strings with M and D's but no I's) cigar line data to a pairwise format (one string with M,D and I's). SimpleAlign doesn't seem to do what I want :\ Cheers Nath From zhoujie at fudan.edu.cn Tue Mar 22 04:51:40 2005 From: zhoujie at fudan.edu.cn (zhoujie@fudan.edu.cn) Date: Tue Mar 22 05:23:56 2005 Subject: [Bioperl-l] How to use proxy in Bioperl? Message-ID: <7c3e427c5712.7c57127c3e42@fudan.edu.cn> Hi all, I'm new to bioperl and here is my question: How to use proxy in bioperl? For example I'm using get_Seq_by_acc() method, how can I get the sequence when I can only access NCBI via proxy? Thanks very much.
J Z

From Marc.Logghe at devgen.com Tue Mar 22 05:39:02 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Mar 22 05:33:38 2005
Subject: [Bioperl-l] How to use proxy in Bioperl?
Message-ID:

Hi JZ,
In the back LWP::Simple is doing the request for you. By default it reads the proxy from your environment.
Guess setting the HTTP_PROXY env var should solve your problem.
HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of
> zhoujie@fudan.edu.cn
> Sent: Tuesday, March 22, 2005 10:52 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] How to use proxy in Bioperl?
>
> Hi all,
>
> I'm new to bioperl and here is my question: How to use proxy
> in bioperl? For example I'm using get_Seq_by_acc() method,
> how can I get the sequence when I can only access NCBI via proxy?
>
> Thanks very much.
>
> J Z
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>

From Marc.Logghe at devgen.com Tue Mar 22 05:58:21 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Mar 22 05:52:55 2005
Subject: [Bioperl-l] How to use proxy in Bioperl?
Message-ID:

You can also set it in your script:

$gb = Bio::DB::GenBank->new();
$gb->proxy(['ftp', 'http'], 'http://');

Think this works also:

$gb->proxy('http://');

HTH,
Marc

> -----Original Message-----
> From: Marc Logghe
> Sent: Tuesday, March 22, 2005 11:39 AM
> To: 'zhoujie@fudan.edu.cn'; bioperl-l@bioperl.org
> Subject: RE: [Bioperl-l] How to use proxy in Bioperl?
>
> Hi JZ,
> In the back LWP::Simple is doing the request for you. By
> default it reads the proxy from your environment.
> Guess setting the HTTP_PROXY env var should solve your problem.
> HTH,
> Marc
>
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org
> > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of
> > zhoujie@fudan.edu.cn
> > Sent: Tuesday, March 22, 2005 10:52 AM
> > To: bioperl-l@bioperl.org
> > Subject: [Bioperl-l] How to use proxy in Bioperl?
> >
> > Hi all,
> >
> > I'm new to bioperl and here is my question: How to use proxy in
> > bioperl? For example I'm using get_Seq_by_acc() method, how can I get
> > the sequence when I can only access NCBI via proxy?
> >
> > Thanks very much.
> >
> > J Z
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >

From mlemieux at bioinfo.ca Tue Mar 22 12:26:35 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Tue Mar 22 17:53:22 2005
Subject: [Bioperl-l] Easy switching from wwwBlast to QBlast
In-Reply-To: <75616934-3FED-11D9-B611-000393C44276@duke.edu>
References: <5B03F9DE-3DFA-11D9-AF99-000A95B139D2@bioinfo.ca> <75616934-3FED-11D9-B611-000393C44276@duke.edu>
Message-ID: <0097862bb8ee879eaac2ecc69bc1b668@bioinfo.ca>

Jason,

I've used RemoteBlast.pm (1.5 version) as a template for a new module
I'm calling LocalServerBlast.pm which lets users submit jobs to a
wwwBlast server. I've also added a subroutine, wwwBlast_sequence, to
Perl.pm that mimics blast_sequence. I've added support for both
procedures to pass CGI parameters through to submit_blast. So one can
do:

    # for QBlast
    use Bio::Perl;
    my $blast_report = blast_sequence($seq, {'-expect' => 1e-6, DESCRIPTIONS => 25});

and

    # for wwwBlast
    use Bio::Perl;
    my $blast_report = wwwBlast_sequence($seq, {'-expect' => 1e-6, DESCRIPTIONS => 25});

Only the procedure name has to be changed to switch between wwwBlast
and QBlast.
In fact, if a hash of parameters gets set up that includes both QBlast
and wwwBlast options mixed in together, it doesn't matter, since each
server only looks at the parameters it recognizes and ignores all
others; as long as the values of the parameters it does recognize are
set correctly, it just works. The only kludge needed for this is for
ALIGNMENT_VIEW, where QBlast expects a string but wwwBlast uses numbers
to specify the view option. And in both cases, requesting a tabular
view will cause the blast result parser to fail. I haven't bothered
catching that particular error since it's the same behaviour in both
cases.

The interest for me was to use this for prototyping software that will
eventually hit the NCBI Blast server, but without clogging up the NCBI
queue or wasting my internet connect time while I'm developing. I can
also imagine it being useful in a center generating its own sequence
databases and already using wwwBlast.

Since the wwwBlast server doesn't support queues, there's no concept
of RID and so no need for retrieve_blast in LocalServerBlast; instead,
submit_blast returns an array of Bio::Tools::BPlite or
Bio::Tools::Blast objects.

I've also made a slight change to how RemoteBlast.pm checks the return
status of blast jobs. The HTML returned from the NCBI server contains
a status line near the top of the file, so I just read far enough in
the response file to pull that information out and then use that,
rather than the filesize, to decide if the job is ready, waiting, or
failed.

I've attached the patch files for Perl.pm and RemoteBlast.pm (cvs diff
-aur against both 1.4 and 1.5) as well as the LocalServerBlast.pm
file. I'm not sure what the protocol is for "cared for", "copyright"
and "author" notices. I've mostly just modified your and Ewan Birney's
stuff. I'd be happy to care for these modules. I haven't written any
code for the test suite yet but I'll start working on that soon.
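The status check described in this message (reading the response only as far as the status line instead of relying on file size) can be sketched roughly as follows. This is an illustration, not the code from the attached patch: the sub name qblast_status and the sample response text are assumptions made here, and the marker QBlastInfoEnd and the waiting/ready keywords are taken from the snippet quoted further down in this same message.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RemoteBlast.pm): scan the top of a
# response and report 'waiting', 'ready', or 'unknown'.
sub qblast_status {
    my ($response_text) = @_;
    for my $line (split /\n/, $response_text) {
        # Stop at the end of the info block; nothing below it is a status.
        last if $line =~ /QBlastInfoEnd/;
        # Assumed layout: a "key=value" line whose value is the status.
        return lc $1 if $line =~ /=\s*(waiting|ready)\b/i;
    }
    # No recognizable status line: the caller can treat this as a failure.
    return 'unknown';
}

# Assumed sample text, for illustration only.
my $sample = "<!--QBlastInfoBegin\n\tStatus=WAITING\nQBlastInfoEnd\n-->\n";
print qblast_status($sample), "\n";
```

A caller would poll again on 'waiting', fetch and parse the report on 'ready', and report an error otherwise.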
Also, upon further reflection, I decide not to incorporate the support for accession# and gi to blast_sequence. If anyone wants that, I can put it back in but for a first pass I didn't want to change Perl.pm too much. I've tested these modules with wwwBlast 2.2.9 and 2.2.10 under MacOS X. All the best, Madeleine > Dear Madeleine - > > Great. Would love for someone to be a maintainer and keeper of this > module. All your changes sound great. I think a new function in > Bio::Perl would be the best way to allow providing of a new > localserver. Note that Bio::Perl is supposed to really just be a > convenience of just having a list of functions for new users - so > there is room for new *well named* functions to be added there. > > As for applying the changes - you can submit a patch of differences > for your new code versus the current CVS HEAD by making changes and > then running "cvs diff -aur " to get the changes in a patch format. > You'll want to checkout the code via CVS first - > http://cvs.open-bio.org/. We have to give you an authorized account > to be able to apply changes back to the repository though. Once > you've submitted a few fixes to show you understand the toolkit and > the coding practices we can see about getting you that account. > > -jason > On Nov 24, 2004, at 4:22 AM, Madeleine Lemieux wrote: > >> I've just recently started exploring BioPerl (v.1.4). So far it's >> been fun if a little daunting. >> >> As an exercise, I decided to try change the blast_sequence subroutine >> in Perl.pm so that it would let me send the query to either my local >> wwwBlast server or out over my slow, flakey internet connection to >> the QBlast server. I did this by adding a parameter LOCALSERVER >> which, if set to a URL, redirects the query to that server (e.g. >> LOCALSERVER => http://localhost/blast/blast.cgi); otherwise, it >> defaults to the server at the NCBI. 
>> >> I've also added support for query by accession or gi # (QBlast only >> since wwwBlast doesn't support such queries), submission of multiple >> sequences (either in a file or string or string variable), as well as >> passing any of the QBlast Put and Get options as parameters. Unlike >> the original one, my blast_sequence returns an array of results, not >> a single result, so that code calling my version of blast_sequence in >> a scalar context would incorrectly get the size of the array. >> >> Apart from Perl.pm, the only other file that I had to change was >> Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release >> candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed >> in ways that overlap with the changes I've made while maintaining >> backwards compatibility which my version does not since I was only >> working for myself at the time. >> >> So my question is: is anyone interested in getting the code I've >> developed? If so, a corollary question is: how do I go about >> contributing the code? I can pretty easily forward port my changes to >> RemoteBlast.pm to the 1.5.RC1 version in order to use the nice >> "validate by regexp" trick introduced there and to provide backwards >> compatibility. I'm not sure what to do about the Perl.pm module, >> though. I guess that the easiest would be to change the name of my >> blast_sequence subroutine and add it to Perl.pm since there is no >> object interface being altered. >> >> As I was working on this, I noticed that the HTML stripping that gets >> done on the response from the QBlast server fails on wwwBlast output >> since the format of the HTML is a little different (manifests as a >> "can't find mid-line data" error when processing the alignments). So >> I wrote a generic stripper which removes all HTML tags except those >> that contain an end-of-line within the tag itself or an internal, >> un-escaped closing angle bracket (>) which wouldn't be valid HTML >> anyway, I think. 
>> It doesn't touch single angle brackets (>) such as
>> those found at the beginning of descriptions (>gi ...).
>> # html stripper
>> # remove simple and closing tags first and then leftover tags
>> $str =~ s/<(\/)?\w+>//g;
>> $str =~ s/<\D+([^>]*\n*)*>//g;
>>
>> Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test
>> for completion relies on the size of the file containing the reply.
>> This has failed at least once for me. Since there is a status line
>> near the top of the file in the response, it seems to me that
>> something along the lines of the following might be more robust:
>> # read file until QBlastInfoEnd to pull out status
>> my $status = '';
>> my $junk = '';
>> open(TMP, $tempfile) or $self->throw("cannot open $tempfile");
>> while( defined (my $line = <TMP>) ) {
>>     last if ($line =~ /QBlastInfoEnd/);
>>     ($junk, $status) = (split /=/, $line) if ($line =~ /waiting|ready/i);
>> }
>> close TMP;
>>
>> if( $response->is_success ) {
>>     if ( $status =~ /waiting/i ) {
>>         return 0;
>>     } elsif ( $status =~ /ready/i ) {
>>         ...
>>     } else { # failed
>>         ...
>>     }
>> } ...
>>
>> Finally, let me end by thanking all the BioPerl contributors for
>> their fine work.
>>
>> Regards,
>> Madeleine
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RemoteBlast.pm.diff-1.4
Type: application/octet-stream
Size: 22148 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/RemoteBlast.pm.diff-1-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RemoteBlast.pm.diff-1.5 Type: application/octet-stream Size: 22150 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/RemoteBlast.pm.diff-1-0003.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: Perl.pm.diff-1.4 Type: application/octet-stream Size: 21885 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/Perl.pm.diff-1-0002.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: Perl.pm.diff-1.5 Type: application/octet-stream Size: 19626 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/Perl.pm.diff-1-0003.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: LocalServerBlast.pm Type: application/octet-stream Size: 16943 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/LocalServerBlast-0001.obj -------------- next part -------------- From sebastien.moretti at igs.cnrs-mrs.fr Thu Mar 24 06:05:27 2005 From: sebastien.moretti at igs.cnrs-mrs.fr (Sebastien Moretti) Date: Thu Mar 24 05:59:52 2005 Subject: [Bioperl-l] [How to add features in genbank flat file] In-Reply-To: <200502151525.38790.moretti@igs.cnrs-mrs.fr> References: <200502151525.38790.moretti@igs.cnrs-mrs.fr> Message-ID: <42429EF7.4050504@igs.cnrs-mrs.fr> Hello, No one seems to have a solution to this problem I posted a month ago. So, I changed my mind and use 'wget' to get the GenBank sequences. I get the full GenBank entry, with most of features. And I can avoid another bug: COMMENT lines are not well formated with the BioPerl script I used (not as COMMENT lines are on NCBI), and blank lines are removed. 

    #!/usr/bin/perl -w

    use strict;
    use diagnostics;
    use File::Cat;

    my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";

    `wget -O output_file.tmp "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=$acc&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32" 2>/dev/null`;
    cat ("output_file.tmp", \*STDOUT);
    unlink("output_file.tmp");

    # wget -O output_file 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=NM_178432&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32'

    exit;

Sorry, I don't use BioPerl to query GenBank (but for other
applications), but BioPerl 1.5 has not corrected the COMMENT bug and
the missing features.

> Hello,
> I saw that Genbank web site have changed:
> Now, features like 'SNPs' are no more included in the EST flat files.
> At the NCBI web site, we must click on 'features: SNP' to add them in our flat
> file.
>
> With BioPerl, 1.4 or 1.5, it's the same, the variation features are no more
> included in the EST flat files that I upload.
>
> Here is the script I use:
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
> use Bio::SeqIO;
> my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.
> \n\tTry something like: $0 NM_178432\n\n";
>
> $acc=$acc."[Accession]";
>
> my $query_string = "$acc";
> my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide',
>                                          -query=>$query_string);
>
> my $gb = new Bio::DB::GenBank;
> my $stream = $gb->get_Stream_by_query($query);
>
> my $out=Bio::SeqIO->new(-format=>'genbank');
> my $seq = $stream->next_seq();
>
> my $result=$out->write_seq($seq);
> $result =~ s/^1.*$//;
> #print $out->write_seq($seq);
> print $result;
>
> exit;
>
> How can I add most of features to my nucleotide flat files ?
>
> Thanks

--
Sébastien Moretti
http://igs.cnrs-mrs.fr/
CNRS - IGS
31 chemin Joseph Aiguier
13402 Marseille cedex

From cerdman2 at du.edu Thu Mar 24 12:54:50 2005
From: cerdman2 at du.edu (Colin Erdman)
Date: Thu Mar 24 12:49:26 2005
Subject: [Bioperl-l] Assistance with a BioPerl/Perl project
Message-ID: <0IDV00EJ0B3JVD@smtpout.cair.du.edu>

Hello list,

I am a 22 year old bioinformatics and molecular biology major at the
University of Denver. I just accepted a position with a researcher
here, and already have a first assignment. We are working on a
comprehensive chromosome 21 gene database and map and my first task is
to update a list of known (and curated) Human chromosome 21 genes. I
have become rapidly familiar with BioPerl however my adviser needs me
to use Entrez Gene to compare the currently known Chr 21 genes (from
query: '21[CHR] AND Homo sapiens[ORGN] AND NOT Pseudogene' ) with a
list of genes that she has provided in xls and xml format.

The idea is to take the accession numbers in the provided files, pull
the nucleotide sequence from them, and run those against the sequences
for records found with the Entrez Gene query in order to find any
newly annotated/(discovered/elucidated?) genes for that sequence. I am
familiar with the current problem of BioPerl not directly being able
to parse the EntrezGene object, but have played with the
Bio::SeqIO::Gene2accession (& geneinfo) and the egparser. My
programming skills are not completely up to par, so egparser is tough
for me to grasp. Bio::SeqIO::Gene2accession is more intuitive, however
I am having a terrible time figuring out how to convert my desired
entrezgene results into the legacy gene_info and gene2accession
formats? Any suggestions are greatly appreciated, I am very new at
this, so very simple coding examples and explanations help and are the
best way for me to learn.

Thanks all!
colin

From sdavis2 at mail.nih.gov Thu Mar 24 13:49:40 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Mar 24 13:44:01 2005
Subject: [Bioperl-l] Assistance with a BioPerl/Perl project
In-Reply-To: <0IDV00EJ0B3JVD@smtpout.cair.du.edu>
References: <0IDV00EJ0B3JVD@smtpout.cair.du.edu>
Message-ID: <0d3beed247327883b45b2e29ca07a864@mail.nih.gov>

If you are starting with Genbank Accession numbers and want to get to
Entrez Gene, the "standard" way to do that is to use Unigene. If you
go to the Entrez website and choose the Unigene database, you can type
in your accession and you will be taken to a unigene record. If you
click on the "links" section, you can then link to Entrez Gene.

To do this in batch mode, I download Hs.data.gz from NCBI at:

ftp://ftp.ncbi.nih.gov/repository/UniGene/

Then, you can use Bio::ClusterIO to parse Unigene. Grab the
accession_number part of each sequence (there is an example of doing
this in the POD documentation). You can then make a hash like:

    push(@{$acc_hash{$acc}}, $in->unigene_id);

which maps accessions to unigene ids.

Make a second hash that maps unigene to gene using the file:

ftp://ftp.ncbi.nih.gov/gene/DATA/gene2unigene

which will map the unigene ids to gene.

Then, you have the information you need to map from accession to gene
via unigene.

Just a note on Entrez Gene: the Gene does not represent a sequence,
but instead a set of sequences. The sequences are Refseq sequences.
So, you wouldn't be blasting against "Gene" per se, but against the
one or several Refseq sequences (if there are any) that represent the
Gene.

Hope this helps. Standard disclaimer: as with perl AND
bioinformatics, there is more than one way to do this. And keep in
mind that Entrez Gene is only one source of annotation; for chromosome
21, there may be other sites that have more information, specifically
Ensembl.
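
A minimal sketch of the two-hash lookup described above, kept free of BioPerl so it runs on its own. Everything in it is an assumption made for illustration: the accession/UniGene pairs stand in for what would be collected while parsing Hs.data, the gene2unigene layout is assumed to be tab-delimited with the gene ID first, the UniGene id second, and a '#' header line, and the accessions other than CR407631, the gene ID, and the sub names are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for pairs gathered while parsing Hs.data.
my @acc_unigene_pairs = (
    [ 'CR407631', 'Hs.2' ],
    [ 'XX000001', 'Hs.2' ],      # made-up accession in the same cluster
    [ 'XX000002', 'Hs.99999' ],  # made-up accession, cluster without a gene
);

# Hypothetical stand-in for gene2unigene (assumed layout, made-up gene ID).
my @gene2unigene_lines = ( "#Format: GeneID UniGene_cluster\n", "12345\tHs.2\n" );

# First hash: accession -> list of UniGene ids.
sub build_acc_to_unigene {
    my (@pairs) = @_;
    my %acc_hash;
    push @{ $acc_hash{ $_->[0] } }, $_->[1] for @pairs;
    return \%acc_hash;
}

# Second hash: UniGene id -> list of gene ids.
sub build_unigene_to_gene {
    my (@lines) = @_;
    my %gene_for;
    for my $line (@lines) {
        next if $line =~ /^#/ or $line !~ /\S/;   # skip header and blank lines
        chomp(my $copy = $line);
        my ($gene_id, $unigene_id) = split /\t/, $copy;
        push @{ $gene_for{$unigene_id} }, $gene_id;
    }
    return \%gene_for;
}

# Chain the two hashes: accession -> UniGene -> gene, without duplicates.
sub genes_for_accession {
    my ($acc, $acc_to_unigene, $unigene_to_gene) = @_;
    my (%seen, @genes);
    for my $ug (@{ $acc_to_unigene->{$acc} || [] }) {
        push @genes, grep { !$seen{$_}++ } @{ $unigene_to_gene->{$ug} || [] };
    }
    return @genes;
}

my $acc_to_unigene  = build_acc_to_unigene(@acc_unigene_pairs);
my $unigene_to_gene = build_unigene_to_gene(@gene2unigene_lines);
for my $acc (sort keys %$acc_to_unigene) {
    my @genes = genes_for_accession($acc, $acc_to_unigene, $unigene_to_gene);
    print join("\t", $acc, @genes ? join(',', @genes) : 'no gene found'), "\n";
}
```

Both hashes hold array references because one accession may fall in more than one cluster and one cluster may map to more than one gene; accessions with no match simply return an empty list.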
Sean On Mar 24, 2005, at 12:54 PM, Colin Erdman wrote: > Hello list, > > > > I am a 22 year old bioinformatics and molecular biology major at the > University of Denver. I just accepted a position with a researcher > here, and > already have a first assignment. We are working on a comprehensive > chromosome 21 gene database and map and my first task is to update a > list of > known (and curated) Human chromosome 21 genes. I have become rapidly > familiar with BioPerl however my adviser needs me to use Entrez Gene to > compare the currently known Chr 21 genes (from query: '21[CHR] AND Homo > sapiens[ORGN] AND NOT Pseudogene' ) with a list of genes that she has > provided in xls and xml format. > > The idea is to take the accession numbers in the provided files, pull > the > nucleotide sequence from them, and run those against the sequences for > records found with the Entrez Gene query in order to find any newly > annotated/(discovered/elucidated?) genes for that sequence. I am > familiar > with the current problem of BioPerl not directly being able to parse > the > EntrezGene object, but have played with the Bio::SeqIO::Gene2accession > (& > geneinfo) and the egparser. My programming skills are not completely > up to > par, so egparser is tough for me to grasp. Bio::SeqIO::Gene2accession > is > more intuitive, however I am having a terrible time figuring out how to > convert my desired entrezgene results into the legacy gene_info and > gene2accession formats? Any suggestions are greatly appreciated, I am > very > new at this, so very simple coding examples and explanations help and > are > the best way for me to learn. > > > > Thanks all! 
> > colin > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From Kary at ioc.fiocruz.br Thu Mar 24 15:24:11 2005 From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana) Date: Thu Mar 24 15:23:52 2005 Subject: [Bioperl-l] Help with hmmpfam Message-ID: <8D44604203DAF9438BF9123B4A08C779B2700A@alpha.ioc.fiocruz.br> Dear All, I am new to bioperl and would like (if possible) to obtain some help with the SearcIO module and hmmpfam. I am listing my code below and the output containing the following error: (partial) output and error: [kary@vivax inserir_dados]$ perl bioperl_pfam_23_03_05.pl Passou a definicao do arquivo query passou abrir o arquivo mmm.hmm sh: -c: line 0: syntax error near unexpected token `(' sh: -c: line 0: `/usr/local/bin/hmmpfam -E 0.0001 1 HMMER2.0 [2.3.2]\nNAME 76GJYz8zFm\nLENG 327\nALPH I put some "print" commands everywhere to see where I am getting the error and looks like it is not entering/printing the while results (eg: next_result, next_hit). Any help would be greatly appreciated. 

Thanks, Kary

************

Script:

    #!/usr/bin/perl -w

    use lib "/usr/local/bioperl14";
    use lib "/usr/local/bioperl-run-1.4";

    use Bio::Search::Result::HMMERResult;
    use Bio::Tools::Run::Hmmer;
    use Bio::Tools::Run::Hmmpfam;
    use strict;

    my $query;
    my $db;
    my $seq;
    my $dbfile;
    my @array;

    $query = "sequencia_fasta_4_arg.txt";

    print "Passou a definicao do arquivo query\n";

    open (READ, "$query") or die "Cannot open $query: $!";
    while (my $sequence = <READ>){
        for ($sequence) {
            &hmmpfam($sequence);
            #print $seq;
        }
    }
    close (READ);

    print "Passou leitura do arquivo query\n";
    #############################################################################################################################################
    sub hmmpfam {
        my ($seq) = @_;
        $db = "mmm.hmm";
        open (DH, "$db") or die "Cannot open $db: $!";

        print "passou abrir o arquivo mmm.hmm\n\n";

        while ($dbfile = <DH>){

            #Build a Hmmpfam factory
            my @params = ('DB'=>$dbfile,'E'=>0.0001);

            my $factory = Bio::Tools::Run::Hmmpfam->new(@params);

            # Pass the factory a Bio::Seq object or a file name
            # returns a Bio::SearchIO object
            my $search = $factory->run($seq);
            print "Search: $search\n";

            print "Passou search com parametros \n";

            my @feat;

            my $searchio = new Bio::SearchIO(-format => 'hmmer',
                                             -file => 'result.hmmer') or die
                print "Error for open the file";

            while (my $result = $searchio->next_result){
                print "começa o while do NEXT RESULT\n\n";
                while(my $hit = $result->next_hit){
                    print "começa o while do HIT - NEXT HIT\n\n";
                    while (my $hsp = my $hit->next_hsp){
                        print join("\t", ( my$r->query_name,
                                           $hsp->query->start,
                                           $hsp->query->end,
                                           $hit->name,
                                           $hsp->hit->start,
                                           $hsp->hit->end,
                                           $hsp->score,
                                           $hsp->evalue,
                                           $hsp->seq_str,
                                         )), "\n";
                        print "terminou o while dos HSPs\n\n";

                    }
                }
            }

        }

        close (DH);
    }

From cerdman2 at du.edu Thu Mar 24 16:46:36 2005
From: cerdman2 at du.edu (Colin Erdman)
Date: Thu Mar 24 17:42:03 2005
Subject: [Bioperl-l] Assistance with a BioPerl/Perl project
In-Reply-To: <0d3beed247327883b45b2e29ca07a864@mail.nih.gov>
Message-ID:
<0IDV00IILLTX7V@smtpout.cair.du.edu> So in effect, this is just as good as taking the actual nucleotide sequences (derived using a GenBank lookup) from my static accession number list and running them through the 'member sequences' of my genes (clusters) of interest in order to see if any new gene products or information have been added for that sequence? And where would you suspect that BLASTN will then fit into the scheme. I apologize for the redundancy, there is just so much to take in! Thanks, Colin -----Original Message----- From: Sean Davis [mailto:sdavis2@mail.nih.gov] Sent: Thursday, March 24, 2005 11:50 AM To: Colin Erdman Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] Assistance with a BioPerl/Perl project If you are starting with Genbank Accession numbers and want to get to Entrez Gene, the "standard" way to do that is to use Unigene. If you go to the Entrez website and choose the Unigene database, you can type in your accession and you will be taken to a unigene record. If you click on the "links" section, you can then link to Entrez Gene. To do this in batch mode, I download Hs.data.gz from NCBI at: ftp://ftp.ncbi.nih.gov/repository/UniGene/ Then, you can use Bio::ClusterIO to parse Unigene. Grab the accession_number part of each sequence (there is an example of doing this in the POD documentation). You can then make a hash like: push(@{$acc_hash{$acc}},$in->unigene_id}; which maps accessions to unigene ids. Make a second hash that maps unigene to gene using the file: ftp://ftp.ncbi.nih.gov/gene/DATA/gene2unigene which will map the unigene ids to gene. Then, you have the information you need to map from accession to gene via unigene. Just a note on Entrez Gene: the Gene does not represent a sequence, but instead a set of sequences. The sequences are Refseq sequences. So, you wouldn't be blasting against "Gene" per say, but against the one or several Refseq sequences (if there are any) that represent the Gene. Hope this helps. 
Standard disclaimer: as with perl AND bioinformatics, there is more than one way to do this. And keep in mind that Entrez Gene is only one source of annotation; for chromosome 21, there may be other sites that have more information, specifically Ensembl. Sean On Mar 24, 2005, at 12:54 PM, Colin Erdman wrote: > Hello list, > > > > I am a 22 year old bioinformatics and molecular biology major at the > University of Denver. I just accepted a position with a researcher > here, and > already have a first assignment. We are working on a comprehensive > chromosome 21 gene database and map and my first task is to update a > list of > known (and curated) Human chromosome 21 genes. I have become rapidly > familiar with BioPerl however my adviser needs me to use Entrez Gene to > compare the currently known Chr 21 genes (from query: '21[CHR] AND Homo > sapiens[ORGN] AND NOT Pseudogene' ) with a list of genes that she has > provided in xls and xml format. > > The idea is to take the accession numbers in the provided files, pull > the > nucleotide sequence from them, and run those against the sequences for > records found with the Entrez Gene query in order to find any newly > annotated/(discovered/elucidated?) genes for that sequence. I am > familiar > with the current problem of BioPerl not directly being able to parse > the > EntrezGene object, but have played with the Bio::SeqIO::Gene2accession > (& > geneinfo) and the egparser. My programming skills are not completely > up to > par, so egparser is tough for me to grasp. Bio::SeqIO::Gene2accession > is > more intuitive, however I am having a terrible time figuring out how to > convert my desired entrezgene results into the legacy gene_info and > gene2accession formats? Any suggestions are greatly appreciated, I am > very > new at this, so very simple coding examples and explanations help and > are > the best way for me to learn. > > > > Thanks all! 
> > colin > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Thu Mar 24 18:10:08 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Mar 24 18:04:34 2005 Subject: [Bioperl-l] Assistance with a BioPerl/Perl project In-Reply-To: <0IDV00IILLTX7V@smtpout.cair.du.edu> References: <0IDV00IILLTX7V@smtpout.cair.du.edu> Message-ID: <536997326a76511d9638b5340225f03e@mail.nih.gov> If I understood you correctly, you are starting with a list of genbank accession numbers? If you start with, for example, CR407631: Go to: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=&DB=unigene and type in that accession. You will see the resulting Unigene entry and after one click to get details you will be at this page: http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=2 There is a small "links" link just under the search bar. Normally, you can link from there to Gene (but it appears to be broken at the moment). In any case, with the file from below, you can look up a unigene id and get the Entrez Gene (if there is one) entry. The nice thing about using Unigene is that there is no blasting involved at all. What you end up with is an Entrez Gene (and bonus Unigene id) associated with your accession (most of the time, but some will not be in Unigene for various reasons). You can then mine Gene for whatever information you want to assign to the accessions. For that, you will need either a gene parser (from sourceforge) or just use the tab-delimited text files from the Gene/DATA ftp site noted in my previous email to get the information you want. -------------------------------- Now, if you really want the easy way to do the above, go to: http://genome-www5.stanford.edu/cgi-bin/source/sourceBatchSearch Here, just paste in your accessions and get whatever information back you want--very nice site for this. 
(They still call it LocusLink ID, but that is a Gene ID as well). Hope this helps. Sean On Mar 24, 2005, at 4:46 PM, Colin Erdman wrote: > So in effect, this is just as good as taking the actual nucleotide > sequences > (derived using a GenBank lookup) from my static accession number list > and > running them through the 'member sequences' of my genes (clusters) of > interest in order to see if any new gene products or information have > been > added for that sequence? And where would you suspect that BLASTN will > then > fit into the scheme. I apologize for the redundancy, there is just so > much > to take in! > > Thanks, > Colin > > -----Original Message----- > From: Sean Davis [mailto:sdavis2@mail.nih.gov] > Sent: Thursday, March 24, 2005 11:50 AM > To: Colin Erdman > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] Assistance with a BioPerl/Perl project > > If you are starting with Genbank Accession numbers and want to get to > Entrez Gene, the "standard" way to do that is to use Unigene. If you > go to the Entrez website and choose the Unigene database, you can type > in your accession and you will be taken to a unigene record. If you > click on the "links" section, you can then link to Entrez Gene. > > To do this in batch mode, I download Hs.data.gz from NCBI at: > > ftp://ftp.ncbi.nih.gov/repository/UniGene/ > > Then, you can use Bio::ClusterIO to parse Unigene. Grab the > accession_number part of each sequence (there is an example of doing > this in the POD documentation). You can then make a hash like: > > push(@{$acc_hash{$acc}},$in->unigene_id}; > > which maps accessions to unigene ids. > > Make a second hash that maps unigene to gene using the file: > > ftp://ftp.ncbi.nih.gov/gene/DATA/gene2unigene > > which will map the unigene ids to gene. > > Then, you have the information you need to map from accession to gene > via unigene. > > Just a note on Entrez Gene: the Gene does not represent a sequence, > but instead a set of sequences. 
The sequences are Refseq sequences. > So, you wouldn't be blasting against "Gene" per say, but against the > one or several Refseq sequences (if there are any) that represent the > Gene. > > Hope this helps. Standard disclaimer: as with perl AND > bioinformatics, there is more than one way to do this. And keep in > mind that Entrez Gene is only one source of annotation; for chromosome > 21, there may be other sites that have more information, specifically > Ensembl. > > Sean > > > On Mar 24, 2005, at 12:54 PM, Colin Erdman wrote: > >> Hello list, >> >> >> >> I am a 22 year old bioinformatics and molecular biology major at the >> University of Denver. I just accepted a position with a researcher >> here, and >> already have a first assignment. We are working on a comprehensive >> chromosome 21 gene database and map and my first task is to update a >> list of >> known (and curated) Human chromosome 21 genes. I have become rapidly >> familiar with BioPerl however my adviser needs me to use Entrez Gene >> to >> compare the currently known Chr 21 genes (from query: '21[CHR] AND >> Homo >> sapiens[ORGN] AND NOT Pseudogene' ) with a list of genes that she has >> provided in xls and xml format. >> >> The idea is to take the accession numbers in the provided files, pull >> the >> nucleotide sequence from them, and run those against the sequences for >> records found with the Entrez Gene query in order to find any newly >> annotated/(discovered/elucidated?) genes for that sequence. I am >> familiar >> with the current problem of BioPerl not directly being able to parse >> the >> EntrezGene object, but have played with the Bio::SeqIO::Gene2accession >> (& >> geneinfo) and the egparser. My programming skills are not completely >> up to >> par, so egparser is tough for me to grasp. 
Bio::SeqIO::Gene2accession >> is >> more intuitive, however I am having a terrible time figuring out how >> to >> convert my desired entrezgene results into the legacy gene_info and >> gene2accession formats? Any suggestions are greatly appreciated, I am >> very >> new at this, so very simple coding examples and explanations help and >> are >> the best way for me to learn. >> >> >> >> Thanks all! >> >> colin >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From jason.stajich at duke.edu Thu Mar 24 20:32:54 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Mar 24 20:27:35 2005 Subject: [Bioperl-l] Help with hmmpfam In-Reply-To: <8D44604203DAF9438BF9123B4A08C779B2700A@alpha.ioc.fiocruz.br> References: <8D44604203DAF9438BF9123B4A08C779B2700A@alpha.ioc.fiocruz.br> Message-ID: We would really need your hmmpfam output to diagnose the problem (result.hmmer) -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 24, 2005, at 12:24 PM, Kary Ann Del Carmen Soriano Ocana wrote: > Dear All, > > I am new to bioperl and would like (if possible) to obtain some help > with the SearcIO module and hmmpfam. I am listing my code below and > the output containing the following error: > > (partial) output and error: > > [kary@vivax inserir_dados]$ perl bioperl_pfam_23_03_05.pl > Passou a definicao do arquivo query > passou abrir o arquivo mmm.hmm > > sh: -c: line 0: syntax error near unexpected token `(' > sh: -c: line 0: `/usr/local/bin/hmmpfam -E 0.0001 1 > HMMER2.0 [2.3.2]\nNAME 76GJYz8zFm\nLENG 327\nALPH > > I put some "print" commands everywhere to see where I am getting the > error and looks like it is not entering/printing the while results > (eg: next_result, next_hit). Any help would be greatly appreciated. 
>
> Thanks, Kary
>
> ************
>
> Script:
>
> #!/usr/bin/perl -w
>
> use lib "/usr/local/bioperl14";
> use lib "/usr/local/bioperl-run-1.4";
>
> use Bio::Search::Result::HMMERResult;
> use Bio::Tools::Run::Hmmer;
> use Bio::Tools::Run::Hmmpfam;
> use strict;
>
> my $query;
> my $db;
> my $seq;
> my $dbfile;
> my @array;
>
> $query = "sequencia_fasta_4_arg.txt";
>
> print "Passou a definicao do arquivo query\n";
>
> open (READ, "$query") or die "Cannot open $query: $!";
> while (my $sequence = <READ>){
>     for ($sequence) {
>         &hmmpfam($sequence);
>         #print $seq;
>     }
> }
> close (READ);
>
> print "Passou leitura do arquivo query\n";
> #######################################################################
> ######################################################################
> sub hmmpfam {
>     my ($seq) = @_;
>     $db = "mmm.hmm";
>     open (DH, "$db") or die "Cannot open $db: $!";
>
>     print "passou abrir o arquivo mmm.hmm\n\n";
>
>     while ($dbfile = <DH>){
>
>         #Build a Hmmpfam factory
>         my @params = ('DB'=>$dbfile,'E'=>0.0001);
>
>
>         my $factory = Bio::Tools::Run::Hmmpfam->new(@params);
>
>
>         # Pass the factory a Bio::Seq object or a file name
>         # returns a Bio::SearchIO object
>         my $search = $factory->run($seq);
>         print "Search: $search\n";
>
>         print "Passou search com parametros \n";
>
>
>         my @feat;
>
>         my $searchio = new Bio::SearchIO(-format => 'hmmer',
>                                          -file => 'result.hmmer') or die
>         print "Error for open the file";
>
>         while (my $result = $searchio->next_result){
>             print "começa o while do NEXT RESULT\n\n";
>             while(my $hit = $result->next_hit){
>                 print "começa o while do HIT - NEXT HIT\n\n";
>                 while (my $hsp = my $hit->next_hsp){
>                     print join("\t", ( my$r->query_name,
>                                        $hsp->query->start,
>                                        $hsp->query->end,
>                                        $hit->name,
>                                        $hsp->hit->start,
>                                        $hsp->hit->end,
>                                        $hsp->score,
>                                        $hsp->evalue,
>                                        $hsp->seq_str,
>                                      )), "\n";
>                     print "terminou o while dos HSPs\n\n";
>
>                 }
>             }
>         }
>
>     }
>
>
>     close (DH);
> }
>
> _______________________________________________
> Bioperl-l mailing
list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Mark.Hoebeke at jouy.inra.fr Thu Mar 24 10:59:22 2005 From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke) Date: Thu Mar 24 20:30:34 2005 Subject: [Bioperl-l] Hierarchical location parsing Message-ID: <1111679962.18235.8.camel@homer> Hi, confronted with a bug related to hierarchical location parsing[1], I checked the source code of Bio::Factory::FTLocationFactory.pm (both in 1.5 and bioperl-live). The comments around the code clearly state that hierarchical locations are not supported. Is this shortcoming due to performance concerns, or just because it seems tedious to code ;D ? Mark [1] Example of hierarchical location description : join(1000,join(2000,join(3000,4000))) -- --------------------------Mark.Hoebeke@jouy.inra.fr---------------------- Unit? Statistique & G?nome Unit? MIG +33 (0)1 60 87 38 03 T?l. +33 (0)1 34 65 28 85 +33 (0)1 60 87 38 09 Fax. +33 (0)1 34 65 29 01 Tour Evry 2, 523 pl. des Terrasses INRA - Domaine de Vilvert F - 91000 Evry F - 78352 Jouy-en-Josas CEDEX -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Ceci est une partie de message =?ISO-8859-1?Q?num=E9riquement?= =?ISO-8859-1?Q?_sign=E9e?= Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050324/738037c8/attachment.bin From sdavis2 at mail.nih.gov Thu Mar 24 20:42:34 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Mar 24 20:36:42 2005 Subject: [Bioperl-l] Fw: [Gene-announce] Announcing gene2xml to facilitate conversion of Entrez Gene ASN.1 to XML and more... Message-ID: <000b01c530db$eadf7010$1f6df345@WATSON> Forwarded from a colleague: ----- Original Message ----- Sent: Thursday, March 24, 2005 6:33 PM Subject: Fwd: [Gene-announce] Announcing gene2xml to facilitate conversion of Entrez Gene ASN.1 to XML and more... 
>>Sender: gene-announce-bounces@ncbi.nlm.nih.gov >> >>Contents: >> 1. new directory on the ftp site >> 2. release of gene2xml to convert the files in the new directory to XML >> 3. modifications in Entrez Gene displays >> 4. modifications in Entrez Gene content >> 5. Gene chapter in the NCBI handbook >> >>1. the new ASN_BINARY subdirectory >> >> We would like to announce that Entrez Gene has added a new subdirectory >> to >>its ftp site, namely /DATA/ASN_BINARY. The subdirectories and files in >>ASN_BINARY have the same scope as the files in the /DATA/ASN subdirectory, >>namely comprensive extractions of Entrez Gene records. The difference in >>the >>directories is that the format of files in ASN_BINARY is binary, and the >>organization of the records is as an Entrezgene set. The files in the ASN >>directory are ASN.1 text and the records are concatenated. >> >> >>2. gene2xml >> >> The ASN_BINARY format is being introduced in conjunction with the tool >>gene2xml, which readily converts the binary ASN.1 to XML. >>The tool is available from >> >> ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/cmdline/ >> >>for multiple platforms. >> >> gene2xml.Darwin-7.8.0-Power_Macintosh.gz >> gene2xml.Linux-2.4.23-P3-4G-i686.gz >> gene2xml.OSF1-V5.1-alpha.gz >> gene2xml.SunOS-5.8-sun4u.gz >> gene2xml.win32.exe.gz >> >> >>The documentation for this program is provided in this README file: >> >> ftp://ftp.ncbi.nlm.nih.gov/gene/tools/README. >> >> We would like to draw your attention to some of the functions of >> gene2xml. >>If you are interested in Gene records for only one species or strain, and >>records for that species or strain have not already been provided in a >>separate file, there is an option (-t) that extracts records based on the >>NCBI >>Taxonomy identifier for that species or strain. >> >> There is also an option (-x) to convert the binary ASN.1 Entrezgene set >>into >>the concatenated ASN.1 text we have been providing. 
>> >> Our plan is to provide both formats for an indeterminate period, but >> then >>discontinue production of the files in the ASN directory, because that >>format >>can be (and is by us) be reproduced from the gene2xml tool. >> >>Please be reminded that the Entrezgene specification is here: >> >>http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/entre >>zgene/entrezgene.asn >> >>and the DTD for Entrez Gene is here: >> >> http://www.ncbi.nlm.nih.gov/dtd/ >> >> >>3. changes in Entrez Gene display >> >> In the next few days, we will be adding limited context-specific help >> to >>subdivisions of the Entrez Gene graphic (default) display. This means >>that >>question marks (?) will occur at the far right of the blue bar. These will >>anchor links to the appropriate subsection of the Entrez Gene help >>document. >> >>4. changes in content >> >> In the next few days, we will be adding a new subsection to the >> record, >>'Alleles'. This section reports the general characteristics of alleles >>that >>have been described for a gene, and provides links to more detailed >>information. This function is being phased in gradually; the current set >>is >>for mouse and is being developed from information supplied by Mouse Genome >>Informatics. >> >> >>5. NCBI Handbook chapter 19 >> >> >>The LocusLink chapter of the NCBI handbook >> >>http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=handbook >>.TOC&depth=2 >> >>has now been replaced with a chapter describing Entrez Gene. >> >>http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.ch19 >> >>We hope this chapter helps answer your questions. 
If not, you can email
>>your questions to info@ncbi.nlm.nih.gov or lodge your comments here:
>>
>>  http://www.ncbi.nlm.nih.gov/RefSeq/update.cgi
>>
>>_______________________________________________
>>Gene-announce mailing list
>>Gene-announce@ncbi.nlm.nih.gov
>>http://www.ncbi.nlm.nih.gov/mailman/listinfo/gene-announce
>

From jason.stajich at duke.edu Thu Mar 24 20:42:49 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 24 20:37:01 2005
Subject: [Bioperl-l] cigarline conversion
In-Reply-To:
References:
Message-ID: <13512c74d64cddf8afb07c4b6bf55b1f@duke.edu>

I think you'll have to write it or steal from Ensembl. I assume it
isn't so hard to do by walking through the seqs in the alignment.

Propose the algorithm to convert it, and maybe some of the willing
volunteers who listen to the list and want to contribute to
bioinformatics will volunteer to code it.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 21, 2005, at 8:01 AM, Johnson, Nathan wrote:

> Hi bioperlers
>
> Does anyone know of a module which handles the conversion of multiple
> alignment cigar line format (multiple strings with M and D's but no I's)
> cigar line data to a pairwise format (one string with M, D and I's).
> > SimpleAlign doesn't seem to do what I want :\ > > Cheers > > Nath > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Thu Mar 24 20:51:28 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu Mar 24 20:45:52 2005 Subject: [Bioperl-l] [How to add features in genbank flat file] In-Reply-To: <42429EF7.4050504@igs.cnrs-mrs.fr> References: <200502151525.38790.moretti@igs.cnrs-mrs.fr> <42429EF7.4050504@igs.cnrs-mrs.fr> Message-ID: You seem annoyed that no one solved the problem for you - I hope that you realize that if you want a specific feature you can also modify the module yourself and provide a patch to the project. As for the specifics of your problem perhaps if you highlight what the entrez key-value sets need to be set to in order to get the SNP data we can add it to the GenBank::Query as an option. Removing the blank lines is part of the SeqIO parsing but I suppose a state variable could be added in genbank.pm to not skip them when in the 'COMMENT' state if this is a critical feature for you. If you are just downloading genbank files it looks like you have a good solution so I'm glad you were able to figure it out. -jason > Hello, > No one seems to have a solution to this problem I posted a month ago. > > So, I changed my mind and use 'wget' to get the GenBank sequences. > I get the full GenBank entry, with most of features. > And I can avoid another bug: COMMENT lines are not well formated with > the BioPerl script I used (not as COMMENT lines are on NCBI), and > blank lines are removed. > > > #!/usr/bin/perl -w > > use strict; > use diagnostics; > use File::Cat; > > my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is > missing.\n\tTry something like: $0 NM_178432\n\n"; > > `wget -O output_file.tmp > "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? 
> db=nucleotide&qty=1&c_start=1&val=$acc&dopt=gbwithparts&send=Send&sendt > o=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef > _HPRD=32" 2>/dev/null`; > > cat ("output_file.tmp", \*STDOUT); > unlink("output_file.tmp"); > > # wget -O output_file > 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? > db=nucleotide&qty=1&c_start=1&val=NM_178432&dopt=gbwithparts&send=Send& > sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC= > 16&ef_HPRD=32' > > exit; > > > Sorry, I don't use BioPerl to Query GenBank (but for other > applications) but BioPerl 1.5 has not corrected the COMMENT bug and > the missing features. > >> Hello, >> I saw that Genbank web site have changed: >> Now, features like 'SNPs' are no more included in the EST flat files. >> At the NCBI web site, we must click on 'features: SNP' to add them in >> our flat file. >> With BioPerl, 1.4 or 1.5, it's the same, the variation features are >> no more included in the EST flat files that I upload. >> Here is the script I use: >> #!/usr/bin/perl -w >> >> use strict; >> use Bio::DB::GenBank; >> use Bio::DB::Query::GenBank; >> use Bio::SeqIO; >> my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is >> missing. >> \n\tTry something like: $0 NM_178432\n\n"; >> >> $acc=$acc."[Accession]"; >> >> my $query_string = "$acc"; >> my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide', >> >> -query=>$query_string); >> >> my $gb = new Bio::DB::GenBank; >> my $stream = $gb->get_Stream_by_query($query); >> >> my $out=Bio::SeqIO->new(-format=>'genbank'); >> my $seq = $stream->next_seq(); >> >> my $result=$out->write_seq($seq); >> $result =~ s/^1.*$//; >> #print $out->write_seq($seq); >> print $result; >> >> exit; >> How can I add most of features to my nucleotide flat files ? 
>> Thanks
>
> --
> Sébastien Moretti
> http://igs.cnrs-mrs.fr/
> CNRS - IGS
> 31 chemin Joseph Aiguier
> 13402 Marseille cedex
>

From jason.stajich at duke.edu Thu Mar 24 20:52:52 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 24 20:47:00 2005
Subject: [Bioperl-l] Does BioPerl like mpiBlast?
In-Reply-To: <529e768305031714193ab15b9d@mail.gmail.com>
References: <529e768305031714193ab15b9d@mail.gmail.com>
Message-ID:

If you are asking whether it would be hard to parse BLAST output from
mpiBLAST: no. It should already work with Bio::SearchIO.

As for whether it is hard to run mpiBLAST from within bioperl: you
could just write a simple wrapper module that looks a lot like
StandAloneBlast (but simpler).

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 17, 2005, at 2:19 PM, Scott Lambdin wrote:

> Help please. The scientists have found a blast job that eats all the
> user memory (~4 Gigabytes) on the little 32-bit blast server I set up
> for them. I was looking at giving them mpiBLAST so that they can
> spread the database over some processes, but a requirement is to have
> the BLAST program usable by BioPerl. Would it be hard for them to
> use mpiBLAST in BioPerl? That is, harder than using regular NCBI
> BLAST?
>
> --Scott
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From jason.stajich at duke.edu Thu Mar 24 20:55:38 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 24 20:49:51 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <1111679962.18235.8.camel@homer>
References: <1111679962.18235.8.camel@homer>
Message-ID: <1ac3134a474c60b4b72dde9289c65f24@duke.edu>

Is there a real example where these types of locations exist - why
can't it be flattened without the nested joins?

At any rate - I don't really care to parse these if they never exist "in-nature".
If your bugfix soln works and doesn't slow things down we can use it I guess, although I prefer a regexp. I don't really have time to patch or test in the near future so it will have to wait for someone to volunteer to get to it. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ On Mar 24, 2005, at 7:59 AM, Mark Hoebeke wrote: > Hi, > > confronted with a bug related to hierarchical location parsing[1], I > checked the source code of Bio::Factory::FTLocationFactory.pm (both in > 1.5 and bioperl-live). The comments around the code clearly state that > hierarchical locations are not supported. > > Is this shortcoming due to performance concerns, or just because it > seems tedious to code ;D ? > > > Mark > > [1] Example of hierarchical location description : > join(1000,join(2000,join(3000,4000))) > > > > > > > > > -- > -------------------------- > Mark.Hoebeke@jouy.inra.fr---------------------- > Unit? Statistique & G?nome Unit? > MIG > +33 (0)1 60 87 38 03 T?l. +33 (0)1 34 65 28 > 85 > +33 (0)1 60 87 38 09 Fax. +33 (0)1 34 65 29 > 01 > Tour Evry 2, 523 pl. des Terrasses INRA - Domaine de > Vilvert > F - 91000 Evry F - 78352 Jouy-en-Josas > CEDEX > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From iluminati at earthlink.net Thu Mar 24 21:10:51 2005 From: iluminati at earthlink.net (iluminati@earthlink.net) Date: Thu Mar 24 21:03:10 2005 Subject: [Bioperl-l] Question about accessing tags in Bio::SeqFeature::Generic Message-ID: <4243732B.9050406@earthlink.net> I have a question about a the Bio::SeqFeature::Generic that doesn't seem clear to me from the docs. Here's an example of the seq feature I'm creating... 
my $RepeatElement = new Bio::SeqFeature::Generic(
    -start  => $L1HERVLine[6],
    -end    => $L1HERVLine[7],
    -strand => $L1HERVLine[9],
    -source => 'Repeat',
    -tag    => {
        -repName   => $L1HERVLine[10],
        -repClass  => $L1HERVLine[11],
        -repFamily => $L1HERVLine[12] }
);

Now, the feature itself creates fine. However, it isn't clear how I
would retrieve information from the tag hash. The get_tag_value()
function isn't working for me, and I can't access the hash directly.
What should I do to be able to access the data? Let me know, and
thanks in advance.

From hlapp at gmx.net Fri Mar 25 02:40:03 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 25 02:34:43 2005
Subject: [Bioperl-l] Question about accessing tags in Bio::SeqFeature::Generic
In-Reply-To: <4243732B.9050406@earthlink.net>
Message-ID: <198497D2-9D01-11D9-B83F-000A959EB4C4@gmx.net>

*Always* provide the error message. None of us has a crystal ball.
'isn't working for me' why? Because of an error, or because you need
something that it isn't designed to return?

On Thursday, March 24, 2005, at 06:10 PM, iluminati@earthlink.net wrote:

> I have a question about the Bio::SeqFeature::Generic that doesn't
> seem clear to me from the docs. Here's an example of the seq feature
> I'm creating...
>
> my $RepeatElement = new Bio::SeqFeature::Generic(
>     -start  => $L1HERVLine[6],
>     -end    => $L1HERVLine[7],
>     -strand => $L1HERVLine[9],
>     -source => 'Repeat',
>     -tag    => {
>         -repName   => $L1HERVLine[10],
>         -repClass  => $L1HERVLine[11],
>         -repFamily => $L1HERVLine[12] }
> );
>
> Now, the feature itself creates fine. However, it isn't clear how I
> would retrieve information from the tag hash. The get_tag_value()
> function isn't working for me, and I can't access the hash directly.
> What should I do to be able to access the data? Let me know, and
> thanks in advance.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From Marc.Logghe at devgen.com Fri Mar 25 03:04:25 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri Mar 25 02:59:41 2005
Subject: [Bioperl-l] Question about accessing tags in Bio::SeqFeature::Generic
Message-ID:

>
> *Always* provide the error message. Nobody of us has a crystal ball.
> 'isn't working for me' why? because of error or because you
> need something that it isn't designed to return?

No crystal ball indeed. Intestines of toads might do well also. There
are lots of toads currently migrating and (unfortunately) lots of them
are flattened by cars.

Anyhow, what I could see in there is that you probably call
get_tag_values (watch out here, it is 'get_tag_values', plural, not
'get_tag_value', because you might have multiple values for a certain
tag) in scalar context and not in list context.
So, you should be doing something like:

my ($repclass) = $seq->get_tag_values('repClass');
# or $seq->get_tag_values('-repClass') when you want to keep the
# hyphens in your keys

Also, the -tag option takes a hash ref, so I think it is better not to
use hyphens in there for the keys.

HTH, and the toad intestines have not let me down ;-)
Marc

>
> On Thursday, March 24, 2005, at 06:10 PM, iluminati@earthlink.net
> wrote:
>
> > I have a question about the Bio::SeqFeature::Generic that doesn't
> > seem clear to me from the docs. Here's an example of the seq feature
> > I'm creating...
> > > > my $RepeatElement = new Bio::SeqFeature::Generic( -start => > > $L1HERVLine[6], > > -end => $L1HERVLine[7], > > -strand => > $L1HERVLine[9], > > -source => 'Repeat', > > -tag =>{ > > -repName => > $L1HERVLine[10], > > -repClass => > > $L1HERVLine[11], > > -repFamily => > > $L1HERVLine[12]} > > ); > > > > Now, the feature itself creates fine. However, it isn't > clear how I > > would retrieve information from the tag has. The get_tag_value() > > function isn't working for me, and I can't access the hash directly. > > What should I do to be able to access the data? Let me know, and > > thanks in advance. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From brian_osborne at cognia.com Fri Mar 25 07:54:05 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Mar 25 07:48:26 2005 Subject: [Bioperl-l] Hierarchical location parsing In-Reply-To: <1111679962.18235.8.camel@homer> Message-ID: Mark, I'm afraid I don't know the answer to your question but let me turn the question around: would you like to help us fix this? Brian O. 
-----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Mark Hoebeke Sent: Thursday, March 24, 2005 10:59 AM To: bioperl-l@portal.open-bio.org Subject: [Bioperl-l] Hierarchical location parsing Hi, confronted with a bug related to hierarchical location parsing[1], I checked the source code of Bio::Factory::FTLocationFactory.pm (both in 1.5 and bioperl-live). The comments around the code clearly state that hierarchical locations are not supported. Is this shortcoming due to performance concerns, or just because it seems tedious to code ;D ? Mark [1] Example of hierarchical location description : join(1000,join(2000,join(3000,4000))) -- --------------------------Mark.Hoebeke@jouy.inra.fr---------------------- Unit? Statistique & G?nome Unit? MIG +33 (0)1 60 87 38 03 T?l. +33 (0)1 34 65 28 85 +33 (0)1 60 87 38 09 Fax. +33 (0)1 34 65 29 01 Tour Evry 2, 523 pl. des Terrasses INRA - Domaine de Vilvert F - 91000 Evry F - 78352 Jouy-en-Josas CEDEX From brian_osborne at cognia.com Fri Mar 25 07:58:00 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Mar 25 07:52:06 2005 Subject: [Bioperl-l] Help with hmmpfam In-Reply-To: <8D44604203DAF9438BF9123B4A08C779B2700A@alpha.ioc.fiocruz.br> Message-ID: Kary, It could be that there's something odd about your hmmpfam output file, for that reason you should probably show us its contents. Brian O. -----Original Message----- From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Kary Ann Del Carmen Soriano Ocana Sent: Thursday, March 24, 2005 3:24 PM To: bioperl-l@portal.open-bio.org Cc: maruco@gmail.com Subject: [Bioperl-l] Help with hmmpfam Dear All, I am new to bioperl and would like (if possible) to obtain some help with the SearcIO module and hmmpfam. 
I am listing my code below and the output containing the following error:

(partial) output and error:

[kary@vivax inserir_dados]$ perl bioperl_pfam_23_03_05.pl
Passou a definicao do arquivo query
passou abrir o arquivo mmm.hmm

sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/usr/local/bin/hmmpfam -E 0.0001 1 HMMER2.0 [2.3.2]\nNAME 76GJYz8zFm\nLENG 327\nALPH

I put some "print" commands everywhere to see where I am getting the
error, and it looks like it is not entering the while loops
(e.g. next_result, next_hit). Any help would be greatly appreciated.

Thanks, Kary

************

Script:

#!/usr/bin/perl -w

use lib "/usr/local/bioperl14";
use lib "/usr/local/bioperl-run-1.4";

use Bio::Search::Result::HMMERResult;
use Bio::Tools::Run::Hmmer;
use Bio::Tools::Run::Hmmpfam;
use strict;

my $query;
my $db;
my $seq;
my $dbfile;
my @array;

$query = "sequencia_fasta_4_arg.txt";

print "Passou a definicao do arquivo query\n";

open (READ, "$query") or die "Cannot open $query: $!";
while (my $sequence = <READ>){
    for ($sequence) {
        &hmmpfam($sequence);
        #print $seq;
    }
}
close (READ);

print "Passou leitura do arquivo query\n";
############################################################################
#################################################################
sub hmmpfam {
    my ($seq) = @_;
    $db = "mmm.hmm";
    open (DH, "$db") or die "Cannot open $db: $!";

    print "passou abrir o arquivo mmm.hmm\n\n";

    while ($dbfile = <DH>){

        #Build a Hmmpfam factory
        my @params = ('DB'=>$dbfile,'E'=>0.0001);

        my $factory = Bio::Tools::Run::Hmmpfam->new(@params);

        # Pass the factory a Bio::Seq object or a file name
        # returns a Bio::SearchIO object
        my $search = $factory->run($seq);
        print "Search: $search\n";

        print "Passou search com parametros \n";

        my @feat;

        my $searchio = new Bio::SearchIO(-format => 'hmmer',
                                         -file => 'result.hmmer') or die
        print "Error for open the file";

        while (my $result = $searchio->next_result){
            print "começa o while do NEXT RESULT\n\n";
            while(my $hit =
$result->next_hit){ print "come?a o while do HIT - NEXT HIT\n\n"; while (my $hsp = my $hit->next_hsp){ print join("\t", ( my$r->query_name, $hsp->query->start, $hsp->query->end, $hit->name, $hsp->hit->start, $hsp->hit->end, $hsp->score, $hsp->evalue, $hsp->seq_str, )), "\n"; print "terminou o while dos HSPs\n\n"; } } } } close (DH); } _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l From brian_osborne at cognia.com Fri Mar 25 11:52:46 2005 From: brian_osborne at cognia.com (Brian Osborne) Date: Fri Mar 25 11:46:52 2005 Subject: [Bioperl-l] Hierarchical location parsing In-Reply-To: <1111766199.18772.13.camel@homer> Message-ID: Mark, Can you also attach the sequence file that you used in order to test your code? That way I can write a test specifically for the parsing of hierarchical locations. You wrote "I'm not sure the new patch won't slow down location parsing considerably..." Have you actually timed the parsing using the old and new code? Thanks again, Brian O. -----Original Message----- From: Mark Hoebeke [mailto:Mark.Hoebeke@jouy.inra.fr] Sent: Friday, March 25, 2005 10:57 AM To: brian.osborne@cognia.com Cc: bioperl-l@portal.open-bio.org Subject: Re: [Bioperl-l] Hierarchical location parsing Hi Brian, In fact, I filed a bug request (#1765) to which I attached a patch. I checked that the patched FTLocationFactory.pm and the unpatched one in the bioperl-live CVS repository exposed the same behaviour when running 'make test'. Of course, I don't know the variety of location descriptions found in the test scripts... Mark > Mark, > > I'm afraid I don't know the answer to your question but let me turn the > question around: would you like to help us fix this? > > Brian O. > -- --------------------------Mark.Hoebeke@jouy.inra.fr---------------------- Unit? Statistique & G?nome Unit? MIG +33 (0)1 60 87 38 03 T?l. +33 (0)1 34 65 28 85 +33 (0)1 60 87 38 09 Fax. 
+33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses        INRA - Domaine de Vilvert
F - 91000 Evry                            F - 78352 Jouy-en-Josas CEDEX

From hlapp at gmx.net Sat Mar 26 23:55:08 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 26 23:49:10 2005
Subject: [Bioperl-l] Question about accessing tags in Bio::SeqFeature::Generic
In-Reply-To: <42445787.8020301@earthlink.net>
Message-ID: <6474577C-9E7C-11D9-A406-000A959EB4C4@gmx.net>

$feature->get_tag_values() will throw an exception if the tag does not
exist. You need to check first using $feature->has_tag().

Apparently you asked for the values of tag 'tag', which is not a tag
given your initialization example. Instead, -repName, -repClass, etc.
are the tags whose values you can ask for.

On Friday, March 25, 2005, at 10:25 AM, iluminati@earthlink.net wrote:

> Fair enough. Here's the error message...
> ------------- EXCEPTION -------------
> MSG: asking for tag value that does not exist tag
> STACK Bio::SeqFeature::Generic::get_tag_values
> C:/Perl/site/lib/Bio/SeqFeature/Generic.pm:501
> STACK main::L1PA1presence L1PA1presence.pm:43
> STACK toplevel ThesisScript.pl:167
>
> --------------------------------------
> I know that it's supposed to return an array from which I can access
> the tag values, but if I can't get the array, how can I get the tag
> values? Thanks for the help.
>
>
> Hilmar Lapp wrote:
>
>> *Always* provide the error message. Nobody of us has a crystal ball.
>> 'isn't working for me' why? because of error or because you need
>> something that it isn't designed to return?
>>
>> On Thursday, March 24, 2005, at 06:10 PM, iluminati@earthlink.net
>> wrote:
>>
>>> I have a question about the Bio::SeqFeature::Generic that doesn't
>>> seem clear to me from the docs. Here's an example of the seq
>>> feature I'm creating...
>>> >>> my $RepeatElement = new Bio::SeqFeature::Generic( -start => >>> $L1HERVLine[6], >>> -end => $L1HERVLine[7], >>> -strand => $L1HERVLine[9], >>> -source => 'Repeat', >>> -tag =>{ >>> -repName => >>> $L1HERVLine[10], >>> -repClass => >>> $L1HERVLine[11], >>> -repFamily => >>> $L1HERVLine[12]} >>> ); >>> >>> Now, the feature itself creates fine. However, it isn't clear how I >>> would retrieve information from the tag has. The get_tag_value() >>> function isn't working for me, and I can't access the hash directly. >>> What should I do to be able to access the data? Let me know, and >>> thanks in advance. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > From sutripa at vbi.vt.edu Sat Mar 26 23:57:21 2005 From: sutripa at vbi.vt.edu (Sucheta Tripathy) Date: Sat Mar 26 23:51:25 2005 Subject: [Bioperl-l] drawing correct orientations of subject strands Message-ID: <3563.199.3.136.4.1111899441.squirrel@webmail.vbi.vt.edu> Hi Group, I have been trying to plot the correct directions of the HSPs of a standard blast output using Bio::Graphics. I don't know where I am going wrong,all the arrows are pointing to one direction. Any help in this will be greatly appreciated. 
Here is what I tried:

use strict;
use Bio::Graphics;
use Bio::SearchIO;

my $file = shift or die "Usage: blast_graphics.pl \n";
my $out_file = shift;
my $eval = shift;
my $num_tracks = shift;

my $searchio = Bio::SearchIO->new(-file   => $file,
                                  -format => 'blast') or die "parse failed";
my $result = $searchio->next_result() or die "no result";

my $panel = Bio::Graphics::Panel->new(-length    => $result->query_length,
                                      -width     => 800,
                                      -pad_left  => 10,
                                      -pad_right => 10,
                                     );

my $full_length = Bio::SeqFeature::Generic->new(-start => 1,
                                                -end => $result->query_length,
                                                -display_name=>$result->query_name
                                               );

$panel->add_track($full_length,
                  -glyph   => 'arrow',
                  -tick    => 2,
                  -fgcolor => 'black',
                  -double  => 1,
                  -label   => 1,
                 );

my $track = $panel->add_track(-glyph       => 'graded_segments',
                              -label       => 1,
                              -connector   => 'dashed',
                              -bgcolor     => 'blue',
                              -font2color  => 'red',
                              -lineWidth   => 1,
                              -stranded    => 1,
                              -sort_order  => 'high_score',
                              -description => sub {
                                  my $feature = shift;
                                  return unless $feature->has_tag('description');
                                  my ($description) = $feature->each_tag_value('description');
                                  my $score = $feature->score;
                                  "$description, score=$score";
                              });

my $i=0;
my $strand;
while( my $hit = $result->next_hit ) {
    next unless $hit->significance < $eval;
    $i++;
    my $feature = Bio::SeqFeature::Generic->new(-score        => $hit->raw_score,
                                                -display_name => $hit->name,
                                                -strand       => $strand,
                                                -tag          => {
                                                    description => $hit->description
                                                },
                                               );
    while( my $hsp = $hit->next_hsp ) {
        $strand=$hsp->sbjct->strand;
        print "strand is $strand";
        $feature->add_sub_SeqFeature($hsp,'EXPAND');
    }

    $track->add_feature($feature);
    if($i >= $num_tracks){ last;}
}

open FH,">$out_file" or die "can't open file $out_file for writing\n $!";
print FH $panel->png;
close(FH);

many thanks

Sucheta

--
Sucheta Tripathy
Virginia Bioinformatics Institute Phase-I
Washington street.
Virginia Tech.
Blacksburg,VA 24061-0447 phone:(540)231-8138 Fax: (540) 231-2606 From rob at salmonella.org Sun Mar 27 00:21:40 2005 From: rob at salmonella.org (Rob Edwards) Date: Sun Mar 27 00:17:19 2005 Subject: [Bioperl-l] drawing correct orientations of subject strands In-Reply-To: <3563.199.3.136.4.1111899441.squirrel@webmail.vbi.vt.edu> References: <3563.199.3.136.4.1111899441.squirrel@webmail.vbi.vt.edu> Message-ID: <79b0d47c658a258b60c75262561dd1f5@salmonella.org> > my $strand; > while( my $hit = $result->next_hit ) { > next unless $hit->significance < $eval; > $i++; > my $feature = Bio::SeqFeature::Generic->new(-score => > $hit->raw_score, > -display_name => > $hit->name, > -strand => > $strand, > It looks like at this point $strand is not set to anything. Shouldn't you move the while (my $hsp = $hit->next_hsp){ loop above setting -strand? Rob > -tag => { > > description > => > $hit->description > }, > ); > while( my $hsp = $hit->next_hsp ) { > $strand=$hsp->sbjct->strand; > print "strand is $strand"; > $feature->add_sub_SeqFeature($hsp,'EXPAND'); > } > > From zhoujie at fudan.edu.cn Sun Mar 27 10:07:40 2005 From: zhoujie at fudan.edu.cn (zhoujie@fudan.edu.cn) Date: Sun Mar 27 10:17:03 2005 Subject: [Bioperl-l] A question about flatting taxonomy database Message-ID: Hi all, When I'm using "bp_local_taxonomydb_query.pl" to build a local taxonomy database and query it, I always get a exception saying:"no such file or directory ***, STACK ***", it seems that the nodes file, id2names and names2id files are already created, but how does the error MSG arise? I have already installed the BerkeleyDB module by ppm. Is there anything else that I need to do? Thanks very much for you help. 
J Z From jason.stajich at duke.edu Sun Mar 27 17:58:16 2005 From: jason.stajich at duke.edu (Jason Stajich) Date: Sun Mar 27 17:52:38 2005 Subject: [Bioperl-l] A question about flatting taxonomy database In-Reply-To: References: Message-ID: <1111964296.42473a88d6d90@webmail.duke.edu> Can you show the command line argument that you passing in? You need to tell the script where to find these files. -jason -- Jason Stajich jason.stajich at duke.edu http://www.duke.edu/~jes12/ Quoting zhoujie@fudan.edu.cn: > Hi all, > > When I'm using "bp_local_taxonomydb_query.pl" to build a local > taxonomy database and query it, I always get a exception saying:"no > such file or directory ***, STACK ***", it seems that the nodes file, > id2names and names2id files are already created, but how does the > error MSG arise? I have already installed the BerkeleyDB module by > ppm. Is there anything else that I need to do? > > Thanks very much for you help. > > J Z > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Mark.Hoebeke at jouy.inra.fr Thu Mar 24 23:52:27 2005 From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke) Date: Sun Mar 27 17:57:05 2005 Subject: [Bioperl-l] Hierarchical location parsing In-Reply-To: <1ac3134a474c60b4b72dde9289c65f24@duke.edu> References: <1111679962.18235.8.camel@homer> <1ac3134a474c60b4b72dde9289c65f24@duke.edu> Message-ID: <1111726347.30799.19.camel@homer> Sorry I messed up the example I gave, but an "in-nature" hierarchical location can be found in the complete genome of Streptococcus pyogenes strain MGAS315 (Genbank access number AE014074) : source join(1..749107,join(788646..977266,join(1018339..1137553, join(1171973..1230114,join(1271911..1313193, join(1351400..1410541,1450556..1900521)))))) In this case, it seems likely that the joins could be flattened out. 
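To illustrate what "flattening" could mean here, a minimal sketch follows. It assumes the location is built only from join(), plain positions and ranges; `flatten_joins` is a hypothetical helper written for this note, not the patch attached to the bug report and not part of FTLocationFactory. Anything containing complement(), order(), fuzzy bounds or remote accessions is returned untouched, since flattening those blindly could change their meaning.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Collapses nested join(...) operators into a single flat join(...).
sub flatten_joins {
    my ($loc) = @_;

    # Only handle strings made of "join", digits, dots, commas and parentheses.
    # Everything else (complement, order, <, >, ^, accession:...) is left as-is.
    return $loc unless $loc =~ /^join\(/ && $loc !~ /[^join\d.,()]/;

    # Collect every position or start..end range in order of appearance.
    my @ranges = $loc =~ /(\d+(?:\.\.\d+)?)/g;

    return 'join(' . join(',', @ranges) . ')';
}

print flatten_joins('join(1..749107,join(788646..977266,1018339..1137553))'), "\n";
# prints: join(1..749107,788646..977266,1018339..1137553)
```

Note that this sketch does not check that the parentheses are balanced, so it says nothing about whether the input was well formed in the first place.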
However, when massively feeding Genbank entries into a database it could be impractical to re-parse location strings to determine if 1/ they contain nested joins and 2/ they can or cannot be flattened out.

I don't know to what extent the FTLocationFactory is tested when running 'make test' on a bioperl-live tree, but it yields the same results on both patched and unpatched trees.

Mark

On Thursday 24 March 2005 at 17:55 -0800, Jason Stajich wrote:
> Is there a real example where these types of locations exist - why
> can't it be flattened without the nested joins? At any rate - I don't
> really care to parse these if they never exist "in-nature". If your
> bugfix soln works and doesn't slow things down we can use it I guess,
> although I prefer a regexp. I don't really have time to patch or test
> in the near future so it will have to wait for someone to volunteer to
> get to it.
>
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/

--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome               Unité MIG
+33 (0)1 60 87 38 03            Tél.     +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09            Fax.     +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses       INRA - Domaine de Vilvert
F - 91000 Evry                           F - 78352 Jouy-en-Josas CEDEX

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050325/f8dfb372/attachment.bin

From vaughn at cshl.org Fri Mar 25 07:39:36 2005
From: vaughn at cshl.org (Matthew Vaughn)
Date: Sun Mar 27 17:57:08 2005
Subject: [Bioperl-l] Re: How to express 'histogram' data in GFF3
Message-ID:

I posted a question about this a few days ago and have worked out what appears to be a definitive answer, thanks to some advice from Scott Cain.
I thought I'd share what appears to work with BioPerl 1.5 and Gbrowse 1.62.

For a given bit of histogram-type data, proper GFF2 formatting was as follows:

ChrII fwd chip1 0 100 45.4 + . chip1 ChrII:fwd

Contrast this with GFF3 format for the same data point

ChrII fwd chip1 0 100 45.4 + . ID=chip1:ChrII:fwd

Basically, I merged what used to be the group field into an ID tag. Technically, the ':' character should be URL-escaped (percent-encoded), leaving the ID tag like so

ChrII fwd chip1 0 100 45.4 + . ID=chip1%3AChrII%3Afwd

Does the fact the ID is not unique violate the GFF3 spec? That's a tough question that I leave to the experts.

The gbrowse configuration file aggregators for GFF2 and GFF3 are the same, in this case:

aggregators = agg1{chip1:fwd}

Scott suggested that I might need to create a region feature, then assign my histogram data points to it as children using the new Parent attribute of GFF3. However, it appears that the custom aggregator takes care of this. Clicking on the histogram in my current genome browser yields a gbrowse_detail page with all the histogram data points within the currently displayed span of coordinates.

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
Delbruck Laboratory / Martienssen Group
1 Bungtown Road
Cold Spring Harbor, NY 11724
phone: (516) 367-8469

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2359 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050325/3c0e4b5c/smime.bin

From Mark.Hoebeke at jouy.inra.fr Fri Mar 25 10:56:39 2005
From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke)
Date: Sun Mar 27 17:57:11 2005
Subject: [Bioperl-l] Hierarchical location parsing
Message-ID: <1111766199.18772.13.camel@homer>

Hi Brian,

In fact, I filed a bug request (#1765) to which I attached a patch.
I checked that the patched FTLocationFactory.pm and the unpatched one in the bioperl-live CVS repository exhibited the same behaviour when running 'make test'. Of course, I don't know the variety of location descriptions found in the test scripts...

Mark

> Mark,
>
> I'm afraid I don't know the answer to your question but let me turn the
> question around: would you like to help us fix this?
>
> Brian O.
>

--
--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome               Unité MIG
+33 (0)1 60 87 38 03            Tél.     +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09            Fax.     +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses       INRA - Domaine de Vilvert
F - 91000 Evry                           F - 78352 Jouy-en-Josas CEDEX

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050325/b14ecffe/attachment.bin

From Mark.Hoebeke at jouy.inra.fr Fri Mar 25 15:23:42 2005
From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke)
Date: Sun Mar 27 17:57:14 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To:
References:
Message-ID: <1111782222.18772.37.camel@homer>

Brian,

an example of a nested location is found in the 'source' feature of the Genbank entry having accession AE014074 (Streptococcus pyogenes MGAS315 complete genome). As the file is over 1 Meg in size once compressed it might not be a good idea to attach it to this mail which is CC'ed to bioperl-l ;D

Regarding the performance hit of my fix, I feared that replacing a compiled regexp with a split and a loop over every character of the string could have a significant impact. As it stands, I timed a simple parsing script swallowing Genbank files and spitting out each feature location as a GFF string, on 131 complete microbial genomes.
There is no difference in output between the bioperl-live FTLocationFactory and its patched version (basically meaning that this test sample did not contain nested locations). The times are comparable, with even a slight advantage to the patched version (915.66user 19.53system 15:42.19elapsed 99%CPU vs. 938.06user 17.33system 16:04.15elapsed 99%CPU).

When comparing the outputs of the parser run on a file with a nested location, it appears that without the bugfix, the nested location yields an incorrect GFF string as shown by the diff below.

[mark@homer Loc]$ diff MGAS315 MGAS315_patched
1c1
< join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521),)
---
> join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521))))))

I'm still cautious about the bugfix because I only produced the diffs on microbial genomes, which probably have simpler location definitions than higher eukaryotes.

Greetings,

Mark

On Friday 25 March 2005 at 11:52 -0500, Brian Osborne wrote:
> Mark,
>
> Can you also attach the sequence file that you used in order to test your
> code? That way I can write a test specifically for the parsing of
> hierarchical locations.
>
> You wrote "I'm not sure the new patch won't slow down location parsing
> considerably..." Have you actually timed the parsing using the old and new
> code?
>
> Thanks again,
>
> Brian O.
>

--
--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome               Unité MIG
+33 (0)1 60 87 38 03            Tél.     +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09            Fax.     +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses       INRA - Domaine de Vilvert
F - 91000 Evry                           F - 78352 Jouy-en-Josas CEDEX

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Ceci est une partie de message =?ISO-8859-1?Q?num=E9riquement?= =?ISO-8859-1?Q?_sign=E9e?= Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050325/88bde8b1/attachment.bin From ymc at paxil.stanford.edu Fri Mar 25 18:49:53 2005 From: ymc at paxil.stanford.edu (Yee Man Chan) Date: Sun Mar 27 17:57:18 2005 Subject: [Bioperl-l] Hidden Markov Model in Bioperl? Message-ID: Hi all I just wrote a C module to do Hidden Markov Model (HMM) related calculations. I find that there is no HMM implementation anywhere (there are parsers for HMMER output however) in Bioperl. I think maybe it will be a good idea for me to add this module to Bioperl? I am thinking of an interface like this: Bio::Tools::HMM->new("symbols", "states") - instantiate an HMM object with a string of symbols (each character corresponds to one symbol) and a string of states. Other parameters of the model is generated randomly. Good for starting a Baum-Welch training. Bio::Tools::HMM->new("symbols", "states", array of initial state probabilities, matrix of state transition probabilities, matrix of emission probabilities) - similar to the one before but now we explicit assign the HMM parameters. Bio::Tools::HMM->ObsSeqProb("string of observed sequence") - return the probability of an observed sequence. Bio::Tools::HMM->Viterbi("string of observed sequence") - return a string of hidden sequence that maximize the probability of the happening of the observed sequence. Bio::Tools::HMM->BaumWelchTraining(array of observed sequences) - uses an array of observed sequences to find the HMM parameters that locally maximizes the probabilities of these observed sequences. Optional parameters can be passed to change the tolerance and maximum number of iteration. 
Bio::Tools::HMM->StatisticalTraining(array of observed sequences, array of hidden state sequences) - when the hidden state sequence is also known, use it to determine the parameter of an HMM using statistical method. Bio::Tools::HMM->getInitArray() - return the array of initial state probabilities as an @array Bio::Tools::HMM->getStateMatrix() - return the matrix of state transition probabilities as MatrixI Bio::Tools::HMM->getEmissionMatrix() - return the matrix of emission probabilities as MatrixI This should cover the most HMM applications. What do you think? Do you have other functions in mind? I already contributed Bio::Tools::dpAlign before, so I am not a newbie. If someone thinks it is a good idea to have this in Bioperl, I can work on it as soon as possible. Best Regards, Yee Man From hlapp at gmx.net Sun Mar 27 18:18:01 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun Mar 27 18:14:17 2005 Subject: [Bioperl-l] Hidden Markov Model in Bioperl? In-Reply-To: Message-ID: <76E15569-9F16-11D9-86E3-000A959EB4C4@gmx.net> Sounds like a cool thing to have in bioperl. Just one minor comment for naming, in perl/bioperl we typically DontUseCapitatilization to delineate words (like in Java) but put underscores. Otherwise to my knowledge you're breaking new ground here so there is no consistency check with the rest of bioperl to be passed, unless I'm missing something. -hilmar On Friday, March 25, 2005, at 03:49 PM, Yee Man Chan wrote: > > Hi all > > I just wrote a C module to do Hidden Markov Model (HMM) related > calculations. I find that there is no HMM implementation anywhere > (there > are parsers for HMMER output however) in Bioperl. I think maybe it > will be > a good idea for me to add this module to Bioperl? > > I am thinking of an interface like this: > > Bio::Tools::HMM->new("symbols", "states") > - instantiate an HMM object with a string of symbols (each character > corresponds to one symbol) and a string of states. 
Other parameters of > the > model is generated randomly. Good for starting a Baum-Welch training. > > Bio::Tools::HMM->new("symbols", "states", array of initial state > probabilities, matrix of state transition probabilities, matrix of > emission probabilities) > - similar to the one before but now we explicit assign the HMM > parameters. > > Bio::Tools::HMM->ObsSeqProb("string of observed sequence") > - return the probability of an observed sequence. > > Bio::Tools::HMM->Viterbi("string of observed sequence") > - return a string of hidden sequence that maximize the probability of > the > happening of the observed sequence. > > Bio::Tools::HMM->BaumWelchTraining(array of observed sequences) > - uses an array of observed sequences to find the HMM parameters that > locally maximizes the probabilities of these observed sequences. > Optional > parameters can be passed to change the tolerance and maximum number of > iteration. > > Bio::Tools::HMM->StatisticalTraining(array of observed sequences, > array of > hidden state sequences) > - when the hidden state sequence is also known, use it to determine the > parameter of an HMM using statistical method. > > Bio::Tools::HMM->getInitArray() > - return the array of initial state probabilities as an @array > > Bio::Tools::HMM->getStateMatrix() > - return the matrix of state transition probabilities as MatrixI > > Bio::Tools::HMM->getEmissionMatrix() > - return the matrix of emission probabilities as MatrixI > > This should cover the most HMM applications. What do you think? Do > you have other functions in mind? > > I already contributed Bio::Tools::dpAlign before, so I am not a > newbie. If someone thinks it is a good idea to have this in Bioperl, I > can > work on it as soon as possible. 
> > Best Regards, > Yee Man > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From amackey at pcbi.upenn.edu Mon Mar 28 08:11:33 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Mon Mar 28 08:07:47 2005 Subject: [Bioperl-l] Hidden Markov Model in Bioperl? In-Reply-To: References: Message-ID: <24c3580c3fa75bee1d50f9c8b9b1c0b1@pcbi.upenn.edu> Yes, in bioperl-ext, of course ... On Mar 25, 2005, at 6:49 PM, Yee Man Chan wrote: > I am thinking of an interface like this: > > Bio::Tools::HMM->new("symbols", "states") > - instantiate an HMM object with a string of symbols (each character > corresponds to one symbol) and a string of states. Other parameters of > the > model is generated randomly. Good for starting a Baum-Welch training. Why not expand this to be two arrayrefs of symbols or states? You can convert them into whatever encoded single-char alphabet you'd like. Think Perl, not C. This is a feature request, not a requirement, of course. > Bio::Tools::HMM->ObsSeqProb("string of observed sequence") > - return the probability of an observed sequence. This is the Forward algorithm P()? Perhaps an alias to Forward(), and the ability to specify an offset/index at which you want the Forward value (see below)? Or is this the product of viterbi factors? > Bio::Tools::HMM->Viterbi("string of observed sequence") > - return a string of hidden sequence that maximize the probability of > the > happening of the observed sequence. this might also return the P() of the viterbi path; and again, instead of returning string of symbols, an arrayref of symbols. 
> Bio::Tools::HMM->getInitArray() > Bio::Tools::HMM->getStateMatrix() > Bio::Tools::HMM->getEmissionMatrix() Presumably these should be get/set methods? What's missing is 1) posterior decoding and 2) partial path probability (i.e. F_{i}*v_{i+1}*v+{i+2}*...v*_{j-1}*B_{j}/F_{x}, where i < j, F and B are Forward and Backward values, v's are viterbi factors for each step in the partial path specified from i to j) I'd also prefer lower case names (BaumWelch could just be called "train" or "learn_unsupervised" or somesuch) Also, see the HMM functions available in Matlab that do the same ... Good luck, -Aaron -- Aaron J. Mackey, Ph.D. Dept. of Biology, Goddard 212 University of Pennsylvania email: amackey@pcbi.upenn.edu 415 S. University Avenue office: 215-898-1205 Philadelphia, PA 19104-6017 fax: 215-746-6697 From ymc at paxil.stanford.edu Mon Mar 28 12:53:03 2005 From: ymc at paxil.stanford.edu (Yee Man Chan) Date: Mon Mar 28 13:28:05 2005 Subject: [Bioperl-l] Hidden Markov Model in Bioperl? In-Reply-To: <76E15569-9F16-11D9-86E3-000A959EB4C4@gmx.net> Message-ID: On Sun, 27 Mar 2005, Hilmar Lapp wrote: > Sounds like a cool thing to have in bioperl. > > Just one minor comment for naming, in perl/bioperl we typically > DontUseCapitatilization to delineate words (like in Java) but put > underscores. That's fine with me. I can use underscores. Regards, Yee Man > Otherwise to my knowledge you're breaking new ground here > so there is no consistency check with the rest of bioperl to be passed, > unless I'm missing something. > > -hilmar > > On Friday, March 25, 2005, at 03:49 PM, Yee Man Chan wrote: > > > > > Hi all > > > > I just wrote a C module to do Hidden Markov Model (HMM) related > > calculations. I find that there is no HMM implementation anywhere > > (there > > are parsers for HMMER output however) in Bioperl. I think maybe it > > will be > > a good idea for me to add this module to Bioperl? 
> > > > I am thinking of an interface like this: > > > > Bio::Tools::HMM->new("symbols", "states") > > - instantiate an HMM object with a string of symbols (each character > > corresponds to one symbol) and a string of states. Other parameters of > > the > > model is generated randomly. Good for starting a Baum-Welch training. > > > > Bio::Tools::HMM->new("symbols", "states", array of initial state > > probabilities, matrix of state transition probabilities, matrix of > > emission probabilities) > > - similar to the one before but now we explicit assign the HMM > > parameters. > > > > Bio::Tools::HMM->ObsSeqProb("string of observed sequence") > > - return the probability of an observed sequence. > > > > Bio::Tools::HMM->Viterbi("string of observed sequence") > > - return a string of hidden sequence that maximize the probability of > > the > > happening of the observed sequence. > > > > Bio::Tools::HMM->BaumWelchTraining(array of observed sequences) > > - uses an array of observed sequences to find the HMM parameters that > > locally maximizes the probabilities of these observed sequences. > > Optional > > parameters can be passed to change the tolerance and maximum number of > > iteration. > > > > Bio::Tools::HMM->StatisticalTraining(array of observed sequences, > > array of > > hidden state sequences) > > - when the hidden state sequence is also known, use it to determine the > > parameter of an HMM using statistical method. > > > > Bio::Tools::HMM->getInitArray() > > - return the array of initial state probabilities as an @array > > > > Bio::Tools::HMM->getStateMatrix() > > - return the matrix of state transition probabilities as MatrixI > > > > Bio::Tools::HMM->getEmissionMatrix() > > - return the matrix of emission probabilities as MatrixI > > > > This should cover the most HMM applications. What do you think? Do > > you have other functions in mind? > > > > I already contributed Bio::Tools::dpAlign before, so I am not a > > newbie. 
If someone thinks it is a good idea to have this in Bioperl, I > > can > > work on it as soon as possible. > > > > Best Regards, > > Yee Man > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > From ymc at paxil.stanford.edu Mon Mar 28 13:14:56 2005 From: ymc at paxil.stanford.edu (Yee Man Chan) Date: Mon Mar 28 13:28:09 2005 Subject: [Bioperl-l] Hidden Markov Model in Bioperl? In-Reply-To: <24c3580c3fa75bee1d50f9c8b9b1c0b1@pcbi.upenn.edu> Message-ID: On Mon, 28 Mar 2005, Aaron J. Mackey wrote: > Yes, in bioperl-ext, of course ... That was my intention to add it to bioperl-ext. > > On Mar 25, 2005, at 6:49 PM, Yee Man Chan wrote: > > > I am thinking of an interface like this: > > > > Bio::Tools::HMM->new("symbols", "states") > > - instantiate an HMM object with a string of symbols (each character > > corresponds to one symbol) and a string of states. Other parameters of > > the > > model is generated randomly. Good for starting a Baum-Welch training. > > Why not expand this to be two arrayrefs of symbols or states? You can > convert them into whatever encoded single-char alphabet you'd like. > Think Perl, not C. This is a feature request, not a requirement, of > course. I thought about that too. But I suppose this is an HMM for Bioperl and I don't see any usage outside DNA sequences and protein sequences. So maybe strings are ok? It can be quite tedious if I need to convert a DNA string to an array of DNA characters to use HMM. Can you give me some biological examples that can justify this feature request? 
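One way to sidestep the string-versus-arrayref question would be to accept both forms and normalise them before handing anything to the C code. The following is only a sketch under that assumption: `normalize_symbols` and `encode_symbols` are hypothetical helpers invented for this note, not part of the proposed Bio::Tools::HMM interface, and the multi-character tokens in the usage line are purely illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Accepts either a plain string ("ACGT") or an arrayref (['A','C','G','T'])
# and always returns an arrayref of symbols.
sub normalize_symbols {
    my ($input) = @_;
    return ref($input) eq 'ARRAY' ? [@$input] : [ split //, $input ];
}

# Hypothetical helper, for illustration only.
# Maps each distinct symbol to an integer code in order of first appearance.
# Returns (\%code_of, \@encoded).
sub encode_symbols {
    my ($input) = @_;
    my $symbols = normalize_symbols($input);
    my (%code_of, @encoded);
    for my $s (@$symbols) {
        unless (exists $code_of{$s}) {
            my $next = keys %code_of;    # number of symbols seen so far
            $code_of{$s} = $next;
        }
        push @encoded, $code_of{$s};
    }
    return (\%code_of, \@encoded);
}

# A DNA string needs no conversion by the caller:
my ($codes, $encoded) = encode_symbols('ACGTAC');
print join(',', @$encoded), "\n";    # prints: 0,1,2,3,0,1

# Multi-character symbols go through the same path:
my (undef, $encoded2) = encode_symbols([ 'ATG', 'GCC', 'ATG' ]);
print join(',', @$encoded2), "\n";   # prints: 0,1,0
```

With something along these lines, callers working with DNA or protein strings would not have to split anything themselves, while the arrayref form stays available for alphabets that do not fit in one character.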
> > > Bio::Tools::HMM->ObsSeqProb("string of observed sequence") > > - return the probability of an observed sequence. > > This is the Forward algorithm P()? Perhaps an alias to Forward(), and > the ability to specify an offset/index at which you want the Forward > value (see below)? Or is this the product of viterbi factors? > This is the P(O|lambda), ie given an HMM model and an observed sequence, what is the probability of seeing this observed sequence. It is equivalent to sum_1_to_N alpha_T(i) where alpha is the forward function, T is the length of observed sequence and N is the number of hidden states. Forward and Backward functions are hidden from this interface for now. Oh. Should I return this as log(P)? For a sequence of just couple hundred symbols, P tends to be very close to zero, so maybe log(P) will make more sense to users? > > Bio::Tools::HMM->Viterbi("string of observed sequence") > > - return a string of hidden sequence that maximize the probability of > > the > > happening of the observed sequence. > > this might also return the P() of the viterbi path; and again, instead > of returning string of symbols, an arrayref of symbols. > Based on my understanding of the literature, I don't recall seeing any effort to compute the probability of the hidden state sequence. > > Bio::Tools::HMM->getInitArray() > > Bio::Tools::HMM->getStateMatrix() > > Bio::Tools::HMM->getEmissionMatrix() > > Presumably these should be get/set methods? > Yeah. I should do both get and set. > What's missing is 1) posterior decoding and 2) partial path probability > (i.e. F_{i}*v_{i+1}*v+{i+2}*...v*_{j-1}*B_{j}/F_{x}, where i < j, F and > B are Forward and Backward values, v's are viterbi factors for each > step in the partial path specified from i to j) > I can add posterior_decoding but I am not sure what partial_path_probability is. Can you give me a link to some information about it? 
> I'd also prefer lower case names (BaumWelch could just be called > "train" or "learn_unsupervised" or somesuch) I have two ways to train the HMM, one is without hidden state sequence supplied (ie BaumWelchTraining) and one is with hidden state sequence (ie StatisticalTraining). Is the former learn_unsupervised and the latter learn_supervised in the AI speak? Regards, Yee Man > > Also, see the HMM functions available in Matlab that do the same ... > > Good luck, > > -Aaron > > -- > Aaron J. Mackey, Ph.D. > Dept. of Biology, Goddard 212 > University of Pennsylvania email: amackey@pcbi.upenn.edu > 415 S. University Avenue office: 215-898-1205 > Philadelphia, PA 19104-6017 fax: 215-746-6697 > From zhoujie at fudan.edu.cn Mon Mar 28 20:27:19 2005 From: zhoujie at fudan.edu.cn (zhoujie@fudan.edu.cn) Date: Mon Mar 28 20:29:16 2005 Subject: [Bioperl-l] A question about flatting taxonomy database Message-ID: Sorry, probably the first mail I replyed was lost, so I send it agian here. My conmmand line is: perl bp_local_taxonomydb_query.pl --nodes nodes.dmp --names names.dmp I only changed one thing: the directory in the script, I changed it to './index' , and it generates the right thing in that directory. But when it finished, the script throw out an exception: ------------------- EXCEPTION ---------------- MSG: No such file or directory ./index/nodes STACK Bio::DB::Taxonomy::flatfile::_db_connect C:/Perl/site/lib/Bio\DBTaxonomy\flatfile.pm:325 STACK Bio::DB::Taxonomy::flatfile::new C:/Perl/site/lib/Bio\Bio\DB\Taxonomy\flatfile.pm:138 STACK Bio::DB::Taxonomy::new C:/Perl/site/lib/Bio/DB/Taxonomy.pm:104 STACK toplevel bp_local_taxonomy_query.pl:22 ----------------------------------------------- I think I have already told the script the location, by the -directory parameter in the new method of Bio::DB::Taxonomy, at line 25 of the script. Is there anything wrong with my process? 
J Z

----- Original Message -----
From: Jason Stajich
Date: Monday, March 28, 2005 6:58 AM
Subject: Re: [Bioperl-l] A question about flatting taxonomy database

> Can you show the command line argument that you passing in? You
> need to tell
> the script where to find these files.
>
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
> Quoting zhoujie@fudan.edu.cn:
>
> > Hi all,
> >
> > When I'm using "bp_local_taxonomydb_query.pl" to build a local
> > taxonomy database and query it, I always get a exception
> saying:"no
> > such file or directory ***, STACK ***", it seems that the nodes
> file,
> > id2names and names2id files are already created, but how does
> the
> > error MSG arise? I have already installed the BerkeleyDB module
> by
> > ppm. Is there anything else that I need to do?
> >
> > Thanks very much for you help.
> >
> > J Z
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From babyleo11 at yahoo.com.sg Tue Mar 29 01:38:15 2005
From: babyleo11 at yahoo.com.sg (Minyi)
Date: Tue Mar 29 01:32:24 2005
Subject: [Bioperl-l] BLASTP
Message-ID: <20050329063815.52422.qmail@web40614.mail.yahoo.com>

hi all,
i'm doing a program to run blastp on cgi/perl. However, there's no hits found no matter what files i use. But when i run the program using standalone blast with the same files, there are hits found. Also, the same program can work for blastn on cgi/perl and standalone. The only thing it can't work is blastp on cgi/perl. Does anyone know what's the problem?

Thank You!

Regards,
Minyi

" This is the beginning of a new day. You have been given this day to use as you will. You can waste it or use it for good.
What you do today is important because you are exchanging a day of your life for it. When tomorrow comes, this day will be gone forever; in its place is something that you have left behind...let it be something good. "

Send instant messages to your online friends http://uk.messenger.yahoo.com

From muratem at eng.uah.edu Tue Mar 29 07:57:59 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Tue Mar 29 07:52:05 2005
Subject: [Bioperl-l] Primer3.pm
Message-ID:

Greetings

I know this has come up before, but I can't seem to track down the answer. There is a Primer3.pm in Bio/Tools. The latest 1.4 module docs on the web page place it there. It's also in Bio/Tools/Run through bioperl-run-1.4. Which is the correct (or best) path/version to use?

thanks

Mike

From brian_osborne at cognia.com Tue Mar 29 08:09:05 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Mar 29 08:04:31 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <1111782222.18772.37.camel@homer>
Message-ID:

Mark,

I didn't see any "join(join..." statements in that Genbank entry, as part of a source feature or anywhere else. I used this URL:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=21909536

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Mark Hoebeke
Sent: Friday, March 25, 2005 3:24 PM
To: Brian Osborne
Cc: bioperl-l@portal.open-bio.org
Subject: RE: [Bioperl-l] Hierarchical location parsing

Brian,

an example of a nested location is found in the 'source' feature of the Genbank entry having accession AE014074 (Streptococcus pyogenes MGAS315 complete genome). As the file is over 1 Meg in size once compressed it might not be a good idea to attach it to this mail which is CC'ed to bioperl-l ;D

Regarding the performance hit of my fix, I feared that replacing a compiled regexp with a split and a loop over every character of the string could have a significant impact.
As it stands, I timed a simple parsing script swallowing Genbank files and spitting out each feature location as a GFF string, on 131 complete microbial genomes. There is no difference in output between the bioperl-live FTLocationFactory and its patched version (basically meaning that this test sample did not contain nested locations). The times are comparable, with even a slight advantage to the patched version (915.66user 19.53system 15:42.19elapsed 99%CPU vs. 938.06user 17.33system 16:04.15elapsed 99%CPU).

When comparing the outputs of the parser run on a file with a nested location, it appears that without the bugfix, the nested location yields an incorrect GFF string, as shown by the diff below.

[mark@homer Loc]$ diff MGAS315 MGAS315_patched
1c1
< join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521),)
---
> join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521))))))

I'm still cautious about the bugfix because I only produced the diffs on microbial genomes, which probably have simpler location definitions than higher eukaryotes.

Greetings,

Mark

On Friday, March 25, 2005 at 11:52 -0500, Brian Osborne wrote:
> Mark,
>
> Can you also attach the sequence file that you used in order to test your
> code? That way I can write a test specifically for the parsing of
> hierarchical locations.
>
> You wrote "I'm not sure the new patch won't slow down location parsing
> considerably..." Have you actually timed the parsing using the old and new
> code?
>
> Thanks again,
>
> Brian O.
>
--
--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome                    Unité MIG
+33 (0)1 60 87 38 03           Tél.           +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09           Fax.           +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses            INRA - Domaine de Vilvert
F - 91000 Evry                                F - 78352 Jouy-en-Josas CEDEX

From muratem at eng.uah.edu Tue Mar 29 08:43:54 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Tue Mar 29 08:40:11 2005
Subject: [Bioperl-l] Primer3.pm
In-Reply-To:
Message-ID:

On Tue, 29 Mar 2005, Mike Muratet wrote:

> Greetings
>
> I know this has come up before, but I can't seem to track down the answer.
> There is a Primer3.pm in Bio/Tools. The latest 1.4 module docs on the web
> page place it there. It's also in Bio/Tools/Run through bioperl-run-1.4.
> Which is the correct (or best) path/version to use?
>
> thanks
>
> Mike
>

Hello again

A careful reading of the documentation for each module suggests that the former is called by the latter, which is probably the answer to the question unless someone knows otherwise. There is nothing in the docs on the bioperl webpage for Bio/Tools/Run/Primer3.pm.

Mike

From palmeida at igc.gulbenkian.pt Tue Mar 29 09:53:19 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Mar 29 09:47:01 2005
Subject: [Bioperl-l] BLASTP
In-Reply-To: <20050329063815.52422.qmail@web40614.mail.yahoo.com>
References: <20050329063815.52422.qmail@web40614.mail.yahoo.com>
Message-ID: <20050329145319.GA8773@bioinf.igc.gulbenkian.pt>

Hi Minyi,

Have you tried running blastp on that file with perl, from the command line? When you run it with cgi, can you check the webserver's log to see if there are any errors, or send the error output to the browser (I think you can do it with CGI::Carp, but I haven't done that in a long time)?

-Paulo

On Tue, Mar 29, 2005 at 07:38:15AM +0100, Minyi wrote:
> hi all,
> i'm doing a program to run blastp on cgi/perl. However, there's no hits found no matter what files i use. But when i run the program using standalone blast with the same files, there are hits found. Also, the same program can work for blastn on cgi/perl and standalone. The only thing it can't work is blastp on cgi/perl.
Does anyone know what's the problem? Thank You!

--
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel +351 21 446 46 35
fax +351 21 440 79 70
http://www.igc.gulbenkian.pt

From rob at salmonella.org Tue Mar 29 11:31:56 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Mar 29 11:26:08 2005
Subject: [Bioperl-l] Primer3.pm
In-Reply-To:
References:
Message-ID: <67be66fa1184724340efba2b24dd71e0@salmonella.org>

Bio::Tools::Run::Primer3 is the interface to run the primer3 program. Bio::Tools::Primer3 is the interface to parse the output from primer3.

If you have already run primer3 (or do it outside bioperl) then you don't need the run module; you can use the parsing module and pass in the file. If you want to take a sequence object, design primers against it using primer3, and get sequence objects back for the primers and the products, then you need both.

Rob

On Mar 29, 2005, at 5:43 AM, Mike Muratet wrote:

> On Tue, 29 Mar 2005, Mike Muratet wrote:
>
>> Greetings
>>
>> I know this has come up before, but I can't seem to track down the
>> answer.
>> There is a Primer3.pm in Bio/Tools. The latest 1.4 module docs on the
>> web page place it there. It's also in Bio/Tools/Run through
>> bioperl-run-1.4.
>> Which is the correct (or best) path/version to use?
>>
>> thanks
>>
>> Mike
>>
>
> Hello again
>
> A careful reading of the documentation for each module would indicate
> that the former above is called by the latter which is probably the
> answer to the question unless someone knows otherwise. There is nothing
> in the docs on the bioperl webpage for Bio/Tools/Run/Primer3.pm.
>
> Mike

From muratem at eng.uah.edu Tue Mar 29 12:13:22 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Tue Mar 29 12:08:23 2005
Subject: [Bioperl-l] More fun with primer3
Message-ID:

Greetings

Having deduced (and heard from Rob) that Tools::Run::Primer3 produces a Tools::Primer3 object, I tried the following:

my $results = $primer3->run;
print "n results ",$results->number_of_results(),"\n";
my $primer = $results->next_primer();

and got

n results 4

------------- EXCEPTION -------------
MSG: The target_sequence must be a Bio::Seq to create this object.
STACK Bio::Seq::PrimedSeq::new /usr/local/lib/perl5/site_perl/5.8.0/Bio/Seq/PrimedSeq.pm:232
STACK Bio::Tools::Primer3::next_primer /usr/local/lib/perl5/site_perl/5.8.0/Bio/Tools/Primer3.pm:331
STACK toplevel ./extractAlignments.pl:209

I'm at a loss; it's pretty much a cut and paste out of the docs. Does anybody have any ideas?

Cheers

Mike

PS The perldoc (I have) of Bio::Tools::Run::Primer3 says it returns a Bio::Tools::Run::Primer3 object and not Bio::Tools::Primer3. I think it's the latest version.

From skirov at utk.edu Tue Mar 29 16:21:35 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 29 16:16:06 2005
Subject: [Bioperl-l] Possible memory leak in Bio::SeqFeature::Gene::GeneStructure?
Message-ID: <4249C6DF.3040100@utk.edu>

Forgot to mention: Devel::Cycle reports cycle references between GeneStructure and Transcript and perl has a known issue of not being able to destroy such objects.
So I guess my question is: Is this a feature or a 'feature' :-) .
In any case
Thanks
Stefan

From jason.stajich at duke.edu Tue Mar 29 16:55:04 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Mar 29 16:48:59 2005
Subject: [Bioperl-l] Possible memory leak in Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <4249C6DF.3040100@utk.edu>
References: <4249C6DF.3040100@utk.edu>
Message-ID: <89d38db7d506ac1f04a15c6457b480f4@duke.edu>

I had problems with this too, and the memleak actually comes back to bite if you process a lot of genes. I tried to track it down but didn't realize there was a cycle there. We just need to put some code in the DESTROY block to take care of this.

Can you send the script which reports the cycle so I can re-test the changes?

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 29, 2005, at 1:21 PM, Stefan Kirov wrote:

> Forgot to mention: Devel::Cycle reports cycle references between
> GeneStructure and Transcript and perl has a known issue of not being
> able to destroy such objects.
> So I guess my question is: Is this a feature or a 'feature' :-) .
> In any case
> Thanks
> Stefan

From skirov at utk.edu Tue Mar 29 16:59:45 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 29 16:53:54 2005
Subject: [Bioperl-l] Possible memory leak in Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <89d38db7d506ac1f04a15c6457b480f4@duke.edu>
References: <4249C6DF.3040100@utk.edu> <89d38db7d506ac1f04a15c6457b480f4@duke.edu>
Message-ID: <4249CFD1.4090700@utk.edu>

Actually I did, but my message is considered suspicious :-( . I will send it directly to your e-mail.
Stefan

Jason Stajich wrote:

> I had problems with too myself and the memleak actually comes back to
> bite if you process a lot of genes. I tried to track it down but
> didn't realize it was a cycle there. We just need to put some code in
> the DESTROY block to take care of this.
>
> Can you send the script which reports the cycle so I can re-test the
> changes?
>
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
> On Mar 29, 2005, at 1:21 PM, Stefan Kirov wrote:
>
>> Forgot to mention: Devel::Cycle reports cycle references between
>> GeneStructure and Transcript and perl has a known issue of not being
>> able to destroy such objects.
>> So I guess my question is: Is this a feature or a 'feature' :-) .
>> In any case
>> Thanks
>> Stefan

From jcsanchez at cib.csic.es Tue Mar 29 04:47:56 2005
From: jcsanchez at cib.csic.es (Juan Carlos Sanchez Ferrero)
Date: Tue Mar 29 16:55:50 2005
Subject: [Bioperl-l] Re:BLASTP
Message-ID: <4249244C.1050607@cib.csic.es>

Hello,
maybe you only have a nt db, but not a protein db accessible by cgi-bin; have you checked that?
regards
jc

From skirov at utk.edu Tue Mar 29 16:14:21 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 29 16:55:53 2005
Subject: [Bioperl-l] Possible memory leak in Bio::SeqFeature::Gene::GeneStructure?
Message-ID: <4249C52D.2050804@utk.edu>

I am working on the Entrezgene parser and tried to use Bio::SeqFeature::Gene::GeneStructure to describe NC/NT to NM and NP relationships. I am pretty much done with the parser (based on Mingyi Liu's low-level parser), but once I tried to parse a whole file (Homo sapiens) I ran out of memory. I think the problem might be Bio::SeqFeature::Gene::GeneStructure::add_Transcript.
Here is the code which I used to simulate the problem and the resulting report file. It seems that adding Bio::SeqFeature::Gene::Exon to Bio::SeqFeature::Gene::Transcript does not contribute to the problem.
Any suggestions?
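
A minimal sketch of the kind of leak discussed in this thread, for readers who want to reproduce it without BioPerl. It assumes the parent/child layout that Devel::Cycle is said to report (a container holding children that point back at it). The Demo::Gene and Demo::Transcript packages are hypothetical stand-ins, not BioPerl classes, and weakening the back-reference with Scalar::Util::weaken is one common way to break such a cycle, not necessarily the fix the module authors chose.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Counts how many Demo::Gene objects were actually destroyed, per label.
our %DESTROYED = (strong => 0, weak => 0);

# Hypothetical child class: keeps a reference back to its parent.
package Demo::Transcript;
sub new { my ($class) = @_; return bless {}, $class; }
sub parent {
    my ($self, $gene, $weak) = @_;
    if (defined $gene) {
        $self->{parent} = $gene;
        # A weak reference does not add to the parent's reference count.
        Scalar::Util::weaken($self->{parent}) if $weak;
    }
    return $self->{parent};
}

# Hypothetical container class: holds its children in an array.
package Demo::Gene;
sub new {
    my ($class, $label) = @_;
    return bless { label => $label, transcripts => [] }, $class;
}
sub add_transcript {
    my ($self, $transcript, $weak) = @_;
    push @{ $self->{transcripts} }, $transcript;
    $transcript->parent($self, $weak);   # creates the cycle unless weakened
    return;
}
sub DESTROY {
    my ($self) = @_;
    return if ${^GLOBAL_PHASE} eq 'DESTRUCT';   # ignore interpreter shutdown
    $main::DESTROYED{ $self->{label} }++;
}

package main;

# Build five genes, let each go out of scope, and report how many were freed.
sub build_and_drop {
    my ($label, $weak) = @_;
    for (1 .. 5) {
        my $gene = Demo::Gene->new($label);
        $gene->add_transcript(Demo::Transcript->new, $weak);
    }
    return $DESTROYED{$label};
}

my $freed_strong = build_and_drop('strong', 0);
my $freed_weak   = build_and_drop('weak',   1);
print "strong back-reference: $freed_strong of 5 genes freed\n";
print "weak back-reference:   $freed_weak of 5 genes freed\n";
```

With a strong back-reference each gene keeps a reference count of at least one after it leaves scope, so DESTROY never runs during the loop; with the weakened back-reference all five are released.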
Stefan
-------------- next part --------------
Simulation 2	0MB
Simulation 3	1MB
Simulation 4	1MB
Simulation 5	1MB
Simulation 6	2MB
Simulation 7	2MB
Simulation 8	2MB
Simulation 9	2MB
Simulation 10	3MB
Simulation 11	3MB
Simulation 12	3MB
Simulation 13	4MB
Simulation 14	4MB
Simulation 15	4MB
Simulation 16	5MB
Simulation 17	5MB
Simulation 18	5MB
Simulation 19	5MB
Simulation 20	6MB
Simulation 21	6MB
Simulation 22	6MB
Simulation 23	7MB
Simulation 24	7MB
Simulation 25	7MB
Simulation 26	8MB
Simulation 27	8MB
Simulation 28	8MB
Simulation 29	9MB
Simulation 30	9MB
Simulation 31	9MB
Simulation 32	9MB
Simulation 33	10MB
Simulation 34	10MB
Simulation 35	10MB
Simulation 36	11MB
Simulation 37	11MB
Simulation 38	11MB
Simulation 39	12MB
Simulation 40	12MB
Simulation 41	12MB
Simulation 42	13MB
Simulation 43	13MB
Simulation 44	13MB
Simulation 45	13MB
Simulation 46	14MB
Simulation 47	14MB
Simulation 48	14MB
Simulation 49	15MB
Simulation 50	15MB
Simulation 51	15MB
Simulation 52	16MB
Simulation 53	16MB
Simulation 54	16MB
Simulation 55	16MB
Simulation 56	17MB
Simulation 57	17MB
Simulation 58	17MB
Simulation 59	18MB
Simulation 60	18MB
Simulation 61	18MB
Simulation 62	19MB
Simulation 63	19MB
Simulation 64	19MB
Simulation 65	19MB
Simulation 66	20MB
Simulation 67	20MB
Simulation 68	20MB
Simulation 69	21MB
Simulation 70	21MB
Simulation 71	21MB
Simulation 72	22MB
Simulation 73	22MB
Simulation 74	22MB
Simulation 75	23MB
Simulation 76	23MB
Simulation 77	23MB
Simulation 78	24MB
Simulation 79	24MB
Simulation 80	24MB
Simulation 81	24MB
Simulation 82	25MB
Simulation 83	25MB
Simulation 84	25MB
Simulation 85	26MB
Simulation 86	26MB
Simulation 87	26MB
Simulation 88	27MB
Simulation 89	27MB
Simulation 90	27MB
Simulation 91	27MB
Simulation 92	28MB
Simulation 93	28MB
Simulation 94	28MB
Simulation 95	29MB
Simulation 96	29MB
Simulation 97	29MB
Simulation 98	30MB
Simulation 99	30MB
Simulation 100	30MB
6620	6650
-------------- next part --------------
use Bio::SeqFeature::Gene::Exon;
use Bio::SeqFeature::Gene::Transcript;
use Bio::SeqFeature::Gene::GeneStructure;
use strict;
use Devel::Cycle;

my ($prevmem,$growth,$first);
for my $k (1..100) {
    # Sample memory use (in MB) from the "Mem:" row of `free -m`.
    open (FREE, "free -m|") or die "cannot run free: $!";
    my $buf=<FREE>;    # header row, discarded
    $buf=<FREE>;       # "Mem:" row
    close (FREE);
    my ($x1,$x2,$mem,$x3)=split(/\s+/,$buf,4);    # third field is "used"
    if ($prevmem) {
        $growth+= $mem-$prevmem;
        print "Simulation $k\t$growth","MB\n";
    }
    else { $first=$mem;}
    $prevmem=$mem;
    # Build and drop 20 gene structures per round.
    for my $i (1..20) {
        my $gstruct=Bio::SeqFeature::Gene::GeneStructure->new;
        for my $n (0..3) {
            my $transcript=Bio::SeqFeature::Gene::Transcript->new(-primary=>'memleak'.$n,
                -start=>1,-end=>2000,
                -strand=>1,    # value missing in the post; 1 assumed, as for the exons
                -desc=>'test for memory leaks');
            foreach my $e (1..10) {
                my $exonobj=Bio::SeqFeature::Gene::Exon->new(-start=>$e*10,-end=>$e*10+9,-strand=>1);
                $transcript->add_exon($exonobj);
            }
            $gstruct->add_transcript($transcript);
        }
    }
}
print "$first\t$prevmem\n";

From babyleo11 at yahoo.com.sg Wed Mar 30 01:09:42 2005
From: babyleo11 at yahoo.com.sg (Minyi)
Date: Wed Mar 30 01:04:46 2005
Subject: [Bioperl-l] BLASTP
In-Reply-To: 6667
Message-ID: <20050330060942.63088.qmail@web40609.mail.yahoo.com>

Hi all,
i've solved my problem. Thanks Paulo for asking me to check the webserver's log. The program couldn't work because i didn't have a .ncbirc file. Thanks everyone! Cheers!

Paulo Almeida wrote:
Hi Minyi,

Have you tried running blastp on that file with perl, from the command line? When you run it with cgi, can you check the webserver's log to see if there are any errors, or send the error output to the browser (I think you can do it with CGI::Carp, but I haven't done that in a long time)?

-Paulo

On Tue, Mar 29, 2005 at 07:38:15AM +0100, Minyi wrote:
> hi all,
> i'm doing a program to run blastp on cgi/perl. However, there's no hits found no matter what files i use. But when i run the program using standalone blast with the same files, there are hits found. Also, the same program can work for blastn on cgi/perl and standalone. The only thing it can't work is blastp on cgi/perl. Does anyone know what's the problem? Thank You!
--
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel +351 21 446 46 35
fax +351 21 440 79 70
http://www.igc.gulbenkian.pt

Regards,
Minyi

From brian_osborne at cognia.com Wed Mar 30 08:12:59 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Mar 30 08:08:33 2005
Subject: [Bioperl-l] BLASTP
In-Reply-To: <20050329145319.GA8773@bioinf.igc.gulbenkian.pt>
Message-ID:

Paulo and Minyi,

>error output to the browser (I think you can do it with CGI::Carp

Yes:

use CGI::Carp qw(fatalsToBrowser);

Brian O.

From babenko at ncbi.nlm.nih.gov Wed Mar 30 12:02:59 2005
From: babenko at ncbi.nlm.nih.gov (Babenko, Vladimir (NIH/NLM/NCBI))
Date: Wed Mar 30 11:57:24 2005
Subject: [Bioperl-l] Turning the tree into bifurcating one
Message-ID: <69BA0F938FAC6A4CBEF49461720696F208DDE410@nihexchange16.nih.gov>

Greetings,

Is there any way to insert pseudo-nodes into a tree to make it bifurcating? Some programs can only deal with bifurcating trees...

Thank you,
Vladimir

From hlapp at gmx.net Wed Mar 30 12:13:54 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Mar 30 12:09:12 2005
Subject: [Bioperl-l] Possible memory leak in Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <4249C52D.2050804@utk.edu>
Message-ID: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>

Those modules could probably use some serious review. If there is a cycle then Jason should be on the right path with overriding DESTROY, but first one would need to know where the cycle is. I don't recall one being there on purpose ...

Sorry to not be of more help ...

-hilmar

On Tuesday, March 29, 2005, at 01:14 PM, Stefan Kirov wrote:

> I am working on the Entrezgene parser and tried to use
> Bio::SeqFeature::Gene::GeneStructure to describe NC/NT to NM and NP
> relationships. I am pretty much done with the parser (based on Mingyi
> Liu low lovel parser), but once I tried to parse a whole file (Homo
> sapiens) I ran out of memory. I think the problem might be
> Bio::SeqFeature::Gene::GeneStructure::add_Transcript.
> Here is the code which I used to simulate the problem and the
> resulting report file. It seams adding Bio::SeqFeature::Gene::Exon to
> Bio::SeqFeature::Gene::Transcript do not contribute to the problem.
> Any suggestions?
> Stefan

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From skirov at utk.edu Wed Mar 30 12:23:22 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Mar 30 12:18:26 2005
Subject: [Bioperl-l] Possible memory leak in Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>
References: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>
Message-ID: <424AE08A.3080201@utk.edu>

Hilmar,
Reported by Devel::Cycle:

Cycle (1):
    $Bio::SeqFeature::Gene::GeneStructure::HC->{'_transcripts'} => \@HD
    $HD->[0] => \%Bio::SeqFeature::Gene::Transcript::HE
    $Bio::SeqFeature::Gene::Transcript::HE->{'parent'} => \%Bio::SeqFeature::Gene::GeneStructure::HC

The problem is that add_transcript sets $self (the GeneStructure object) as the parent of $fea:

    $fea->parent($self)

thus creating the cycle. One could simply call $fea->parent(); I guess, but this may need to be in DESTROY.

Hilmar Lapp wrote:

> Those modules probably can use some serious review. If there is a
> cycle then Jason should be on the right path with overriding DESTROY,
> but first one would need to know where the cycle is. I don't recall
> one being there on purpose ...
>
> Sorry to not be of more help ...
>
> -hilmar
>
> On Tuesday, March 29, 2005, at 01:14 PM, Stefan Kirov wrote:
>
>> I am working on the Entrezgene parser and tried to use
>> Bio::SeqFeature::Gene::GeneStructure to describe NC/NT to NM and NP
>> relationships. I am pretty much done with the parser (based on Mingyi
>> Liu low lovel parser), but once I tried to parse a whole file (Homo
>> sapiens) I ran out of memory. I think the problem might be
>> Bio::SeqFeature::Gene::GeneStructure::add_Transcript.
>> Here is the code which I used to simulate the problem and the
>> resulting report file. It seams adding Bio::SeqFeature::Gene::Exon to
>> Bio::SeqFeature::Gene::Transcript do not contribute to the problem.
>> Any suggestions?
>> Stefan

--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From skirov at utk.edu Wed Mar 30 12:25:26 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Mar 30 12:22:22 2005
Subject: [Bioperl-l] Possible memory leak in Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>
References: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>
Message-ID: <424AE106.2020203@utk.edu>

Oops, actually Bio::SeqFeature::Gene::Transcript::parent does not allow that, as the value would be undef.... Either that should be fixed or DESTROY needs to directly undef $transcript->{parent}.

Hilmar Lapp wrote:

> Those modules probably can use some serious review. If there is a
> cycle then Jason should be on the right path with overriding DESTROY,
> but first one would need to know where the cycle is. I don't recall
> one being there on purpose ...
>
> Sorry to not be of more help ...
>
> -hilmar
>
> On Tuesday, March 29, 2005, at 01:14 PM, Stefan Kirov wrote:
>
>> I am working on the Entrezgene parser and tried to use
>> Bio::SeqFeature::Gene::GeneStructure to describe NC/NT to NM and NP
>> relationships. I am pretty much done with the parser (based on Mingyi
>> Liu low lovel parser), but once I tried to parse a whole file (Homo
>> sapiens) I ran out of memory. I think the problem might be
>> Bio::SeqFeature::Gene::GeneStructure::add_Transcript.
>> Here is the code which I used to simulate the problem and the
>> resulting report file. It seams adding Bio::SeqFeature::Gene::Exon to
>> Bio::SeqFeature::Gene::Transcript do not contribute to the problem.
>> Any suggestions?
>> Stefan

From mlemieux at bioinfo.ca Wed Mar 30 12:26:58 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Wed Mar 30 12:26:48 2005
Subject: [Bioperl-l] BLASTP
Message-ID: <7565f93c3ae486b580f54b6a40dcf4bc@bioinfo.ca>

Minyi,

Are you sure both versions of Blast are getting the same e-val? The RemoteBlast.pm default is 1e-3 but Perl.pm sets it at 1e-10.
I'm not sure what the standalone value is.

-Madeleine

From amtd9 at umr.edu Wed Mar 30 15:07:54 2005
From: amtd9 at umr.edu (Mane, Ajay (UMR-Student))
Date: Wed Mar 30 15:06:02 2005
Subject: [Bioperl-l] BL2SEQ
Message-ID: <58AF0CF509606A49B1770AB5DFF811CE110839@UMR-CMAIL1.umr.edu>

I mailed this group earlier, but there was no response. I want to run perl from a command line to get the results of the NCBI bl2seq tool, which aligns two sequences. Which modules should I use? A reply at least this time would be great.

- Ajay

________________________________

From: bioperl-l-bounces@portal.open-bio.org on behalf of Madeleine Lemieux
Sent: Wed 3/30/2005 11:26 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] BLASTP

Minyi,

Are you sure both versions of Blast are getting the same e-val? The RemoteBlast.pm default is 1e-3 but Perl.pm sets it at 1e-10. I'm not sure what the standalone value is.

-Madeleine

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From skirov at utk.edu Wed Mar 30 17:31:44 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Mar 30 17:26:40 2005
Subject: [Bioperl-l] EntrezGene ASN parser
Message-ID: <424B28D0.7030101@utk.edu>

I just finished a Bioperl EntrezGene Parser based on Mingyi Liu's ASN Gene parser. It creates two main objects: a Bio::Seq object, which contains most of the data such as references, description, map location, etc.; and a Bio::Cluster::SequenceFamily object, which contains the refseqs and the gene structure (through NT/NC annotation, represented as Bio::SeqFeature::Gene objects). I also make the uncaptured data available: each time some data is transferred from the hash that represents the parsed data, I delete the respective key. Everything else is considered uncaptured. I am doing this because some records could be non-compliant, or there may simply be new data supplied by NCBI.
Naturally, there will be some data that is not interesting and is therefore not captured (there is a lot of redundant data in EntrezGene). So the parser would work like this:

my ($egene,$assoc_seq,$uncaptured)=$egparser->next_seq;

There are a few things I need to add (Markers and GO are not yet in these objects), but most of the work is done. Unless somebody objects, I will commit the code (Bio::SeqIO::entrezgene?) once I have written the documentation to match the standard.

A few notes:

1. It would be nice if there were a Bio::Annotation::DBLink::url method. It makes sense (I think) since most DB links would also refer to a webpage.
2. It now takes 45 minutes to parse the whole human ASN file, which is 4 times slower. Keeping uncaptured data slows things down a bit, so I will introduce a -debug option. Anyway, I think the speed is not going to be an issue.
3. Due to the cyclic reference in the GeneStructure object, I am removing the Transcript->{parent} in the parser. This code should be deleted once the Transcript object is fixed.

There are also some other minor issues, but I think I will be able to fix them by the end of the week. Please let me know what you think.

Stefan

From kvddrift at earthlink.net Thu Mar 31 08:53:33 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu Mar 31 08:47:33 2005
Subject: [Bioperl-l] bioperl 1.5.0 on OS X
Message-ID:

Hi,

Finally had some time to test bioperl 1.5.0 on Mac OS X (10.3.8). Note that I use fink to create the package. I get a few failing tests:

t/DB.........................FAILED tests 29-31
        Failed 3/78 tests, 96.15% okay
t/EMBL_DB....................ok 11/15Use of uninitialized value in hash element at t/EMBL_DB.t line 99, line 1.
Use of uninitialized value in hash element at t/EMBL_DB.t line 99, line 1.
t/EMBL_DB....................FAILED tests 13-15 Failed 3/15 tests, 80.00% okay t/Perl.......................ok 10/14 -------------------- WARNING --------------------- MSG: id (BUM) does not exist --------------------------------------------------- t/Perl.......................ok 12/14 -------------------- WARNING --------------------- MSG: acc (NM_006732) does not exist --------------------------------------------------- t/Perl.......................FAILED tests 11, 13 Failed 2/14 tests, 85.71% okay Also, one test couldn't reach a server: t/FeatureIO..................ok 1/22 -------------------- WARNING --------------------- MSG: [1/5] tried to fetch http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but server threw 500. retrying... If I disable the tests, the package installs fine. BTW, what is meant by a developer release? Is this not an official release, meaning that it contains some experimental code? cheers, - Koen. From awoolfe at hgmp.mrc.ac.uk Thu Mar 31 09:54:41 2005 From: awoolfe at hgmp.mrc.ac.uk (Adam Woolfe) Date: Thu Mar 31 09:50:06 2005 Subject: [Bioperl-l] Retrieving hits in order in SeqIO Message-ID: Hi Im trying to retrieve the top hit from a set of blast results in a single file using SeqIO. I assumed that SeqIO processed the hits in the same order as in the input file (i.e. from hits with the lowest evalue onwards) but Ive been getting some strange results back where the first hit is actually the last one in the list: e.g. 
the description lines of the hits in the input file are as follows:

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

EM:16 chromosome:NCBI35:16:49081023:50081022:1                     277   1e-72
EM:20 chromosome:NCBI35:20:49832988:50832987:1                      54   3e-05
EM:4 chromosome:NCBI35:4:103445867:104383982:1                      40   0.52
EM:17 chromosome:NCBI35:17:32359244:33357400:1                      40   0.52
EM:10 chromosome:NCBI35:10:106096982:107096981:1                    40   0.52
EM:10 chromosome:NCBI35:10:77173138:78173137:1                      40   0.52

As a test, here is a highly stripped version of the Perl script:

-------------------------------------------------------------
use Bio::SearchIO;

$file = "/path/to/infile.blast";
$in2 = new Bio::SearchIO( -format => 'blast',
                          -file   => "$file");
while( my $result = $in2->next_result ) {
    while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            print "hit:".$hit->name." ".$hit->description."\n";
        }
    }
}
------------------------------------------------------------

The output of this is:

hit:EM:10 chromosome:NCBI35:10:77173138:78173137:1
hit:EM:16 chromosome:NCBI35:16:49081023:50081022:1
hit:EM:20 chromosome:NCBI35:20:49832988:50832987:1
hit:EM:4 chromosome:NCBI35:4:103445867:104383982:1
hit:EM:17 chromosome:NCBI35:17:32359244:33357400:1
hit:EM:10 chromosome:NCBI35:10:106096982:107096981:1

Why is it not giving me the results in the correct order? In other examples I've looked at, the top hit is not always the last (as in this example), so it seems like something very random is going on. Could anyone shed any light on this? I'd really appreciate it.

many thanks,

Adam

P.S. I'm using Bioperl 1.4 on Solaris9

From amtd9 at umr.edu Thu Mar 31 10:15:28 2005
From: amtd9 at umr.edu (Mane, Ajay (UMR-Student))
Date: Thu Mar 31 10:10:22 2005
Subject: [Bioperl-l] BL2SEQ
Message-ID: <58AF0CF509606A49B1770AB5DFF811CE11083F@UMR-CMAIL1.umr.edu>

Thanks for the reply.

I have lots of sequence pairs which I want to align, two sequences at a time. I want to form some statistics on the alignment results. That is about the coding regions in nucleotides.
I need to manually look at the coding regions everytime and do some analysis. Instead, i want to run a perl file which runs the bl2seq on blast server, gets the results, formats them to provide some statistics which i am interested in. I do not want to install blast locally on my machine. I just have lots of accession numbers in a file and want to display statistics of all of them in some file/files using a perl file. I have succeeded to a limit using the normal perl, by submitting accession numbers and getting results from blast server using http::request methods. But i want to know how to use Bioperl for this job. Thanks, Ajay ________________________________ From: Barry Moore [mailto:barry.moore@genetics.utah.edu] Sent: Wed 3/30/2005 2:48 PM To: Mane, Ajay (UMR-Student) Subject: Re: [Bioperl-l] BL2SEQ Ajay- Not sure what you want to do with your blast results, but I think you'd be pretty limited in doing much analysis using bioperl from the perl command line. You might look at the SEALS package http://www.ncbi.nlm.nih.gov/CBBresearch/Walker/SEALS/, or repost with more detail about what it is that you are trying to do. Barry Mane, Ajay (UMR-Student) wrote: >I have mailed earlier to this group, but there was no response. I want to run perl from a command line to get the >results of NCBI bl2seq tool, which aligns two sequences. Which are the modules to be used ? A reply atleast this time >would be great. > >- Ajay > >________________________________ > >From: bioperl-l-bounces@portal.open-bio.org on behalf of Madeleine Lemieux >Sent: Wed 3/30/2005 11:26 AM >To: bioperl-l@portal.open-bio.org >Subject: [Bioperl-l] BLASTP > > > >Minyi, > >Are you sure both versions of Blast are getting the same e-val? The >RemoteBlast.pm default is 1e-3 but Perl.pm sets it at 1e-10. I'm not >sure what the standalone value is. 
> >-Madeleine > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From sdavis2 at mail.nih.gov Thu Mar 31 10:41:38 2005 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu Mar 31 10:35:52 2005 Subject: [Bioperl-l] BL2SEQ In-Reply-To: <58AF0CF509606A49B1770AB5DFF811CE11083F@UMR-CMAIL1.umr.edu> References: <58AF0CF509606A49B1770AB5DFF811CE11083F@UMR-CMAIL1.umr.edu> Message-ID: <54f578eea98dc5fc7d15e5a68d99fb00@mail.nih.gov> On Mar 31, 2005, at 10:15 AM, Mane, Ajay ((UMR-Student)) wrote: > Thanks for the reply. > > I have lots of sequence pairs which i want to align. Two sequences at > a time. I want to form some statistics on the alignment results. That > is about the coding regions in nucleotides. I need to manually look at > the coding regions everytime and do some analysis. Instead, i want to > run a perl file which runs the bl2seq on blast server, gets the > results, formats them to provide some statistics which i am interested > in. I do not want to install blast locally on my machine. I just have > lots of accession numbers in a file and want to display statistics of > all of them in some file/files using a perl file. I have succeeded to > a limit using the normal perl, by submitting accession numbers and > getting results from blast server using http::request methods. But i > want to know how to use Bioperl for this job. > I know you said you do not want to install blast on your machine, but if you have many accessions and want to blast all pairs, using local blast would be very convenient--just blast all sequences against all other sequences. 
There are binaries for many platforms, so you probably wouldn't even have to build the blast executables.

Sean

From barry.moore at genetics.utah.edu Thu Mar 31 15:04:05 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu Mar 31 14:58:08 2005
Subject: [Bioperl-l] BL2SEQ
In-Reply-To: <54f578eea98dc5fc7d15e5a68d99fb00@mail.nih.gov>
References: <58AF0CF509606A49B1770AB5DFF811CE11083F@UMR-CMAIL1.umr.edu> <54f578eea98dc5fc7d15e5a68d99fb00@mail.nih.gov>
Message-ID: <424C57B5.3040309@genetics.utah.edu>

I agree with Sean. Installing BLAST locally is really quite easy on most platforms (Unix and Windows, in my experience). Download the binaries from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST and get the documentation at http://www.ncbi.nlm.nih.gov/BLAST/docs/. You would unpack and install the binaries, run formatdb on a fasta file of all your sequences, and then use bioperl to loop over all of your sequences and blast each one against the database. See the module documentation for StandAloneBlast to see how to actually run local BLAST with bioperl.

Using bioperl can also help a lot with parsing all the resulting blast output. Read some of the HOWTOs at http://www.bioperl.org/Core/Latest/modules.html. Specifically, have a look at the SearchIO HOWTO, and if you're new to bioperl have a look at the Beginners HOWTO and the SeqIO HOWTO for starters.

If you really want to move forward with remote BLAST, see this module's documentation: http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html. However, if you have a lot of sequences, this will become very slow. You won't like it, and neither will NCBI.

Barry

Sean Davis wrote:

>
> On Mar 31, 2005, at 10:15 AM, Mane, Ajay ((UMR-Student)) wrote:
>
>> Thanks for the reply.
>>
>> I have lots of sequence pairs which i want to align. Two sequences at
>> a time. I want to form some statistics on the alignment results. That
>> is about the coding regions in nucleotides.
I need to manually look >> at the coding regions everytime and do some analysis. Instead, i want >> to run a perl file which runs the bl2seq on blast server, gets the >> results, formats them to provide some statistics which i am >> interested in. I do not want to install blast locally on my machine. >> I just have lots of accession numbers in a file and want to display >> statistics of all of them in some file/files using a perl file. I >> have succeeded to a limit using the normal perl, by submitting >> accession numbers and getting results from blast server using >> http::request methods. But i want to know how to use Bioperl for this >> job. >> > > I know you said you do not want to install blast on your machine, but > if you have many accessions and want to blast all pairs, using local > blast would be very convenient--just blast all sequences against all > other sequences. There are binaries for many platforms, so you > probably wouldn't even have to build the blast executables. > > Sean > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Barry Moore Dept. of Human Genetics University of Utah Salt Lake City, UT From Mark.Hoebeke at jouy.inra.fr Wed Mar 30 01:20:18 2005 From: Mark.Hoebeke at jouy.inra.fr (Hoebeke Mark) Date: Thu Mar 31 15:42:31 2005 Subject: [Bioperl-l] Hierarchical location parsing In-Reply-To: References: Message-ID: <1112163618.5683.16.camel@hurd> Hi Brian, you are right, I reloaded the Genbank file from : ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Streptococcus_pyogenes_MGAS315/AE14074.gbk and indeed, the source feature has changed to an ordinary simple location. It seems they corrected the original submission : the modification date now reads "mar 9", whereas the date on the release I initially fetched read "18 jul 2002" (which happens to be the date mentioned in the LOCUS descriptor). 
I guess this makes parsing hierarchical location descriptors a moot point until I come up with another example...

Mark

On Tuesday 29 March 2005 at 08:09 -0500, Brian Osborne wrote:
> Mark,
>
> I didn't see any "join(join..." statements in that Genbank entry, as part of
> a source feature or anywhere else. I used this URL:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=21909536
>
>
> Brian O.
>
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Mark Hoebeke
> Sent: Friday, March 25, 2005 3:24 PM
> To: Brian Osborne
> Cc: bioperl-l@portal.open-bio.org
> Subject: RE: [Bioperl-l] Hierarchical location parsing
>
>
> Brian,
>
> an example of a nested location is found in the 'source' feature of the
> Genbank entry having accession AE014074 (Streptococcus pyogenes MGAS315
> complete genome). As the file is over 1 Meg in size once compressed it
> might not be a good idea to attach it to this mail which is CC'ed to
> bioperl-l ;D
>
> Regarding the performance hit of my fix, I feared that replacing a
> compiled regexp with a split and a loop over every character of the
> string could have a significant impact. As it stands, I timed a simple
> parsing script swallowing Genbank files and spitting out each feature
> location as a GFF string, on 131 complete microbial genomes. There is no
> difference in output between the bioperl-live FTLocationFactory and its
> patched version (basically meaning that this test sample did not contain
> nested locations). The times are comparable, with even a slight
> advantage to the patched version (915.66user 19.53system 15:42.19elapsed
> 99%CPU vs. 938.06user 17.33system 16:04.15elapsed 99%CPU).
>
> When comparing the outputs of the parser run on a file with a nested
> location, it appears that without the bugfix, the nested location yields
> an incorrect GFF string as shown by the diff below.
>
> [mark@homer Loc]$ diff MGAS315 MGAS315_patched
> 1c1
> < join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521),)
> ---
> > join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521))))))
>
> I'm still cautious about the bugfix because I only produced the diffs
> on microbial genomes, which probably have simpler location definitions
> than higher eukaryotes.
>
> Greetings,
>
> Mark
>
> On Friday 25 March 2005 at 11:52 -0500, Brian Osborne wrote:
> > Mark,
> >
> > Can you also attach the sequence file that you used in order to test your
> > code? That way I can write a test specifically for the parsing of
> > hierarchical locations.
> >
> > You wrote "I'm not sure the new patch won't slow down location parsing
> > considerably..." Have you actually timed the parsing using the old and new
> > code?
> >
> > Thanks again,
> >
> > Brian O.
> >
> >
> --
> --------------------------Mark.Hoebeke@jouy.inra.fr----------------------
> Unité Statistique & Génome              Unité MIG
> +33 (0)1 60 87 38 03          Tél.      +33 (0)1 34 65 28 85
> +33 (0)1 60 87 38 09          Fax.      +33 (0)1 34 65 29 01
> Tour Evry 2, 523 pl. des Terrasses      INRA - Domaine de Vilvert
> F - 91000 Evry                          F - 78352 Jouy-en-Josas CEDEX
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
-- 
-------------------------Mark.Hoebeke@jouy.inra.fr---------------------
Unité Statistique & Génome              Unité MIG
+33 (0)1 60 87 38 03          Tél.      +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09          Fax.      +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses      INRA - Domaine de Vilvert
F - 91000 Evry                          F - 78352 Jouy-en-Josas CEDEX
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050330/3823412d/attachment-0001.bin

From sallyli97 at yahoo.com Wed Mar 30 10:44:06 2005
From: sallyli97 at yahoo.com (Sally Li)
Date: Thu Mar 31 15:42:35 2005
Subject: [Bioperl-l] $id && $self->display($id)
Message-ID: <20050330154407.19321.qmail@web53608.mail.yahoo.com>

Hi there,

I wonder what the following statement means?

$id && $self->display($id)

This is from the constructor (new()) of the object PrimarySeq.pm.

Thanks!

Sally.

__________________________________
Do you Yahoo!?
Yahoo! Small Business - Try our new resources site!
http://smallbusiness.yahoo.com/resources/

From awoolfe at rfcgr.mrc.ac.uk Thu Mar 31 08:37:55 2005
From: awoolfe at rfcgr.mrc.ac.uk (Adam Woolfe)
Date: Thu Mar 31 15:42:37 2005
Subject: [Bioperl-l] Retrieving hits in order in SeqIO
Message-ID:

Hi

I'm trying to retrieve the top hit from a set of blast results in a single file using SeqIO. I assumed that SeqIO processed the hits in the same order as in the input file (i.e. from hits with the lowest evalue onwards) but I've been getting some strange results back where the first hit is actually the last one in the list: e.g.
the desciption lines of the hits in the input file is as follows: Score E Sequences producing significant alignments: (bits) Value EM:16 chromosome:NCBI35:16:49081023:50081022:1 277 1e-72 EM:20 chromosome:NCBI35:20:49832988:50832987:1 54 3e-05 EM:4 chromosome:NCBI35:4:103445867:104383982:1 40 0.52 EM:17 chromosome:NCBI35:17:32359244:33357400:1 40 0.52 EM:10 chromosome:NCBI35:10:106096982:107096981:1 40 0.52 EM:10 chromosome:NCBI35:10:77173138:78173137:1 40 0.52 As a test a highly stripped version of the perlscript: ------------------------------------------------------------- $file = "/path/to/infile.blast"; $in2 = new Bio::SearchIO( -format => 'blast', -file => "$file"); while( my $result = $in2->next_result ) { while( my $hit = $result->next_hit ) { while( my $hsp = $hit->next_hsp ) { print "hit:".$hit->name." ".$hit->description; } } } ------------------------------------------------------------ the output of this is: hit:EM:10 chromosome:NCBI35:10:77173138:78173137:1 hit:EM:16 chromosome:NCBI35:16:49081023:50081022:1 hit:EM:20 chromosome:NCBI35:20:49832988:50832987:1 hit:EM:4 chromosome:NCBI35:4:103445867:104383982:1 hit:EM:17 chromosome:NCBI35:17:32359244:33357400:1 hit:EM:10 chromosome:NCBI35:10:106096982:107096981:1 Why is it not giving me the results in the correct order? In other examples ive looked at, the top hit is not always the last (as in this example) so it seems like something very random is going on. Could anyone shed any light on this, I'd really appreciate it. many thanks, Adam P.S. Im using Bioperl 1.4 on Solaris9 From schuh at farmdale.com Thu Mar 31 16:14:24 2005 From: schuh at farmdale.com (Mike Schuh) Date: Thu Mar 31 16:09:38 2005 Subject: [Bioperl-l] $id && $self->display($id) In-Reply-To: <20050330154407.19321.qmail@web53608.mail.yahoo.com> Message-ID: Sally, >I wonder what does it means in the following >statement? > >$id && $self->display($id) Pretty standard Perl construct. 
The value of $id is checked and if it is "true" (defined and not zero), then the display method of the current object is called. This is shorthand for if(defined($id) && $id) { # to be slightly pedantic $self->display($id); } Similar patterns are used in shell scripts, etc. -- Mike Schuh -- Seattle, Washington USA http://www.farmdale.com From skirov at utk.edu Thu Mar 31 16:23:36 2005 From: skirov at utk.edu (Stefan Kirov) Date: Thu Mar 31 16:18:14 2005 Subject: [Bioperl-l] $id && $self->display($id) In-Reply-To: <20050330154407.19321.qmail@web53608.mail.yahoo.com> References: <20050330154407.19321.qmail@web53608.mail.yahoo.com> Message-ID: <424C6A58.2030501@utk.edu> if $id is defined $self->display($id) is being evaluated, which actually sets the display id of $self to be $id Stefan Sally Li wrote: >Hi, there, > >I wonder what does it means in the following >statement? > >$id && $self->display($id) > >This is from the constructor (new ()) of object >PrimarySeq.pm. > >Thanks! > >Sally. > > > >__________________________________ >Do you Yahoo!? >Yahoo! Small Business - Try our new resources site! >http://smallbusiness.yahoo.com/resources/ >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From MEC at Stowers-Institute.org Thu Mar 31 16:41:42 2005 From: MEC at Stowers-Institute.org (Cook, Malcolm) Date: Thu Mar 31 16:36:07 2005 Subject: [Bioperl-l] patch to FeatureIO.pm for tied interface Message-ID: <200503312135.j2VLZZfY020817@portal.open-bio.org> bioperlers, The following patch to bioperl-live makes up for what was probably a copy and paste error and lets FeatureIO work with tied handle interface too. I would be happy to have write access to cvs repository for this and other such patches as discovered.... 
Cheers,

Malcolm Cook

Index: FeatureIO.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/FeatureIO.pm,v
retrieving revision 1.8
diff -c -r1.8 FeatureIO.pm
*** FeatureIO.pm 18 Jan 2005 05:22:11 -0000 1.8
--- FeatureIO.pm 31 Mar 2005 21:34:33 -0000
***************
*** 507,526 ****
  
  sub TIEHANDLE {
    my ($class,$val) = @_;
!   return bless {'seqio' => $val}, $class;
  }
  
  sub READLINE {
    my $self = shift;
!   return $self->{'seqio'}->next_seq() unless wantarray;
    my (@list, $obj);
!   push @list, $obj while $obj = $self->{'seqio'}->next_seq();
    return @list;
  }
  
  sub PRINT {
    my $self = shift;
!   $self->{'seqio'}->write_seq(@_);
  }
  
  1;
--- 507,526 ----
  
  sub TIEHANDLE {
    my ($class,$val) = @_;
!   return bless {'featio' => $val}, $class;
  }
  
  sub READLINE {
    my $self = shift;
!   return $self->{'featio'}->next_feature() unless wantarray;
    my (@list, $obj);
!   push @list, $obj while $obj = $self->{'featio'}->next_feature();
    return @list;
  }
  
  sub PRINT {
    my $self = shift;
!   $self->{'featio'}->write_feature(@_);
  }
  
  1;

From qfdong at iastate.edu Thu Mar 31 18:15:23 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Thu Mar 31 18:09:58 2005
Subject: [Bioperl-l] pubmed
Message-ID: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>

Hi there,

http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html

I am not very familiar with BioPerl. I tried to follow the example shown on the above page to retrieve the pubmed ID under each Reference tag, i.e., $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The authors() method works for me. I'd appreciate any suggestions.

Qunfeng

From zhoujie at fudan.edu.cn Thu Mar 31 22:44:49 2005
From: zhoujie at fudan.edu.cn (zhoujie@fudan.edu.cn)
Date: Thu Mar 31 23:30:13 2005
Subject: [Bioperl-l] Help with taxonomy db
Message-ID: <135a991135d544.135d544135a991@fudan.edu.cn>

Hi all,

Would you please help me with this error message in using local taxonomy db?
My test code is here:

#-------------------------------------------------------
use Bio::DB::Taxonomy;

my $db = new Bio::DB::Taxonomy(-source    => 'flatfile',
                               -nodesfile => 'nodes.dmp',
                               -namesfile => 'names.dmp',
                               -directory => 'index');
my $id = $db->get_taxonid('Homo sapiens');
print "id is $id for Homo sapiens\n";
#-------------------------------------------------------

The code generates three files in the index directory: 'nodes', 'names2id', 'id2names'. But after that I get an error message:

------------- EXCEPTION -------------
MSG: No such file or directory index/nodes
STACK Bio::DB::Taxonomy::flatfile::_db_connect c:/Perl/site/lib/Bio\DB\Taxonomy\flatfile.pm:325
STACK Bio::DB::Taxonomy::flatfile::new c:/Perl/site/lib/Bio\DB\Taxonomy\flatfile.pm:138
STACK Bio::DB::Taxonomy::new c:/Perl/site/lib/Bio/DB/Taxonomy.pm:104
STACK toplevel local_taxonomy_query.pl:10
--------------------------------------

I'm quite confused by this error, because the nodes file is right there, so why "No such file"? Can anyone tell me what is happening? Any suggestion is appreciated.

J Z