From David.Messina at sbc.su.se Tue Dec 1 05:14:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 1 Dec 2009 11:14:40 +0100
Subject: [Bioperl-l] [Bug 2937] Strand in fasta35 output does not seem to be parsed
In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
References: <8D08960C647E64438CE5740657CBBDC50148731E47@iahcexch1.iah.bbsrc.ac.uk>
 <50F0159A-DE58-4405-A2FE-4FA95A3CDDA4@sbc.su.se>
 <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk>
Message-ID:

Hi Mick,

Did you try running the test case that you had originally attached to the bug report? Or is the below from different code and a different fasta output file?

In any case, I'll need to look at the fasta35 output file and the parse2.pl you ran in order to reproduce and fix this -- could you please open a new bug report and attach them to it?

Thanks,
Dave

On Nov 30, 2009, at 17:49, michael watson (IAH-C) wrote:

> Hi Dave
>
> Just got round to looking at this.
>
> In bioperl-1.6.0, the strand didn't get parsed, but the module only warned about something:
>
> --------------------- WARNING ---------------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> ---------------------------------------------------
>
> However, in the bioperl-live I just downloaded, this had turned into a full-on stack trace:
>
> ------------- EXCEPTION -------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> STACK Bio::SearchIO::fasta::next_result /usr/local/bioperl-live_301109//Bio/SearchIO/fasta.pm:1347
> STACK toplevel parse2.pl:20
> -------------------------------------
>
> I'm not sure if this is even related to the strand issue (I suspect not, but you never know) but something changed between bioperl-1.6.0 and the live trunk I downloaded today to ensure I still can't use the module.
>
> Is this another bug report?
>
> Thanks again for all your help
>
> Mick
>
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se]
> Sent: 23 November 2009 17:46
> To: michael watson (IAH-C)
> Subject: Re: [Bug 2937] Strand in fasta35 output does not seem to be parsed
>
> Hi Mick,
>
> Sure thing -- the current build from subversion is packaged up every
> night and available here:
> http://www.bioperl.org/DIST/nightly_builds/
>
> Just grab bioperl-live.tar.gz from there and you'll get the changes.
>
> Dave
>
> On Nov 23, 2009, at 6:34 PM, michael watson (IAH-C) wrote:
>
>> Hi Dave
>>
>> Thanks for the hard work.
>>
>> Trying to get the latest updates so I can use this... don't have svn
>> on my server, tried to install it and I don't have python either,
>> which is needed to install it.
>>
>> I face about 3 weeks whilst my IT department sort this out, unless I
>> can access the changes any other way?
>>
>> Thanks
>> Mick
>>
>> -----Original Message-----
>> From: bugzilla-daemon at portal.open-bio.org [mailto:bugzilla-daemon at portal.open-bio.org]
>> Sent: 20 November 2009 15:12
>> To: michael watson (IAH-C)
>> Subject: [Bug 2937] Strand in fasta35 output does not seem to be parsed
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2937
>>
>> online at davemessina.com changed:
>>
>>            What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>          Status    |NEW                         |RESOLVED
>>      Resolution    |                            |FIXED
>>
>> ------- Comment #7 from online at davemessina.com 2009-11-20 10:12 EST -------
>> Fixed in r16394.
>>
>> Michael, thanks for the report. Your test cases pass, but please
>> reopen the bug if needed.
>>
>> --
>> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
>> ------- You are receiving this mail because: -------
>> You reported the bug, or are watching the reporter.
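
As a stopgap while the parser problem is investigated, a script can trap a thrown exception so that one unparseable report does not abort the whole run. The following is only a minimal sketch in plain Perl: `parse_report` and `try_parse` are hypothetical names, `parse_report` is a stand-in for the SearchIO loop in parse2.pl (which is not shown in this thread), and trapping the exception does not fix the underlying parsing issue.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parsing step. It dies the way a thrown
# exception would, so the trapping logic can be shown without BioPerl.
sub parse_report {
    my ($line) = @_;
    die "MSG: Unrecognized alignment line (1) '$line'\n"
        if $line =~ /fasta35/;
    return "parsed: $line";
}

# Run the parser inside eval and return (result, error).
# Exactly one of the two returned values is defined.
sub try_parse {
    my ($line) = @_;
    my $result = eval { parse_report($line) };
    return (undef, $@) unless defined $result;
    return ($result, undef);
}

my ($result, $error) = try_parse('/usr/local/fasta3/bin/fasta35 -n -U');
print defined $error ? "skipped: $error" : "$result\n";
```

In a real script the body of the `eval` block would hold the calls that currently throw, and the error text could be logged together with the name of the offending file.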

From e.osimo at gmail.com Tue Dec 1 13:05:48 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Tue, 1 Dec 2009 19:05:48 +0100
Subject: [Bioperl-l] Statistics: how to obtain the p value of a T test
Message-ID: <2ac05d0f0912011005n6140869aoc634ad08cdf10ca4@mail.gmail.com>

Hello everyone,
I'm trying to get the p value of a test run with Statistics::TTest.
I cannot find a function for this: I can find out whether the null hypothesis is rejected at a certain confidence level, but I cannot make the script show me the actual p value.
Do you know other scripts that can do that?

Thanks
Emanuele

From cjfields at illinois.edu Tue Dec 1 14:25:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 1 Dec 2009 13:25:03 -0600
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility Policy Change
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
Message-ID: <964687F9-989B-4F11-B74B-977912A922EB@illinois.edu>

I'll be adjusting the requisite parameters as indicated below. I'm reluctant to include a time-based limit on submissions (NCBI wants a max of 100 requests at peak hours), but it may become necessary if they request it.

chris

Begin forwarded message:

> From:
> Date: December 1, 2009 12:59:34 PM CST
> To:
> Subject: [Utilities-announce] NCBI E-Utility Policy Change
> Reply-To: utilities-announce at ncbi.nlm.nih.gov
>
> As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.
>
> The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request.
>
> The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request.
>
> NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities.
>
> NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov.
>
> Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service.
>
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From maj at fortinbras.us Tue Dec 1 21:27:06 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 21:27:06 -0500
Subject: [Bioperl-l] test test test
Message-ID: <95142B0024EC48928CB56A69A17A8559@NewLife>

MAJ

From ocarnorsk138 at gmail.com Tue Dec 1 21:59:48 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Tue, 1 Dec 2009 23:59:48 -0300
Subject: [Bioperl-l] test test test
In-Reply-To: <95142B0024EC48928CB56A69A17A8559@NewLife>
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
Message-ID:

test test test test back

O'car Campos C.
Bioinformatics Engineering Student.
University of Talca.
Chile.

2009/12/1 Mark A. Jensen

> MAJ
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From maj at fortinbras.us Tue Dec 1 22:08:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 1 Dec 2009 22:08:23 -0500
Subject: [Bioperl-l] test test test
In-Reply-To:
References: <95142B0024EC48928CB56A69A17A8559@NewLife>
Message-ID:

I love when people are paying attention!

----- Original Message -----
From: Ocar Campos
To: Mark A. Jensen ; Bioperl Mailing List.
Sent: Tuesday, December 01, 2009 9:59 PM
Subject: Re: [Bioperl-l] test test test

test test test test back

O'car Campos C.
Bioinformatics Engineering Student.
University of Talca.
Chile.

2009/12/1 Mark A. Jensen

MAJ
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From rtbio.2009 at gmail.com Wed Dec 2 07:07:08 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Wed, 2 Dec 2009 13:07:08 +0100
Subject: [Bioperl-l] Remote blast
Message-ID:

Hello everyone,

I have a problem. I am new to Bioperl.
I am working on an RNAi tool in which a CGI script connects to NCBI BLAST using the remote BLAST program, i.e., the input sequence given in the HTML page is taken as input and remote BLAST is performed on it, based on the code for remote BLAST. But I have a problem in the remote BLAST code. My code goes like this:

@compseqs=blastcode($in{'Inputseq'});

sub blastcode
{
  $input1= $_[0];

  open(NUC,'>',$nuc);
  print NUC $input1;
  close(NUC);

  my $prog = 'blastn';
  my $db = 'refseq_rna';
  my $e_val= '1e-10';
  my $organism= 'Trypanosoma Brucei';

  $gb = new Bio::DB::GenBank;

  my @params = ( '-prog' => $prog,
                 '-data' => $db,
                 '-expect' => $e_val,
                 '-readmethod' => 'SearchIO',
                 '-Organism' => $organism );

  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter
  $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

  my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' );

  while (my $input = $str->next_seq())
  {
    #Blast a sequence against a database:
    #Alternatively, you could pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.
    my $r = $factory->submit_blast($input);

    print STDERR "waiting...." if($v>0);
    while ( my @rids = $factory->each_rid ) {
      foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
          if( $rc < 0 ) {
            $factory->remove_rid($rid);
          }
          print STDERR "." if ( $v > 0 );
          sleep 5;
        } else {
          my $result = $rc->next_result();
          #save the output
          my $filename = $result->query_name()."\.out";
          $factory->save_output($filename);
          $factory->remove_rid($rid);
          # open(BLASTDEBUGFILE,'>',$blastdebugfile);
          # print BLASTDEBUGFILE "Test1 $result";
          # close(BLASTDEBUGFILE);
          open(OUTFILE,'>',$outfile);
          print OUTFILE "Test2 $result->database_name()";
          close(OUTFILE);

          while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);
            # open(OUTFILE,'>',$outfile);
            # print OUTFILE "in while hits";
            #close(OUTFILE);
            my $sequ = $gb->get_Seq_by_version($hit->name);
            my $dna = $sequ->seq(); # get the sequence as a string
            push(@seqs,$dna);
          }
        }
      }
    }
  }
  # open(OUTFILE,'>',$outfile);
  #print OUTFILE $seqs[0];
  # close(OUTFILE);
  return(@seqs);
}

In the above code, my program gets as far as the 'else' part and writes the output file, i.e., this step:

  my $filename = $result->query_name()."\.out";

But the program does not enter the next while loop, where I should get the hits, i.e., it does not enter this:

  while ( my $hit = $result->next_hit ) {
    next unless ( $v > 0);

Hence I am unable to get any hits for my query. For example, if the query's accession number is Tb11.02.2210, I just get a file named Tb11.02.2210.out, and only the file name is displayed in the browser.

Please help me solve this problem, and mail me if anything is unclear.

Regards,
Roopa.

From ashvip at gmail.com Wed Dec 2 00:24:09 2009
From: ashvip at gmail.com (Vipin Singh)
Date: Wed, 2 Dec 2009 10:54:09 +0530
Subject: [Bioperl-l] Problems with installation
Message-ID: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>

Dear Sir/Madam,
I have not been able to install bioperl on my Windows 32 machine despite repeated attempts. I have tried both Active Perl and Strawberry Perl, but neither seems to work.
I have followed the instructions given at
-- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Please guide.
Thanks,
Vipin.
Vipin Singh,
Senior Research Fellow,
Centre for Cellular and Molecular Biology,
Hyderabad - 500007
India.
contact - 91-040-27192778

From scott at scottcain.net Wed Dec 2 09:18:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 2 Dec 2009 09:18:37 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4536f7700912020618y31f8fa15i6e01ce9614a87341@mail.gmail.com>

Hello Vipin,

"do not seem to work" doesn't give us much to go on; can you tell us what happened?

Scott

On Wed, Dec 2, 2009 at 12:24 AM, Vipin Singh wrote:
> Dear Sir/Madam,
> I have not been able to install bioperl on my Windows 32 machine despite
> repeated attempts. I have tried both Active Perl and Strawberry perl but
> both do not seem to work.
> I have followed the instruction given at
> -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> Please guide.
> Thanks,
> Vipin.
> Vipin Singh,
> Senior Research Fellow,
> Centre for Cellular and Molecular Biology,
> Hyderabad - 500007
> India.
> contact - 91-040-27192778
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From maj at fortinbras.us Wed Dec 2 09:18:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 09:18:31 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4A3B25FFC79F43E1AF65E56FD1630F44@NewLife>

Hi Vipin--
We need some more information: the commands you ran and the error messages you received.
Thanks,
Mark

----- Original Message -----
From: "Vipin Singh"
To:
Sent: Wednesday, December 02, 2009 12:24 AM
Subject: [Bioperl-l] Problems with installation

> Dear Sir/Madam,
> I have not been able to install bioperl on my Windows 32 machine despite
> repeated attempts. I have tried both Active Perl and Strawberry perl but
> both do not seem to work.
> I have followed the instruction given at
> -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> Please guide.
> Thanks,
> Vipin.
> Vipin Singh,
> Senior Research Fellow,
> Centre for Cellular and Molecular Biology,
> Hyderabad - 500007
> India.
> contact - 91-040-27192778
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bcantarel at som.umaryland.edu Wed Dec 2 13:36:27 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 13:36:27 -0500
Subject: [Bioperl-l] Parsing Genbank
Message-ID:

Hi all,
I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.

For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974. The sequence is 974 nt.

x $cds->start
1
x $cds->end
64

How can I get the original coordinates? Is there a command for that or will I have to just do the math?

Feature or Bug?
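
For reference, "doing the math" here is a reflection about the sequence length. The 1 and 64 reported in this message are consistent with 911..974 being counted from the other end of a 974 nt sequence, although the thread does not establish where that happens. A minimal sketch in plain Perl, assuming 1-based inclusive coordinates; the helper name `flip_coords` is hypothetical and not part of BioPerl:

```perl
use strict;
use warnings;

# Map a 1-based inclusive interval between forward-strand coordinates and
# coordinates counted from the opposite end of the sequence (the start of
# the reverse complement). Hypothetical helper, not a BioPerl function.
# Applying it twice returns the original interval.
sub flip_coords {
    my ($start, $end, $seq_length) = @_;
    return ($seq_length - $end + 1, $seq_length - $start + 1);
}

my ($fwd_start, $fwd_end) = flip_coords(1, 64, 974);
print "$fwd_start..$fwd_end\n";    # prints 911..974
```
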

~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

From maj at fortinbras.us Wed Dec 2 14:09:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:09:11 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To:
References:
Message-ID:

Hi Brandi-
If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
Can you elaborate by posting your code?
cheers,
MAJ

----- Original Message -----
From: "Brandi Cantarel"
To:
Sent: Wednesday, December 02, 2009 1:36 PM
Subject: [Bioperl-l] Parsing Genbank

> Hi all,
> I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.
>
>
> For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974. The sequence is 974 nt.
>
> x $cds->start
> 1
> x $cds->end
> 64
>
> How can I get the original coordinates? Is there a command for that or will I have to just do the math?
>
> Feature or Bug?
>
>
> ~~~~~~~~~~~~~~~~~~~~
> Brandi Cantarel, PhD
> Bioinformatics Analyst
> Institute for Genome Sciences
> School of Medicine
> University of Maryland, Baltimore
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bcantarel at som.umaryland.edu Wed Dec 2 14:29:56 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 14:29:56 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To:
References:
Message-ID: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>

Here is some of my code; the real code actually enters the data into a database.

$in = Bio::SeqIO->new(-file => $gbkfile,
                      '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
  F1:foreach $cds (@feats) {
    next F1 unless ($cds->primary_tag() eq 'CDS');
    #do something with the cds start and cds end
  }
}

LOCUS       subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence...

From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.

~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

> Hi Brandi-
> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
> Can you elaborate by posting your code?
> cheers,
> MAJ
> ----- Original Message -----
> From: "Brandi Cantarel"
> To:
> Sent: Wednesday, December 02, 2009 1:36 PM
> Subject: [Bioperl-l] Parsing Genbank
>
>
>> Hi all,
>> I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.
>>
>>
>> For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974. The sequence is 974 nt.
>>
>> x $cds->start
>> 1
>> x $cds->end
>> 64
>>
>> How can I get the original coordinates? Is there a command for that or will I have to just do the math?
>>
>> Feature or Bug?
>>
>>
>> ~~~~~~~~~~~~~~~~~~~~
>> Brandi Cantarel, PhD
>> Bioinformatics Analyst
>> Institute for Genome Sciences
>> School of Medicine
>> University of Maryland, Baltimore
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

From maj at fortinbras.us Wed Dec 2 14:48:44 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:48:44 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <24B3D1A1667D44338CDE5A4FFE425C56@NewLife>

With fake seq data and that header, I don't get a problem:

DB<2> x $cds->location
0  Bio::Location::Simple=HASH(0x37b1df4)
   '_end' => 974
   '_location_type' => 'EXACT'
   '_root_verbose' => 0
   '_seqid' => 'subjpool12_contig3'
   '_start' => 911
   '_strand' => '-1'

Are you using the latest BioPerl (1.6.1 or the trunk)?
MAJ

----- Original Message -----
From: "Brandi Cantarel"
Cc:
Sent: Wednesday, December 02, 2009 2:29 PM
Subject: Re: [Bioperl-l] Parsing Genbank

Here is some of my code; the real code actually enters the data into a database.

$in = Bio::SeqIO->new(-file => $gbkfile,
                      '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
  F1:foreach $cds (@feats) {
    next F1 unless ($cds->primary_tag() eq 'CDS');
    ###>> debugger stops here for above output
    #do something with the cds start and cds end
  }
}

LOCUS       subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence...

From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.

~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

> Hi Brandi-
> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
> Can you elaborate by posting your code?
> cheers,
> MAJ
> ----- Original Message -----
> From: "Brandi Cantarel"
> To:
> Sent: Wednesday, December 02, 2009 1:36 PM
> Subject: [Bioperl-l] Parsing Genbank
>
>
>> Hi all,
>> I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.
>>
>>
>> For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974. The sequence is 974 nt.
>>
>> x $cds->start
>> 1
>> x $cds->end
>> 64
>>
>> How can I get the original coordinates? Is there a command for that or will I have to just do the math?
>>
>> Feature or Bug?
>>
>>
>> ~~~~~~~~~~~~~~~~~~~~
>> Brandi Cantarel, PhD
>> Bioinformatics Analyst
>> Institute for Genome Sciences
>> School of Medicine
>> University of Maryland, Baltimore
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Dec 2 14:39:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 13:39:40 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
Message-ID: <0E82A338-9D28-4685-A7DA-5019060D96F5@illinois.edu>

That one's odd; the coordinates should relate back to the original sequence. Any chance you could pass on the sequence file so we can confirm it? (You can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem.)

chris

On Dec 2, 2009, at 1:29 PM, Brandi Cantarel wrote:

> Here is some of my code; the real code actually enters the data into a database.
>
>
> $in = Bio::SeqIO->new(-file => $gbkfile,
>                       '-format' => 'genbank');
>
> W1:while (my $seq = $in->next_seq()) {
>   my @feats = $seq->get_all_SeqFeatures();
>   my $j = 0;
>   F1:foreach $cds (@feats) {
>     next F1 unless ($cds->primary_tag() eq 'CDS');
>     #do something with the cds start and cds end
>   }
> }
>
>
> LOCUS       subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009
> ACCESSION   subjpool12_contig3
> KEYWORDS    .
> SOURCE      human metagenome
>   ORGANISM  human metagenome
>             unclassified sequences; organismal metagenomes,metagenomes.
> FEATURES             Location/Qualifiers
>      source          1..974
>                      /mol_type="genomic DNA"
>                      /isolation_source="Homo sapiens"
>                      /organism="human metagenome"
>                      /collection_date="19-Nov-2009"
>      CDS             complement(911..974)
>                      /locus_tag="subjpool12_contig3|metagene|gene_2"
>                      /translation="IRIMTVELINPYIRHVEHST"
>                      /score="2.52804"
>                      /product="hypothetical protein"
>                      /note="score=2.52804"
>                      /note="score=2.52804"
>                      /note="frame=1"
> ORIGIN
> #some sequence...
>
>
> From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
>
>
> ~~~~~~~~~~~~~~~~~~~~
> Brandi Cantarel, PhD
> Bioinformatics Analyst
> Institute for Genome Sciences
> School of Medicine
> University of Maryland, Baltimore
>
> On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:
>
>> Hi Brandi-
>> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
>> Can you elaborate by posting your code?
>> cheers,
>> MAJ
>> ----- Original Message -----
>> From: "Brandi Cantarel"
>> To:
>> Sent: Wednesday, December 02, 2009 1:36 PM
>> Subject: [Bioperl-l] Parsing Genbank
>>
>>
>>> Hi all,
>>> I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.
>>>
>>>
>>> For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974. The sequence is 974 nt.
>>>
>>> x $cds->start
>>> 1
>>> x $cds->end
>>> 64
>>>
>>> How can I get the original coordinates? Is there a command for that or will I have to just do the math?
>>>
>>> Feature or Bug?
>>>
>>>
>>> ~~~~~~~~~~~~~~~~~~~~
>>> Brandi Cantarel, PhD
>>> Bioinformatics Analyst
>>> Institute for Genome Sciences
>>> School of Medicine
>>> University of Maryland, Baltimore
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Wed Dec 2 15:52:28 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 15:52:28 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
 <24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
 <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
Message-ID: <07332179362A4D53ACAA9A72AD208049@NewLife>

Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds as if there is a bug. If you can provide data that can reproduce it, as Chris suggests, we can get onto it.
thanks
MAJ

----- Original Message -----
From: Brandi Cantarel
To: Mark A. Jensen
Sent: Wednesday, December 02, 2009 3:38 PM
Subject: Re: [Bioperl-l] Parsing Genbank

How can I tell what version I am using? When I use the command from the website:

perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'

I get 1.006, but the bioperl lib was updated in July, so it is probably version 1.6.0, since that was the last stable release.

Brandi

On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote:

With fake seq data and that header, I don't get a problem:

DB<2> x $cds->location
0  Bio::Location::Simple=HASH(0x37b1df4)
   '_end' => 974
   '_location_type' => 'EXACT'
   '_root_verbose' => 0
   '_seqid' => 'subjpool12_contig3'
   '_start' => 911
   '_strand' => '-1'

Are you using the latest BioPerl (1.6.1 or the trunk)?
MAJ

----- Original Message -----
From: "Brandi Cantarel"
Cc:
Sent: Wednesday, December 02, 2009 2:29 PM
Subject: Re: [Bioperl-l] Parsing Genbank

Here is some of my code; the real code actually enters the data into a database.

$in = Bio::SeqIO->new(-file => $gbkfile,
                      '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
  my @feats = $seq->get_all_SeqFeatures();
  my $j = 0;
  F1:foreach $cds (@feats) {
    next F1 unless ($cds->primary_tag() eq 'CDS');
    ###>> debugger stops here for above output
    #do something with the cds start and cds end
  }
}

LOCUS       subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence...

From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.

~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

Hi Brandi-
If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
Can you elaborate by posting your code?
cheers,
MAJ

----- Original Message -----
From: "Brandi Cantarel"
To:
Sent: Wednesday, December 02, 2009 1:36 PM
Subject: [Bioperl-l] Parsing Genbank

Hi all,
I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.

For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974.
The sequence is 974 nt.

x $cds->start
1
x $cds->end
64

How can I get the original coordinates? Is there a command for that or will I have to just do the math?

Feature or Bug?

~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Dec 2 16:07:58 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 2 Dec 2009 15:07:58 -0600
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To: <07332179362A4D53ACAA9A72AD208049@NewLife>
References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>
 <24B3D1A1667D44338CDE5A4FFE425C56@NewLife>
 <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu>
 <07332179362A4D53ACAA9A72AD208049@NewLife>
Message-ID: <23AE9399-B370-4DB3-94AA-AC8021AF321E@illinois.edu>

One never knows, but I would be very surprised if this somehow snuck by the test suite we have, particularly since Gbrowse extensively uses SeqFeatures (any changes should have popped out along the way). There is not much we can do unless we have something to help confirm the problem. It might also help to know the source of the GenBank file itself.

chris

On Dec 2, 2009, at 2:52 PM, Mark A. Jensen wrote:

> Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds
> as if there is a bug. If you can provide data that can reproduce
> it, as Chris suggests, we can get onto it.
> thanks MAJ
> ----- Original Message -----
> From: Brandi Cantarel
> To: Mark A. Jensen
> Sent: Wednesday, December 02, 2009 3:38 PM
> Subject: Re: [Bioperl-l] Parsing Genbank
>
>
> How can I tell what version I am using? When I use the command from the website:
>
>
> perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
>
>
> I get 1.006, but the bioperl lib was updated in July, so it is probably version 1.6.0, since that was the last stable release.
>
>
> Brandi
>
>
> On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote:
>
>
> With fake seq data and that header, I don't get a problem:
>
> DB<2> x $cds->location
> 0  Bio::Location::Simple=HASH(0x37b1df4)
>    '_end' => 974
>    '_location_type' => 'EXACT'
>    '_root_verbose' => 0
>    '_seqid' => 'subjpool12_contig3'
>    '_start' => 911
>    '_strand' => '-1'
>
> Are you using the latest BioPerl (1.6.1 or the trunk)?
> MAJ
> ----- Original Message -----
> From: "Brandi Cantarel"
> Cc:
> Sent: Wednesday, December 02, 2009 2:29 PM
> Subject: Re: [Bioperl-l] Parsing Genbank
>
>
> Here is some of my code; the real code actually enters the data into a database.
>
>
> $in = Bio::SeqIO->new(-file => $gbkfile,
>                       '-format' => 'genbank');
>
> W1:while (my $seq = $in->next_seq()) {
>   my @feats = $seq->get_all_SeqFeatures();
>   my $j = 0;
>   F1:foreach $cds (@feats) {
>     next F1 unless ($cds->primary_tag() eq 'CDS');
>     ###>> debugger stops here for above output
>
>     #do something with the cds start and cds end
>   }
> }
>
>
> LOCUS       subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009
> ACCESSION   subjpool12_contig3
> KEYWORDS    .
> SOURCE      human metagenome
>   ORGANISM  human metagenome
>             unclassified sequences; organismal metagenomes,metagenomes.
> FEATURES             Location/Qualifiers
>      source          1..974
>                      /mol_type="genomic DNA"
>                      /isolation_source="Homo sapiens"
>                      /organism="human metagenome"
>                      /collection_date="19-Nov-2009"
>      CDS             complement(911..974)
>                      /locus_tag="subjpool12_contig3|metagene|gene_2"
>                      /translation="IRIMTVELINPYIRHVEHST"
>                      /score="2.52804"
>                      /product="hypothetical protein"
>                      /note="score=2.52804"
>                      /note="score=2.52804"
>                      /note="frame=1"
> ORIGIN
> #some sequence...
>
>
> From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
>
>
> ~~~~~~~~~~~~~~~~~~~~
> Brandi Cantarel, PhD
> Bioinformatics Analyst
> Institute for Genome Sciences
> School of Medicine
> University of Maryland, Baltimore
>
> On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:
>
> > Hi Brandi-
> > If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
> > Can you elaborate by posting your code?
> > cheers,
> > MAJ
> > ----- Original Message -----
> > From: "Brandi Cantarel"
> > To:
> > Sent: Wednesday, December 02, 2009 1:36 PM
> > Subject: [Bioperl-l] Parsing Genbank
> >
> >
> > Hi all,
> > I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.
> >
> >
> > For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974. The sequence is 974 nt.
> >
> > x $cds->start
> > 1
> > x $cds->end
> > 64
> >
> > How can I get the original coordinates? Is there a command for that or will I have to just do the math?
> >
> > Feature or Bug?
> >
> >
> > ~~~~~~~~~~~~~~~~~~~~
> > Brandi Cantarel, PhD
> > Bioinformatics Analyst
> > Institute for Genome Sciences
> > School of Medicine
> > University of Maryland, Baltimore

From lstein at cshl.edu Thu Dec 3 05:31:31 2009
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 3 Dec 2009 05:31:31 -0500
Subject: [Bioperl-l] modENCODE seeking data managers
Message-ID: <6dce9a0b0912030231p740d0ecbj4a7e79a6ab71801d@mail.gmail.com>

Hi All,

My apologies for spamming the list, but this announcement may be of
interest:

The modENCODE Data Coordinating Center (Model Organism Encyclopedia of DNA
Elements; www.modencode.org) is seeking data managers to gather and curate
large scale functional genomics data sets in fly and worm. For details, see
http://blog.modencode.org/?p=350.

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From dan.bolser at gmail.com Thu Dec 3 06:44:40 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 11:44:40 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
Message-ID: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>

Hi, can someone test the script here on zero length fasta / qual files?
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ It seems the output has an extra newline in the sequence part of the output (which throws off scripts that rely on the 'four lines per record' structure of the fastq (although I'm not sure if it's illegal fastq). Here is what I see BEGIN $ head one.fna >FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2 $ head one.qual >FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2 $ createFastq.plx one.fna one.qual @FVF7ZWH02PFOVG +FVF7ZWH02PFOVG END Currently I just put in a clause in the script to skip any zero length sequences, but I think the Qual shouldn't output an extra newline like this. Cheers, Dan. -- JHB: Bioinformatics is Biology and Biology is Bioinformatics. From biopython at maubp.freeserve.co.uk Thu Dec 3 07:12:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 12:12:15 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> Message-ID: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: > Hi, can someone test the script here on zero length fasta / qual files? > > http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ > > It seems the output has an extra newline in the sequence part of the > output (which throws off scripts that rely on the 'four lines per > record' structure of the fastq (although I'm not sure if it's illegal > fastq). Hi Dan, The OBF consensus was FASTQ records with a zero length sequence might be useful, and should be output as exactly four lines (one blank sequence line, one blank quality line). However for parsing, any number of blank lines should be OK. 
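As a rough illustration of the four-line layout described above, here is a minimal sketch in plain Perl. The helper fastq_record is hypothetical and written only for this example; it is not the code in BioPerl's FASTQ writer. It simply shows that a zero-length record still comes out as exactly four lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's writer): format one FASTQ record as
# exactly four lines, even when the sequence and quality strings are empty.
sub fastq_record {
    my ($id, $seq, $qual) = @_;
    # Header, sequence, '+' line, quality: one newline after each.
    return join("\n", '@' . $id, $seq, '+' . $id, $qual) . "\n";
}

# A zero-length read: the sequence and quality lines are blank, but present.
print fastq_record('FVF7ZWH02PFOVG', '', '');
```

A reader that counts four lines per record would stay in step with output shaped like this.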
http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html I can confirm the perl script currently outputs a FASTQ file with TWO blank lines for the sequence, giving five lines in total for the zero length record. That does suggest a bug. What version of BioPerl are you running? Peter P.S. The script is throwing away any description after the identifier. From dan.bolser at gmail.com Thu Dec 3 08:07:27 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 13:07:27 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> Message-ID: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> 2009/12/3 Peter : > On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: >> Hi, can someone test the script here on zero length fasta / qual files? >> >> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ >> >> It seems the output has an extra newline in the sequence part of the >> output (which throws off scripts that rely on the 'four lines per >> record' structure of the fastq (although I'm not sure if it's illegal >> fastq). > > Hi Dan, > > The OBF consensus was FASTQ records with a zero length > sequence might be useful, and should be output as exactly > four lines (one blank sequence line, one blank quality line). > However for parsing, any number of blank lines should be OK. > http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html > > I can confirm the perl script currently outputs a FASTQ file > with TWO blank lines for the sequence, giving five lines in > total for the zero length record. That does suggest a bug. > What version of BioPerl are you running? 
Hi Peter,

Basically, I'm not running the 'latest' version of BP, which is why I
asked this question of the list rather than filing a bug report. What
version are you running? ;-)

Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks
for the info).

> Peter
>
> P.S. The script is throwing away any description after the
> identifier.

That's probably bad. Feel free to edit the script on the wiki. Sadly,
MediaWiki's diff features are less than optimal, so developing scripts
on the wiki isn't ideal. Anyone know how to plug git-hub into a script
apparently hosted on a wiki?

Or is git-hub basically designed to be 'wiki for code'?

I'm wondering, because with the FlaggedRevs extension you could
basically build a whole release in the wiki. Which would be fun if
nothing else!

--
JHP: Biology is bioinformatics and bioinformatics is biology.

From heyne at informatik.uni-freiburg.de Thu Dec 3 08:19:51 2009
From: heyne at informatik.uni-freiburg.de (Steffen Heyne)
Date: Thu, 03 Dec 2009 14:19:51 +0100
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To:
References: <4AF962AA.7060908@informatik.uni-freiburg.de>
Message-ID: <4B17BAF7.2050604@informatik.uni-freiburg.de>

Hello,

so I tried to fix the problem with the location. Currently it works for
me with the following changes:

LocatableSeq.pm

sub get_nse{

    ...

    my $ret;
    if ($self->strand() >= 0) {
        $ret = $id . $v. $char1 . $st . $char2 . $end ;
    } else {
        $ret = $id . $v. $char1 . $end . $char2 . $st ;
    }
    return $ret;
}

Then I noticed while using $aln->remove_seq() that it cannot remove a
seq, as it uses the wrong NSE to look up sequences. I changed the
following:

SimpleAlign.pm

sub remove_seq {

    ...
    $id = $seq->id();
    $start = $seq->start();
    $end = $seq->end();

    ## changed code:

    my $v = $seq->version ? '.'.$seq->version : '';
    if ($seq->strand >=0){
        $name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
    } elsif ($seq->strand == -1){
        $name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
    }
    ...

}

The above code in LocatableSeq.pm worked when I read an alignment in
Stockholm format and wrote it out in ClustalW format. But when I read an
alignment in ClustalW format and wrote it out as Stockholm (or something
else) it didn't work, as the strand is not correctly set in
ClustalW::next_aln. It works with the following changes:

ClustalW.pm

sub next_aln{

    ...

    my ( $sname, $start, $end, $strand ); ## strand added
    $strand = 0; ## new, standard = 0???
    foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) {
        if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
            ( $sname, $start, $end ) = ( $1, $2, $3 );
            $strand = 1; ## new
            if ($start > $end) { ## new
                ($start, $end, $strand) = ($end, $start, -1); ##new
            } ## new

        }
        else {
            ( $sname, $start ) = ( $name, 1 );
            my $str = $alignments{$name};
            $str =~ s/[^A-Za-z]//g;
            $end = length($str);
        }

        my $seq = Bio::LocatableSeq->new(
            -seq => $alignments{$name},
            -id => $sname,
            -start => $start,
            -end => $end,
            -strand=> $strand ## new
        );

    ...

}

So I don't know if I changed things in the correct places. And I
found them only because I used certain functions. I don't know how broad
the effect of a changed NSE in LocatableSeq.pm is on other modules and
functions. But I'm happy with my changes (so far :-)...).

Will you change this to your proposed way in bioperl trunk?

Thanks!

steffen


Chris Fields wrote:
> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
>
>> Hi,
>>
>> I'm using Bioperl for my research and it is very useful! Thank you!
>>
>> Currently I have a problem with locations tags of sequences. I read in
>> seed alignments of Rfam (in stockholm format, but I think it is
>> similar to other formats).
>>
>> If the location is like:
>>
>> AB194432.1/908-846
>>
>> the start/end values are changed to
>>
>> $seq->start = 846
>> $seq->end = 908
>>
>> and therefore the new location (e.g.$seq->get_nse) is:
>>
>> AB194432.1/846-908
>>
>> The $seq->strand tag is correctly set to -1 in this case, but if the
>> alignment is written out again (clustal, stockholm,...) this strand
>> info is lost and the sequences have this "wrong" location. But this
>> information is important in respect to the sequence accession number.
>>
>> Is there a way to set the location back to the original one or is this
>> behavior desired? Any manually setting with $seq->start($val) failed
>> due to automatic checking.
>>
>> I'm using bioperl 1.6.1
>>
>> Thanks!
>>
>> steffen
>
> This is a definite bug. We recently discussed amending the NSE format
> due to this (the subject came up over the last few months or so); it's
> fallen through the cracks. Fortunately it is very easy to fix (the
> relevant method is in LocatableSeq).
>
> Does anyone have a problem with me adding this in? It will change
> output for only those instances where the strand is -1, so
>
> AB194432.1/908-846
>
> would be start = 846, end = 908, strand = -1
>
> AB194432.1/846-908
>
> would be start = 846, end = 908, strand = 1
>
> chris

--
---
Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany
Tel: (+49) 761 203 7465
Fax: (+49) 761 203 7462
Mail: heyne at informatik.uni-freiburg.de

From cjfields at illinois.edu Thu Dec 3 08:47:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 07:47:32 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> Message-ID: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> Dan, On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: > 2009/12/3 Peter : >> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: >>> Hi, can someone test the script here on zero length fasta / qual files? >>> >>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ >>> >>> It seems the output has an extra newline in the sequence part of the >>> output (which throws off scripts that rely on the 'four lines per >>> record' structure of the fastq (although I'm not sure if it's illegal >>> fastq). >> >> Hi Dan, >> >> The OBF consensus was FASTQ records with a zero length >> sequence might be useful, and should be output as exactly >> four lines (one blank sequence line, one blank quality line). >> However for parsing, any number of blank lines should be OK. >> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html >> >> I can confirm the perl script currently outputs a FASTQ file >> with TWO blank lines for the sequence, giving five lines in >> total for the zero length record. That does suggest a bug. >> What version of BioPerl are you running? > > Hi Peter, > > Basically, I'm not running the 'latest' version of BP, which is why I > asked this question of the list rather than filing a bug report. What > version are you running? ;-) > > Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks > for the info). FASTQ parsing had undergone a major revision prior to 1.6.1 (the latest release in CPAN). Basically, it now parses all three FASTQ variants. However, Peter indicates there may still be a problem, and it's likely he's running 1.6.1. Peter can you confirm that? >> Peter >> >> P.S. 
The script is throwing away any description after the
>> identifier.
>
> That's probably bad. Feel free to edit the script on the wiki. Sadly,
> MediaWiki's diff features are less than optimal, so developing scripts
> on the wiki isn't ideal. Anyone know how to plug git-hub into a script
> apparently hosted on a wiki?
>
> Or is git-hub basically designed to be 'wiki for code'?

It's more an integrated solution for hosting code via git, with a wiki, bug queue, etc. Think Sourceforge, but a lot nicer and with no ads ;>

BitBucket/Hg is another (very nice) solution along the same lines, developed in Python (Github is Ruby-centric).

> I'm wondering, because with the FlaggedRevs extension you could
> basically build a whole release in the wiki. Which would be fun if
> nothing else!

I'm not following you there. Could you elaborate on why that would be beneficial? I could see (

chris

From biopython at maubp.freeserve.co.uk Thu Dec 3 09:20:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:20:32 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>

On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields wrote:
>
> FASTQ parsing had undergone a major revision prior to
> 1.6.1 (the latest release in CPAN). Basically, it now parses
> all three FASTQ variants. However, Peter indicates there
> may still be a problem, and it's likely he's running 1.6.1.
> Peter can you confirm that?
I had BioPerl from SVN circa 1.6.1 (not sure if this was before or after the release of 1.6.1 now): $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' 1.0069 $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"' 1.0069 If the tuples mean anything to you: $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' 49.46.48.48.54.57 $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION' 49.46.48.48.54.57 I just updated to revision 16435, and retested. I get the same BioPerl version numbers, and the same extra blank line in the sequence FASTQ output as Dan reported. Peter From cjfields at illinois.edu Thu Dec 3 09:39:35 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 08:39:35 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> Message-ID: On Dec 3, 2009, at 8:20 AM, Peter wrote: > On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields wrote: >> >> FASTQ parsing had undergone a major revision prior to >> 1.6.1 (the latest release in CPAN). Basically, it now parses >> all three FASTQ variants. However, Peter indicates there >> may still be a problem, and it's likely he's running 1.6.1. >> Peter can you confirm that? 
>
> I had BioPerl from SVN circa 1.6.1 (not sure if this was before
> or after the release of 1.6.1 now):
>
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
> 1.0069
>
> If the tuples mean anything to you:
>
> $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
> 49.46.48.48.54.57
> $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
> 49.46.48.48.54.57
>
> I just updated to revision 16435, and retested. I get the same
> BioPerl version numbers, and the same extra blank line in the
> sequence FASTQ output as Dan reported.
>
> Peter

Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?

1) extra blank line.
2) missing description

Dan, could you go ahead and submit this as a bug, just in case (so we don't lose track)? Otherwise it might get lost on the mail list or wiki.

chris

From biopython at maubp.freeserve.co.uk Thu Dec 3 09:56:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:56:39 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To:
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
Message-ID: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields wrote:
> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?
>
> 1) extra blank line.

Which seems to be a bug in BioPerl SeqIO itself.

> 2) missing description

This is just a trivial bug/omission in the wiki example,
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ

You just need to replace this:

    my $bsq_obj = Bio::Seq::Quality->
        new( -id => $seq_obj->id,
             -seq => $seq_obj->seq,
             -qual => $qual_obj->qual,
        );

With:

    my $bsq_obj = Bio::Seq::Quality->
        new( -id => $seq_obj->id,
             -description => $seq_obj->description,
             -seq => $seq_obj->seq,
             -qual => $qual_obj->qual,
        );

Look - I seem to be learning Perl by osmosis ;)

Peter

From dan.bolser at gmail.com Thu Dec 3 11:29:11 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:29:11 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
Message-ID: <2c8757af0912030829t54e87a4bmf166370ca10e966a@mail.gmail.com>

2009/12/3 Peter :
> On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields wrote:
>> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?

...

>> 2) missing description
>
> This is just a trivial bug/omission in the wiki example,

...

> Look - I seem to be learning Perl by osmosis ;)

Yay!

From dan.bolser at gmail.com Thu Dec 3 11:30:44 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:30:44 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>

2009/12/3 Chris Fields :
> Dan,
>
> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

...

>> I'm wondering, because with the FlaggedRevs extension you could
>> basically build a whole release in the wiki. Which would be fun if
>> nothing else!
>
> I'm not following you there. Could you elaborate on why that would be beneficial? I could see (

I never said it would be beneficial, only that it would be fun.

http://www.mediawiki.org/wiki/Flaggedrevs

From florent.angly at gmail.com Thu Dec 3 13:26:57 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 03 Dec 2009 10:26:57 -0800
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <4B17BAF7.2050604@informatik.uni-freiburg.de>
References: <4AF962AA.7060908@informatik.uni-freiburg.de> <4B17BAF7.2050604@informatik.uni-freiburg.de>
Message-ID: <4B1802F1.1040304@gmail.com>

Hi all,

Like Steffen, I've had a few burning questions too regarding
LocatableSeq lately.

I've had an occasional issue with LocatableSeq. Most assembly-related
modules use LocatableSeq objects. They specify the sequence start but
not the sequence end. This works in most cases, but I've recently
encountered very occasional error messages related to not having
explicitly set the end of the sequence. I've been unable to put together
a small test case to reproduce the bug easily.

My question is: if the start of the sequence is set, is it mandatory to
set the end of the sequence? If so, then maybe the documentation needs
to be explicit about it and maybe there needs to be a check that
enforces that the end is set.
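On the question of deriving the end from the start: for an aligned sequence, the end should follow from the start plus the number of residues, with gaps excluded. Below is a minimal sketch in plain Perl, assuming '-' and '.' are the gap characters. The helper end_from_start is hypothetical and written only for illustration; it is not a LocatableSeq method and says nothing about what LocatableSeq does internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): derive the end coordinate of an
# aligned sequence from its start and its ungapped length.
sub end_from_start {
    my ($aligned_seq, $start) = @_;
    # Count residues only; treating '-' and '.' as gaps is an assumption.
    (my $ungapped = $aligned_seq) =~ s/[-.]//g;
    my $len = length $ungapped;
    # An empty or all-gap sequence has no residues, so no meaningful end.
    return if $len == 0;
    return $start + $len - 1;
}

# Five residues starting at position 10 end at position 14.
print end_from_start('AC--GT.A', 10), "\n";   # 14
```

The empty case is returned as undefined here on purpose, since a zero-length sequence is exactly where an implicit end becomes ambiguous.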
In fact, it seems like if I provide a sequence and its start position, the LocatableSeq code should be able to automatically calculate its end, no? Florent Steffen Heyne wrote: > Hello, > > so I tried to fix the problem with the location. Currently it works for > me with the following changes: > > LocatableSeq.pm > > sub get_nse{ > > ... > > my $ret; > if ($self->strand() >= 0) { > $ret = $id . $v. $char1 . $st . $char2 . $end ; > } else { > $ret = $id . $v. $char1 . $end . $char2 . $st ; > } > return $ret; > } > > Then I recognized during the usage of $aln->remove_seq() that it cannot > remove a seq as it uses a wrong NSE to lookup sequences. I changed the > following: > > SimpleAlign.pm > > sub remove_seq { > > ... > $id = $seq->id(); > $start = $seq->start(); > $end = $seq->end(); > > ## changed code: > > my $v = $seq->version ? '.'.$seq->version : ''; > if ($seq->strand >=0){ > $name = sprintf("%s%s/%d-%d",$id,$v,$start,$end); > } elsif ($seq->strand == -1){ > $name = sprintf("%s%s/%d-%d",$id,$v,$end,$start); > } > ... > > } > > The above code in LocatableSeq.pm worked in the case if I read an > alignment in stockholm format and write it out in clustalw format. But > if I read an alignment in clustalw and write it out as stockholm (or > something else) it didn't worked, as the strand is not correctly set in > ClustalW::next_aln. It works with the following changes: > > ClustalW.pm > > sub next_aln{ > > ... > > my ( $sname, $start, $end, $strand ); ## strand added > $strand = 0; ## new, standard = 0??? 
> foreach my $name ( sort { $order{$a} <=> $order{$b} } keys > %alignments ) { > if ( $name =~ /(\S+):(\d+)-(\d+)/ ) { > ( $sname, $start, $end ) = ( $1, $2, $3 ); > $strand = 1; ## new > if ($start > $end) { ## new > ($start, $end, $strand) = ($end, $start, -1); ##new > } ## new > > } > else { > ( $sname, $start ) = ( $name, 1 ); > my $str = $alignments{$name}; > $str =~ s/[^A-Za-z]//g; > $end = length($str); > } > > my $seq = Bio::LocatableSeq->new( > -seq => $alignments{$name}, > -id => $sname, > -start => $start, > -end => $end, > -strand=> $strand ## new > ); > > ... > > } > > So I don't know if I changed things at their correct position. And I > found them only because I used certain functions. I dont know how broad > the effect of a changed NSE in LocatableSeq.pm is to other Modules and > functions. But I'm happy with my changes (so far :-)...). > > Do you will change this to your proposed way in bioperl trunk? > > Thanks! > > steffen > > > Chris Fields schrieb: > >> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: >> >> >>> Hi, >>> >>> I'm using Bioperl for my research and it is very useful! Thank you! >>> >>> Currently I have a problem with locations tags of sequences. I read in >>> seed alignments of Rfam (in stockholm format, but I think it is >>> similar to other formats). >>> >>> If the location is like: >>> >>> AB194432.1/908-846 >>> >>> the start/end values are changed to >>> >>> $seq->start = 846 >>> $seq->end = 908 >>> >>> and therefore the new location (e.g.$seq->get_nse) is: >>> >>> AB194432.1/846-908 >>> >>> The $seq->strand tag is correctly set to -1 in this case, but if the >>> alignment is written out again (clustal, stockholm,...) this strand >>> info is lost and the sequences have this "wrong" location. But this >>> information is important in respect to the sequence accession number. >>> >>> Is there a way to set the location back to the original one or is this >>> behavior desired? 
Any manually setting with $seq->start($val) failed >>> due to automatic checking. >>> >>> I'm using bioperl 1.6.1 >>> >>> Thanks! >>> >>> steffen >>> >> This is a definite bug. We recently discussed amending the NSE format >> due to this (the subject came up over the last few months or so); it's >> fallen through the cracks. Fortunaely it is very easy to fix (the >> relevant method is in LocatableSeq). >> >> Does anyone have a problem with me adding this in? It will change >> output for only those instances where the strand is -1, so >> >> AB194432.1/908-846 >> >> would be start = 846, end = 908, strand = -1 >> >> AB194432.1/846-908 >> >> would be start = 846, end = 908, strand = 1 >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > From cjfields at illinois.edu Thu Dec 3 23:16:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 22:16:48 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> Message-ID: <37058F8C-419E-4E88-AC4F-543FF9B563E1@illinois.edu> On Dec 3, 2009, at 10:30 AM, Dan Bolser wrote: > 2009/12/3 Chris Fields : >> Dan, >> >> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: > > ... > >>> I'm wondering, because with the FlaggedRevs extension you could >>> basically build a whole release in the wiki. Which would be fun if >>> nothing else! >> >> I'm not following you there. Could you elaborate on why that would be beneficial? 
I could see ( > > I never said it would be beneficial, only that it would be fun. > > http://www.mediawiki.org/wiki/Flaggedrevs Ah, okay, that makes some sense. Just to stay on subject, committed a fix (r16439) to bioperl-live that addresses the additional newline issue. chris From rtbio.2009 at gmail.com Fri Dec 4 08:57:21 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 4 Dec 2009 14:57:21 +0100 Subject: [Bioperl-l] Regarding Organism based search in Remote blast Message-ID: Hello all, I am working on Remote blast.Here,I am trying to get 2 parameters into the remote blast code.They are 1.The input sequence that has to be sent to blast 2.Organism (The organism which has to be searched for ex:-Trypanasoma brucei etc.,) When I tried to take the organism parameter as an input from the user,through a web page,the Remote blast was not giving any results i.e., it says that there are no alignments found. But,when I hard coded the organism in the code,it gives me the results i.e., 3hits. I could not understand this problem.Could any body please help me in this regard? 
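One thing that may be worth checking in the code that follows: the ENTREZ_QUERY header is assigned the single-quoted string '$organism[ORGN]'. Perl does not interpolate variables inside single quotes, so the query sent would be that literal text rather than the organism name. A minimal sketch of the difference in plain Perl; entrez_query is a hypothetical helper written for this example, and whether this is the whole cause of the empty result is an assumption, not something confirmed in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build an Entrez organism filter string.
sub entrez_query {
    my ($organism) = @_;
    # Concatenation sidesteps interpolation entirely. Inside double quotes,
    # "$organism[ORGN]" would be read as an element of the array @organism.
    return $organism . '[ORGN]';
}

my $organism = 'Trypanosoma brucei';

my $literal = '$organism[ORGN]';        # single quotes: no interpolation
my $built   = entrez_query($organism);  # organism name plus the field tag

print "$literal\n";   # the variable name itself, not the organism
print "$built\n";     # Trypanosoma brucei[ORGN]
```

In the posted code, the built string would be what gets assigned to $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}.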
My code is

sub blastcode
{

$input1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $input1;
close(NUC);

my $prog = 'blastn';
my $db = 'refseq_rna';
my $e_val= '1e-10';
my $organism= $organ;

$gb = new Bio::DB::GenBank;

my @params = ( '-prog' => $prog,
               '-data' => $db,
               '-expect' => $e_val,
               '-readmethod' => 'SearchIO',
               '-Organism' => $organism );

open(OUTFILE,'>',$debugfile);
print OUTFILE @params;
close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#change a paramter
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
#change a paramter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

my $v = 1;
#$v is just to turn on and off the messages

my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
'-Organism' => $organism );

while (my $input = $str->next_seq())

{
#Blast a sequence against a database:
#Alternatively, you could pass in a file with many
#sequences rather than loop through sequence one at a time
#Remove the loop starting 'while (my $input = $str->next_seq())'
#and swap the two lines below for an example of that.
my $r = $factory->submit_blast($input);
# my $r = $factory->submit_blast('amino.fa');

print STDERR "waiting...." if($v>0);
while ( my @rids = $factory->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $factory->retrieve_blast($rid);
    if( !ref($rc) ) {
      if( $rc < 0 ) {
        $factory->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {
      my $result = $rc->next_result();
      #save the output
      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
      # open(BLASTDEBUGFILE,'>',$debugfile);
      # print BLASTDEBUGFILE $result->next_hit();
      # close(BLASTDEBUGFILE);
      my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out";
      # open(DEBUGFILE,'>',$debugfile);
      # open(new,'>',$filename);
      # @arra=;
      # print DEBUGFILE @arra;
      # close(DEBUGFILE);
      # close(new);
      $factory->save_output($filename);
      # open(BLASTDEBUGFILE,'>',$debugfile);
      # print BLASTDEBUGFILE "Hello $rid";
      # close(BLASTDEBUGFILE);
      $factory->remove_rid($rid);
      open(BLASTDEBUGFILE,'>',$blastdebugfile);
      print BLASTDEBUGFILE $organism;
      close(BLASTDEBUGFILE);
      # open(OUTFILE,'>',$outfile);
      # print OUTFILE "Test2 $result->database_name()";
      # close(OUTFILE);
      #$hit = $result->next_hit;
      #open(new,'>',$debugfile);
      #print $hit;
      #close(new);
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        # open(OUTFILE,'>',$debugfile);
        # print OUTFILE "$hit in while hits";
        # close(OUTFILE);
        my $sequ = $gb->get_Seq_by_version($hit->name);
        my $dna = $sequ->seq(); # get the sequence as a string
        push(@seqs,$dna);
      }
    }
  }
}
}
#open(OUTFILE,'>',$debugfile);
#print OUTFILE $seqs[0];
#close(OUTFILE);
return(@seqs);
}

Regards,
Roopa.

From cjfields at illinois.edu Fri Dec 4 09:59:17 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 4 Dec 2009 08:59:17 -0600
Subject: [Bioperl-l] Regarding Organism based search in Remote blast
In-Reply-To:
References:
Message-ID: <77EDAB6B-68B5-460C-AD9F-EB45B9C3AFF7@illinois.edu>

Roopa,

At one point a couple of parameters differed between NCBI's web interface and our RemoteBlast-based BLAST interface to URLAPI (this should be indicated in your BLAST reports). See here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14155

Also, are the returned hits specific for the genome?
You should double-check; in some cases you have to set both HEADER and RETRIEVALHEADER to get the expected results (not sure why):

http://article.gmane.org/gmane.comp.lang.perl.bio.general/18737/match=remoteblast+ncbi

chris

On Dec 4, 2009, at 7:57 AM, Roopa Raghuveer wrote:

> Hello all,
>
> I am working on Remote blast. Here, I am trying to get 2 parameters into the
> remote blast code. They are:
>
> 1. The input sequence that has to be sent to blast
>
> 2. Organism (the organism to be searched for, e.g. Trypanosoma brucei)
>
> When I tried to take the organism parameter as an input from the
> user, through a web page, the Remote blast was not giving any results, i.e., it
> says that there are no alignments found.
>
> But when I hard coded the organism in the code, it gives me the results, i.e.,
> 3 hits.
>
> I could not understand this problem. Could anybody please help me in this
> regard?
>
> My code is
>
> sub blastcode
> {
>
> $input1= $_[0];
>
> $organ= $_[1];
>
> open(NUC,'>',$nuc);
> print NUC $input1;
> close(NUC);
>
> my $prog = 'blastn';
> my $db = 'refseq_rna';
> my $e_val= '1e-10';
> my $organism= $organ;
>
> $gb = new Bio::DB::GenBank;
>
> my @params = ( '-prog' => $prog,
>                '-data' => $db,
>                '-expect' => $e_val,
>                '-readmethod' => 'SearchIO',
>                '-Organism' => $organism );
>
> open(OUTFILE,'>',$debugfile);
> print OUTFILE @params;
> close(OUTFILE);
>
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
> #change a parameter
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]';
> #change a parameter
> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>
> my $v = 1;
> #$v is just to turn on and off the messages
>
> my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' ,
> '-Organism' => $organism );
>
> while (my $input = $str->next_seq())
>
> {
> #Blast a sequence against a database:
> #Alternatively, you could pass in a file with many
> #sequences rather than loop through sequence one at a time
> #Remove the loop starting 'while (my $input = $str->next_seq())'
> #and swap the two lines below for an example of that.
>
> my $r = $factory->submit_blast($input);
>
> # my $r = $factory->submit_blast('amino.fa');
>
> print STDERR "waiting...." if($v>0);
>
> while ( my @rids = $factory->each_rid ) {
>
> foreach my $rid ( @rids ) {
>
> my $rc = $factory->retrieve_blast($rid);
>
> if( !ref($rc) )
> {
> if( $rc < 0 )
> {
> $factory->remove_rid($rid);
> }
> print STDERR "." if ( $v > 0 );
> sleep 5;
> }
> else {
> my $result = $rc->next_result();
> #save the output
> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>
> # open(BLASTDEBUGFILE,'>',$debugfile);
> # print BLASTDEBUGFILE $result->next_hit();
> # close(BLASTDEBUGFILE);
>
> my $filename =
> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>
> # open(DEBUGFILE,'>',$debugfile);
> # open(new,'>',$filename);
> # @arra=;
> # print DEBUGFILE @arra;
> # close(DEBUGFILE);
> # close(new);
> $factory->save_output($filename);
>
> # open(BLASTDEBUGFILE,'>',$debugfile);
> # print BLASTDEBUGFILE "Hello $rid";
> # close(BLASTDEBUGFILE);
>
> $factory->remove_rid($rid);
>
> open(BLASTDEBUGFILE,'>',$blastdebugfile);
> print BLASTDEBUGFILE $organism;
> close(BLASTDEBUGFILE);
>
> # open(OUTFILE,'>',$outfile);
> # print OUTFILE "Test2 $result->database_name()";
> # close(OUTFILE);
>
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
>
> while ( my $hit = $result->next_hit ) {
>
> next unless ( $v > 0);
>
> # open(OUTFILE,'>',$debugfile);
> # print OUTFILE "$hit in while hits";
> # close(OUTFILE);
>
> my $sequ = $gb->get_Seq_by_version($hit->name);
> my $dna = $sequ->seq(); # get the sequence as a string
> push(@seqs,$dna);
> }
> }
> }
> }
> }
>
> #open(OUTFILE,'>',$debugfile);
> #print OUTFILE $seqs[0];
> #close(OUTFILE);
>
> return(@seqs);
> }
>
> Regards,
> Roopa.
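
A note on the quoted code above, offered as a likely explanation rather than a confirmed diagnosis: the ENTREZ_QUERY value is written in single quotes, and Perl does not interpolate variables inside single quotes, so the literal text $organism[ORGN] would be sent instead of the organism name. That would fit the symptom that a hard-coded organism returns hits while a user-supplied one does not. Switching to double quotes alone would not help, because inside a double-quoted string $organism[ORGN] is read as an element of an array named @organism. The sketch below shows the difference in plain Perl with no BioPerl calls; the helper name entrez_organism_query is hypothetical and exists only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): builds an Entrez organism
# limit such as "Trypanosoma brucei[ORGN]" from a user-supplied name.
sub entrez_organism_query {
    my ($organism) = @_;
    # Concatenation sidesteps both pitfalls: single quotes suppress
    # interpolation, and "$organism[ORGN]" in double quotes would be
    # parsed as a subscript into an array @organism.
    return $organism . '[ORGN]';
}

my $organism = 'Trypanosoma brucei';

# What the quoted script assigns: the variable name is sent literally.
my $literal = '$organism[ORGN]';

# What was probably intended: the organism name followed by [ORGN].
my $fixed = entrez_organism_query($organism);

print "single-quoted: $literal\n";
print "concatenated:  $fixed\n";

# In the quoted script the concatenated string would be the value assigned to
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}.
```

Run as-is, this prints "single-quoted: $organism[ORGN]" followed by "concatenated:  Trypanosoma brucei[ORGN]". As mentioned in the reply above, RETRIEVALHEADER may need the same value in some cases.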
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From robert.bradbury at gmail.com Fri Dec 4 13:27:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 4 Dec 2009 13:27:38 -0500
Subject: [Bioperl-l] Gene critical region analysis -- visual display
Message-ID:

Background:
I have been involved in aging research off and on for ~16 years. My initial focus was on the eventual decline of the "program" (because DNA has no ECC and only limited redundancy); therefore my initial work (in the early 1990s) was focused on DNA repair genes (of which there are about 150 in the human genome) [1,2]. Most recently I have focused in on the DNA double strand break repair processes (NHEJ) as a fundamental cause of aging because it may fundamentally corrupt the genomes of individual cells. (And as most programmers would agree -- break the code and you break the program). Michael Lieber at UCLA has estimated that by the time a human is ~70 on the order of several hundred genes in one's cells have been corrupted (which may have an indeterminate effect on the cells' functioning).

Problem:
Just looking at the GenBank output for the human Artemis (DCLRE1C) gene there are on the order of 18 SNPs and 8 possible phosphorylation sites (not to mention other potential modification sites) -- this combined with the fact that Methionine and Tryptophan and to a lesser extent Cysteine are more susceptible to single base mutations (due to the alteration of the codon->amino acid coding even involving single base mutations/repairs). There are various programs to analyze such proteins for the critical sites -- SIFT and the various programs pointed to by their sites.

Now it seems to me that one could attack this problem by integrating SNPs, mutations, etc. at the critical sites (where "critical" may or may not be at normal SNPs -- which presumably are primarily at non-critical sites -- and those proteins where if you change the coding sequence to non-synonymous amino acids you potentially break the protein (the real interpretation of which will not be understood until population studies are done)).

So, in the process of looking at the DCLRE1C protein I asked myself, "Why is there not a BioPerl function which simply enables a visual interpretation of the critical sites of the protein?" I.e. some color-coded representation of the protein (which presumably has some augmented functionality to determine things like probability or statistical information). I.e. hand the function a .fasta file and it will give you a visual (colored) analysis of the critical nature of specific a.a. -- i.e. something which could be used by genomic or SNP analysis (such as I presume is being done by 23andme -- as well as other organizations) to begin to separate out the variations in the human genome (e.g. SNPs) from the mutations which may affect individuals.

I have the C programming and to a lesser extent Perl experience to contribute to this -- I lack the BioPerl wisdom to make it generally available.

If anyone has some suggestions as to what functions/modules might be of use (in providing a "single-look" view of gene a.a. whose mutations may be more or less detrimental) I would appreciate hearing from them.

Robert Bradbury

1. "DNA Repair and Mutagenesis", E.C. Friedberg et al, 2nd Ed., ASM Press (2006)
2. "Aging of the Genome", J. Vijg, Oxford University Press (2007)

From maj at fortinbras.us Sun Dec 6 17:54:00 2009
From: maj at fortinbras.us (Mark A.
Jensen) Date: Sun, 6 Dec 2009 17:54:00 -0500 Subject: [Bioperl-l] bioperl-mode new feature: base class browsing Message-ID: <59494F4102D84535B3A5D05B595ACBF7@NewLife> Hi All, You can now browse pod of the base/parent classes of bioperl modules with one keystroke using the latest update of bioperl-mode. See http://bioperl.org/wiki/Emacs_bioperl-mode Press "B" or "P" while in pod view to get a completion list of the parent classes for the module whose pod you're viewing. cheers, MAJ From mmokrejs at ribosome.natur.cuni.cz Mon Dec 7 15:33:48 2009 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Mon, 07 Dec 2009 21:33:48 +0100 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: <4B1D66AC.4080804@ribosome.natur.cuni.cz> Hi, I just stumbled across this older posting ... maybe you want to exploit SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has remote API available. Martin Robert Bradbury wrote: > I would like to know whether or not anyone has attempted to create a > "generalized" reciprocal blast component for BioPerl? > > One sees papers all the time where they discuss running reciprocal blasts to > compare a new species to an old "standard" species or a set of species or > running an all-to-all set of comparisons to match up all of the "known" > proteins from species and determine which are outliers (and therefore > "novel"). There are also accumulating merged sets in NCBI HomoloGene (which > seems to be a some strict subset (perhaps a dozen) "well sequenced" genomes) > and Ensembl (which seems to be working with a much larger set of 40-50 > genomes some of which may be somewhat incomplete and are certainly poorly > "explored". > > I have, I believe, seen code "fragments" from various authors, perhaps some > on the BioPerl list, which perform some major subset of a typical > "reciprocal blast". 
> > Now what I am looking for is a relatively generalizable some-to-some > reciprocal blast utility. I want to be able to specify the genes (or gene > family), e.g. some of the ~150 known DNA repair genes. It would be helpful > to also specify how "tolerant" the blast "true reciprocal" criteria are. > There are some genes where there is a very strict 1-to-1 relationship across > many genomes. But for genes which involve relatively standard domains, e.g. > "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for > example its more like 5-to-5 and it would be really nice to be able to > specify the strictness or quality level [1] for "matching" genes (and even > which genes are to be excluded because they are known to be false > homologues). > > Then to top this off I want to be able to combine known public e.g. > (HomoloGene / Uniigene / Ensembl) databases with perhaps local private > databases or database subsets (e.g. emerging or specialized genomes). > > The goal here of course to determine the precise phylogenetic relationships > between all of the DNA repair genes and how there may be gain / loss / > evolution of function that can be related to species characteristics (size, > longevity, etc.). > > Is there a generalized reciprocal blast component in BioPerl? Or is it a > "build-it-yourself" situation (that I have to believe has been built > probably a few dozen times by various researchers / organizations / > companies)? > > Thanks, > Robert Bradbury > > 1. This would be handled in BioPerl with a customizable user function which > could be tailored to handle specific cases -- for example a function which > when handed a set of 100 potential "matches" could go through those 100 > matches, identify common domains, and then "re-rate" matches based on > considerations such as the type and number of common domains, domains being > in the same order, etc. I.e. 
criteria which may be difficult to completely
> generalize across entire genomes but are fairly obvious if you are looking
> at a graphical replication of a gene set in HomoloGene.

From robert.bradbury at gmail.com Mon Dec 7 15:41:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 7 Dec 2009 15:41:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
Message-ID:

This comment could also have a subject line: "Why does Bioperl/get_sequence> fork at all! Why are not all operations sequential? And if this is a "default" mode that I'm unaware of -- How do I ever write a reliable BioPerl script if I have little or no capability of what the program uses when it runs? I may have days so I can bear the burden of relatively slow results (and so can use sequential processing rather than parallel).

I've got a perl script that uses remote blast to blast a sequence against a subset of the NCBI sequences. It "mostly" works, in that it returns a seemingly complete .bls result file but when attempting to look at the sequences (so it can more accurately summarize the information from the results than a standard blast report allows) it terminates prematurely with errors.

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Couldn't fork: Resource temporarily unavailable
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::_open_pipe /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
STACK: Bio::Perl::get_sequence /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
STACK: /home/bradbury/Genomes/bin/RB.pl:155
-----------------------------------------------------------

The precise line (in my code) which appears to be generating the error is:
$seq = get_sequence('GenBank', $accsn);

Now this can be a problem if NCBI/GenBank fails due to load conditions -- but this specific failure (which is repeatable) is most likely due to hitting the user process limit restrictions -- the small blast results work fine -- it's only if the Blast has returned several hundred hits that it runs into this problem.

Now what it sounds like to me is an attempt to do multiple asynchronous NCBI queries (to get a sequence) with complete disregard of the environment (process limits, NCBI limits, etc.). But I do not know enough about how this works to point a finger at some specific function. As a result get_sequence process results are accumulated, summarized, etc. without ever having issued the respective "wait-variant()" calls to collect former children. [This IMO would clearly be a bug.]

It could be adjusted by allowing the BioPerl library to run in 3 modes.
(1) completely synchronous -- if you fork you wait until it's done -- and you collect "it", and if any fork fails then one either collects the process or switches to the non-conservative mode.

Robert

From cjfields at illinois.edu Mon Dec 7 16:08:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 7 Dec 2009 15:08:40 -0600
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
In-Reply-To:
References:
Message-ID:

Robert,

If you use the relevant components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not. All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline you will need to use those tools directly. See the POD for those specific modules for more information.

chris

On Dec 7, 2009, at 2:41 PM, Robert Bradbury wrote:

> This comment could also have a subject line: "Why does Bioperl/get_sequence>
> fork at all! Why are not all operations sequential? And if this is a
> "default" mode that I'm unaware of -- How to I ever write a reliable BioPerl
> script if I have little or no capability of what the program uses when it
> runs? I may have days so I can bear the burden of relatively slow results
> (and so can use sequential processing rather than parallel).
>
> I've got a perl script that uses remote blast to blast a sequence against a
> subset of the NCBI sequences. It "mostly" works, in that it returns a
> seemingly complete .bls result file but when attempting to look at the
> sequences (so it can more accurately summarize the information from the
> results than a standard blast report allows) it terminates prematurely with
> errors.
> > The error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Couldn't fork: Resource temporarily unavailable > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::WebDBSeqI::_open_pipe > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186 > STACK: Bio::Perl::get_sequence > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520 > STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182 > STACK: /home/bradbury/Genomes/bin/RB.pl:155 > ----------------------------------------------------------- > > The precise line (in my code) whcih appears to be generating the error is: > $seq = get_sequence('GenBank', $accsn); > > Now this can be a problem if NCBI/Genbank fails due to load conditions -- > but this specific failure (which is repeatable is due to most likely hitting > the user process limit restrictions) -- but the small blast results work > fine -- its only if the Blast has returned several hundred hits that it runs > into this problem. > > Now what it sounds like to me is an attempt to do multiple asynchronous NCBI > queries (to get a sequence) with complete disregard of the environment > (process limits, NCBI limits, etc.). But I do not know enough about how > this works to point a finger at some specific function. As a result > get_sequence process results are accumulated, summarized, etc. without ever > having issued to respect "wait-variant()) calls to collect former children > [This IMO would clearly be a bug.] > > It could be adjusted to by allowing the BioPerl library to run in 3 modes. 
> (1) completely synchronous -- if you fork you wait until its done -- and > you collect "it" and any fork fails then one either collects the process or > switches to the non-conservative mode. > > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Dec 7 16:24:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 7 Dec 2009 13:24:54 -0800 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: Message-ID: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Robert - You seem to be mixing the blast remote and the sequence query retrieval problems. These messages are related to the remote retrieval of sequences. It is hard to tell from your message specifically which modules you are using or how you are querying NCBI - there are several ways to do this either with the NCBI tools or the Bio::DB::GenBank. If you are using Bio::DB::Query::GenBank that allows for async access and has built in controls to adhere to the wait variant that NCBI requests but I don't think Bio::DB::GenBank get_Seq_by_acc method does any sort of thing (at least when it was originally written). I always advocate if you want highly available and reliable access to sequences you should download the nr or whichever DB and use the local indexing tools for the retrieval. Once you start doing hundreds of queries I don't see any good reason to be doing the query against NCBI directly given unreliabilities of the web and services. Local databases are faster and more reliable for most people so I urge you take advantage of the tools which provide local database access with the same APIs. I would like to comment that the tone of your posts to the list are not particularly helpful. I wonder if you are actually asking for help or just interested in complaining about when things don't work as you expect? 
This is a collaborative and volunteer-only project, with the principle of working together to make a useful toolkit. We encourage you to build programs and applications from this base that suit your needs, but not all things will be directly implemented in the toolkit if they aren't generic enough (at least that is my feeling, the other Core devs help with these decisions). If there is a useful, generic, and reusable part we would like that to be part of the API. Otherwise we suggest a new application that fits a developer's vision. We encourage you to write (and publish) that application separately, but certainly encourage bug (and fix) submissions and also code contributions for new features where they can be seen as generally useful.

-jason

On Dec 7, 2009, at 12:41 PM, Robert Bradbury wrote:

> This comment could also have a subject line: "Why does Bioperl/
> get_sequence>
> fork at all! Why are not all operations sequential? And if this is a
> "default" mode that I'm unaware of -- How to I ever write a reliable
> BioPerl
> script if I have little or no capability of what the program uses
> when it
> runs? I may have days so I can bear the burden of relatively slow
> results
> (and so can use sequential processing rather than parallel).
>
> I've got a perl script that uses remote blast to blast a sequence
> against a
> subset of the NCBI sequences. It "mostly" works, in that it returns a
> seemingly complete .bls result file but when attempting to look at the
> sequences (so it can more accurately summarize the information from
> the
> results than a standard blast report allows) it terminates
> prematurely with
> errors.
> > The error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Couldn't fork: Resource temporarily unavailable > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::WebDBSeqI::_open_pipe > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186 > STACK: Bio::Perl::get_sequence > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520 > STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182 > STACK: /home/bradbury/Genomes/bin/RB.pl:155 > ----------------------------------------------------------- > > The precise line (in my code) whcih appears to be generating the > error is: > $seq = get_sequence('GenBank', $accsn); > > Now this can be a problem if NCBI/Genbank fails due to load > conditions -- > but this specific failure (which is repeatable is due to most likely > hitting > the user process limit restrictions) -- but the small blast results > work > fine -- its only if the Blast has returned several hundred hits that > it runs > into this problem. > > Now what it sounds like to me is an attempt to do multiple > asynchronous NCBI > queries (to get a sequence) with complete disregard of the environment > (process limits, NCBI limits, etc.). But I do not know enough about > how > this works to point a finger at some specific function. As a result > get_sequence process results are accumulated, summarized, etc. > without ever > having issued to respect "wait-variant()) calls to collect former > children > [This IMO would clearly be a bug.] > > It could be adjusted to by allowing the BioPerl library to run in 3 > modes. 
> (1) completely synchronous -- if you fork you wait until its done --
> and
> you collect "it" and any fork fails then one either collects the
> process or
> switches to the non-conservative mode.
>
> Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From Jonas_Schaer at gmx.de Tue Dec 8 10:21:58 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Tue, 8 Dec 2009 16:21:58 +0100
Subject: [Bioperl-l] fasta format
Message-ID: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>

Hi there,
I have a little question concerning BioPerl. I have BioPerl-1.6.1.tar.gz installed and I use the fasta.pm module to read in some fasta files. At first it worked fine, but now I have some fasta files in a slightly different format (not all lines have the same length!).

------------- EXCEPTION -------------
MSG: Each line of the fasta entry must be the same length except the last.
Line above #49 '
..' is 28 != 101 chars.
STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
STACK main::readfasta blast_eval.pm:174
STACK toplevel blast_eval.pm:83
-------------------------------------

indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/DB/Fasta.pm line 1054.

Is there any way to use these fasta files with different line lengths with this fasta.pm module, or will I have to change the format of my fasta files (big databases...)?

Thanks in advance for any help!

Regards, Jonas

From awitney at sgul.ac.uk Tue Dec 8 12:01:58 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 8 Dec 2009 17:01:58 +0000
Subject: [Bioperl-l] package to associate genes with branches on trees?
Message-ID:

Hi,

I have been generating some trees with Phylip (pars) and then processing them with Bioperl. These trees are generated by comparing multiple strains of a bacterial organism by presence/absence (0/1) calls for each gene.

I was wondering if there was any package in Bioperl to try to determine if any specific genes were associated with specific branches of the trees? Or if anyone knew of another tool that can do this?

thanks for any help

adam

From jason at bioperl.org Tue Dec 8 12:44:43 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 8 Dec 2009 09:44:43 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID:

You can run sreformat (HMMER) or the bp_sreformat.pl script in scripts/utilities (which is also installed when you install the BioPerl scripts):

$ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
# rename it back
$ mv yournewfile.fa yourfile.fa

or

$ sreformat fasta yourfile.fa > yournewfile.fa
$ mv yournewfile.fa yourfile.fa

-jason

On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:

> Hi there,
> I have a little question concerning bioperl. I have
> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read
> in some fasta files. first it worked fine, but now i have some
> fastafiles in slightly different format (not all lines have the same
> length!).
>
> ------------- EXCEPTION -------------
> MSG: Each line of the fasta entry must be the same length except the
> last.
> Line above #49 '
> ..' is 28 != 101 chars.
> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ > Fasta.pm:771 > STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681 > STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491 > STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513 > STACK main::readfasta blast_eval.pm:174 > STACK toplevel blast_eval.pm:83 > ------------------------------------- > > indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ > site/lib/Bio/ > DB/Fasta.pm line 1054. > > > Is there any way to use these fasta files with diffrent length of > lines with this fasta.pm module or will i have to change the format > of my fasta-files(big databases...) ? > > Thanks in advance for any help! > > Regards, Jonas > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Tue Dec 8 23:30:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 8 Dec 2009 22:30:26 -0600 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl Meeting at the GMOD Conference Message-ID: <1BC089CD-75C3-437E-86A5-22220D724DF6@illinois.edu> All, For those interested, we will be holding a general BioPerl meeting, tentatively scheduled for January 13, 2010, just prior to the GMOD Community Meeting from Jan 14-15 in San Diego. This will be just following the Plant and Animal Genome (PAG) conference Jan 9-13. The exact day and time is somewhat flexible depending on attendees' schedules. For those interested, sign up here: http://www.bioperl.org/wiki/GMOD_2010_Meeting For those interested in attending the GMOD meeting or PAG: http://gmod.org/wiki/January_2010_GMOD_Meeting I can envision the following items popping up: * Refactoring of Alignment and GFF3/FeatureIO * Addressing BioPerl's monolithic nature * Moose and Perl 6 * Documentation Any others? 
chris From akarger at CGR.Harvard.edu Wed Dec 9 10:01:45 2009 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Wed, 9 Dec 2009 10:01:45 -0500 Subject: [Bioperl-l] fasta format In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> Message-ID: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> > Is there any way to use these fasta files with diffrent length of > lines with this fasta.pm module or will i have to change the format > of my fasta-files(big databases...) ? > Jonas, It's not Bioperl, but for a quick fix you can use the Scriptome. Use the change_fasta_to_tab script (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) to change your FASTA into a tab-delimited file. Then use the next tool (change_tab_to_fasta) to change your files back. To use a tool: change the input and output file names on the website, then cut and paste the Perl script from the green box into a CMD window. The script works one sequence at a time, so it doesn't need a lot of memory. (As long as you have enough disk space to store the tab-delimited copy). The recreated FASTAs will be 60 characters per line (although you can hand-edit the line after you paste it to be whatever number of characters you'd like). Let me know if you have a problem. -Amir Karger Life Sciences Research Computing, FAS IT Harvard University From Kevin.M.Brown at asu.edu Wed Dec 9 10:26:22 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 9 Dec 2009 08:26:22 -0700 Subject: [Bioperl-l] fasta format In-Reply-To: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> Message-ID: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> Even easier to accomplish in one step. 
Read in the fasta file and write it straight out to another fasta file with SeqIO:

    use Bio::SeqIO;

    my $in  = Bio::SeqIO->new(-format => 'fasta', -file => $file);
    my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>file.fasta');
    # next_seq returns one sequence at a time; write_seq re-wraps it on output
    while (my $seq = $in->next_seq) { $out->write_seq($seq); }

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

From Russell.Smithies at agresearch.co.nz Wed Dec 9 14:44:41 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 10 Dec 2009 08:44:41 +1300
Subject: [Bioperl-l] fasta format
In-Reply-To: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>

It's even easier as the script is already written for you :-)

bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa

--Russell

From maj at fortinbras.us Wed Dec 9 15:18:08 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 9 Dec 2009 15:18:08 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz>
Message-ID: <5C992E6556584BDFBF39604FDEA8ECE0@NewLife>

$ perl -MPerlIO::via::SeqIO -e 'open($f, "<:via(SeqIO)", shift); open($g, ">:via(SeqIO::fasta)", shift); while (<$f>) { print $g $_; }' in.fas out.fas

From kellert at ohsu.edu Wed Dec 9 19:36:13 2009
From: kellert at ohsu.edu (Tom Keller)
Date: Wed, 9 Dec 2009 16:36:13 -0800
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
Message-ID: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>

Greetings,
Is there a simple way to map a list of Ensembl IDs to the NCBI GIs?

thanks,
Tom

Thomas (Tom) Keller
kellert at ohsu.edu
503.494.2442
6339b R Jones Hall (BSc/CROET)
www.ohsu.edu/xd/research/research-cores/dna-analysis/

From cjfields at illinois.edu Wed Dec 9 20:59:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 9 Dec 2009 19:59:37 -0600
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
References: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu>
Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C@illinois.edu>

Tom,

Probably best to do this via BioMart:

http://www.ensembl.org/biomart/

I would assume you can also do this via the Ensembl Perl API.

Also, have a look at the UniProt ID Mapper:

http://www.uniprot.org/?tab=mapping

chris

From lovebaby39 at gmail.com Thu Dec 10 09:22:14 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 22:22:14 +0800
Subject: [Bioperl-l] about bioperl issue
Message-ID: <5F281DC3E4514B3AAA8881169B240227@SHAPC>

Dear

The following is my code.
--------------------------------------------------------------------------------

use Bio::Seq;
use Bio::Tools::Run::StandAloneBlast;

my @params_rb = ( 'program'  => 'blastn',
                  'database' => 'DB\\RB_GUS\\RB_GUS' );
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

# $testline2 is presumably the query sequence string, set elsewhere in the script
my $input_rb = Bio::Seq->new( -id  => "test_query",
                              -seq => $testline2 );
my $blast_report_rb = $factory_rb->blastall($input_rb);

while ( my $result_rb = $blast_report_rb->next_result ) {
    while ( my $hit_rb = $result_rb->next_hit() ) {
        while ( my $hsp_rb = $hit_rb->next_hsp() ) {
            print $hit_rb->name, "\nevalue = ", $hsp_rb->evalue,
                  "\t score = ", $hsp_rb->score, "\n";
            #print " ",$hit->name,"\n";
        }
    }
}

--------------------------------------------------------------------------------

I know how to get "name", "evalue" and "score", but I don't know how to get the text that is shown in red (or please see the attachment).

------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------

I would appreciate it if you could tell me how to do it.
Thank you.

Reginald Hsueh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R20080801-1.seq.txt
URL:

From SMarkel at accelrys.com Thu Dec 10 09:47:36 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 10 Dec 2009 06:47:36 -0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977067C6E@EXCH1-COLO.accelrys.net>

Reginald,

I didn't see anything highlighted in red, but the three strings in the pairwise alignment display can be obtained from an HSP using

    $hsp->query_string()
    $hsp->hit_string()
    $hsp->homology_string()

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From David.Messina at sbc.su.se Thu Dec 10 10:09:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:09:31 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>

Hi Reginald,

None of the words in your email or the attachment are colored red; unfortunately, any kind of formatting tends to get removed from emails sent to mailing lists.

Could you be more specific about what part of the blast report you are not able to get?
You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.

Dave

From David.Messina at sbc.su.se Thu Dec 10 10:36:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:36:49 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
Message-ID: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>

Hi Reginald,

Please keep all replies on the list so that everyone can follow the thread.

In a separate email, Scott gave the answer you were looking for, I think.

Namely:
    $hsp->query_string()
OR
    $hsp->hit_string()

Dave

On Dec 10, 2009, at 16:31, Hsueh wrote:

> Dear Dave Messina
>
> I need to get the string that is "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga".
>
> Thank you
>
> Reginald Hsueh
>
> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
>            |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
> Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173

From lovebaby39 at gmail.com Thu Dec 10 10:53:00 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Thu, 10 Dec 2009 23:53:00 +0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID:

Dear Dave Messina

Thank you for your replies.

Reginald Hsueh

From pg4 at sanger.ac.uk Thu Dec 10 15:50:40 2009
From: pg4 at sanger.ac.uk (Pablo Marin-Garcia)
Date: Thu, 10 Dec 2009 20:50:40 +0000 (GMT)
Subject: [Bioperl-l] how to map ensembl id to NCBI gi
In-Reply-To:
References:
Message-ID:

If you are mapping Ensembl genes to NCBI genes (via the Ensembl API or BioMart), please read this recent thread on ensembl-dev:

http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/msg05417.html

It seems that the Ensembl gene mapping to NCBI is done through translation, so the noncoding genes do not have a corresponding NCBI gene mapped.

-Pablo

====================================================================
Pablo Marin-Garcia, PhD
Sanger Institute             | PostDoc / Computer Biologist
Wellcome Trust Genome Campus | team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH  | room : N333
United Kingdom               | email: pablo.marin at sanger.ac.uk
====================================================================

From umjsm at leeds.ac.uk Fri Dec 11 11:44:42 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Fri, 11 Dec 2009 16:44:42 +0000
Subject: [Bioperl-l] extract and write a pdb chain
Message-ID: <1260549882.6484.11.camel@limm-pc1254>

Hello,

I am trying to do a very easy thing, but I can't get it to work. I want to write one chain of a PDB entry to a file. I have tried a lot of things, but what I think should work is the following script:

    use Bio::Structure::IO;
    use strict;

    my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
    my $struc = $structio->next_structure;

    my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id');

    for my $chain ($struc->get_chains) {
        if($chain->id eq "A"){
            $new_entry->chain($chain);
            last;
        }
    }

    my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
    $out->write_structure($new_entry);

It doesn't.
I get the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: add_chain: first argument needs to be a Model object ()
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
STACK: read_pdb.pl:10
-----------------------------------------------------------

As far as I understand the documentation, the chain method of Bio::Structure::Entry takes an object of type Chain as input.

Any solution will be very welcome.

Best regards,
Joan

From wkretzsch at gmail.com Fri Dec 11 14:22:31 2009
From: wkretzsch at gmail.com (Warren W. Kretzschmar)
Date: Fri, 11 Dec 2009 14:22:31 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms
Message-ID: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>

Hi,
I'm new to the BioPerl community. I've created a Perl module that reads in msOUT files generated by Hudson's ms. As far as I understand, there is no SeqIO module to read and output these files? If so, I propose to create a module that does this. Any suggestions?

Thanks,
Warren Kretzschmar

From maj at fortinbras.us Fri Dec 11 14:59:53 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 11 Dec 2009 14:59:53 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms
In-Reply-To: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
References: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
Message-ID: <07382508ED0B41F4B8289813B734239B@NewLife>

Hi Warren,

I say go for it.
You'll want to have a look at

http://bio.perl.org/wiki/Advanced_BioPerl

which explains most of our tips and "policies" for prospective code contributors, as well as

http://bio.perl.org/wiki/HOWTO:SeqIO

which details SeqIO from the user's perspective. Look carefully at some Bio::SeqIO::* modules for implementation details. If you have code to propose, use

http://bugzilla.bioperl.org

and enter a new enhancement, where you can upload your module for us to review.

MAJ

From bosborne11 at verizon.net Fri Dec 11 15:37:45 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 11 Dec 2009 15:37:45 -0500
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260549882.6484.11.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID:

Joan,

It looks to me like the first argument to the add_chain() method has to be a Model object; the second is the Chain itself. See Structure/Entry.pm, for example. However, if you're seeing some documentation that says something else, then tell us where; it needs to be corrected.

In Bio::Structure an Entry consists of one or more Models, each of which has one or more Chains. This allows you to build macromolecular complexes (an Entry), which could have more than one defined protein or protein complex (Models).

Brian O.
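
For the narrower goal in this thread (getting one chain into its own file), one possible workaround sidesteps the Entry/Model/Chain bookkeeping altogether and filters the coordinate records of the PDB text directly. What follows is only a minimal sketch in plain Perl, not the Bio::Structure API: extract_chain and demo_line are hypothetical helper names, the demo records are made up, and the only thing it relies on is the PDB convention that the chain identifier sits in column 22 of ATOM, HETATM, ANISOU and TER records. Header records, CONECT records and multi-model files are ignored.

```perl
use strict;
use warnings;

# Record types that carry a chain identifier in column 22.
my %has_chain = map { $_ => 1 } qw(ATOM HETATM ANISOU TER);

# Return the coordinate records that belong to one chain.
# Assumption: input lines follow the fixed-column PDB layout.
sub extract_chain {
    my ($chain_id, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        # Columns 1-6 hold the record name, padded with spaces.
        (my $record = substr($line, 0, 6)) =~ s/\s+$//;
        next unless $has_chain{$record};
        next if length($line) < 22;
        # Column 22 (offset 21) holds the chain identifier.
        push @kept, $line if substr($line, 21, 1) eq $chain_id;
    }
    return @kept;
}

# Hypothetical helper: build a demo record with the chain id in column 22.
sub demo_line {
    my ($record, $serial, $atom, $res, $chain, $resseq) = @_;
    return sprintf("%-6s%5d %-4s %3s %1s%4d\n",
                   $record, $serial, $atom, $res, $chain, $resseq);
}

# Made-up input, only to show the call.
my @demo = (
    demo_line('ATOM',   1, ' N',  'MET', 'A', 1),
    demo_line('ATOM',   2, ' CA', 'MET', 'B', 1),
    demo_line('HETATM', 3, ' O',  'HOH', 'A', 201),
);
print extract_chain('A', @demo);
```

To apply this to a real file, read the lines of the input PDB file from a filehandle, pass them to extract_chain, and print the returned list to the output file.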

From awitney at sgul.ac.uk Sun Dec 13 16:48:13 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sun, 13 Dec 2009 21:48:13 +0000
Subject: [Bioperl-l] combining tree image with heatmap
Message-ID: <4B25611D.6050009@sgul.ac.uk>

I am trying to draw a tree on the side of a heatmap image, much like you see after clustering data. I was wondering if anyone has managed to do this using BioPerl?

I can draw the two separately, but I can't work out how to put the two together and get the nodes to line up with the correct row of clustering data. Is there any particular module to look at?

thanks for any help

adam

From dhwani1030 at gmail.com Sat Dec 12 15:04:01 2009
From: dhwani1030 at gmail.com (dhwani gandhi)
Date: Sat, 12 Dec 2009 15:04:01 -0500
Subject: [Bioperl-l] Bioperl code help
Message-ID:

Hi,

I am very new to BioPerl, but I am somewhat familiar with Perl. I write my Perl programs in Notepad++ and run them in cmd. Now I want to run BioPerl programs. I just installed BioPerl on my computer, and I have a program using BioPerl modules in Notepad++. My question is how to run these programs. Can they be run in cmd as well, or do I use ppm? Please help.

Thanks,
-Dhwani Gandhi.

From eric_donaldson at med.unc.edu Sun Dec 13 18:15:24 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Sun, 13 Dec 2009 18:15:24 -0500
Subject: [Bioperl-l] problem with install
Message-ID:

Hello,

Today I downloaded bioperl 1.6.1 on my new MacBook Pro using fink. I used

fink install bioperl.pm-588

as I could not get it to install using perl version 5.10.

But now I get an error when trying to run a bioperl script.

Here is the error:

Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
BEGIN failed--compilation aborted at blastparser.pl line 8.

I am a novice at unix and bioperl so I do not know how to troubleshoot this; would you please help me?

Thank you,

Eric

Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology

From jason at bioperl.org Sun Dec 13 20:24:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 17:24:26 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To:
References:
Message-ID: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>

Hi Eric -

Bio::Tools::BPlite is no longer supported in Bioperl - it was deprecated several releases ago.
It was replaced with Bio::SearchIO

-jason
On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote:

> Hello,
>
> Today I downloaded bioperl 1.61 on my new macbook pro using fink. I
> used the
>
> fink install bioperl.pm-588 as I could not get it to instal using
> the perl version 5.10.
>
> But now I get an error when trying to run a bioperl script.
> > Here is the error: > > Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/ > perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / > Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ > darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ > Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / > Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at > blastparser.pl line 8. > BEGIN failed--compilation aborted at blastparser.pl line 8. > > > I am a novice at unix and bioperl so I do not know how to > troubleshoot this, would you please hleo me? > > Thank you, > > Eric > > > Eric F. Donaldson, Ph.D. > Research Assistant Professor, Ralph Baric Lab > University of North Carolina > Department of Epidemiology > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Sun Dec 13 23:09:45 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 13 Dec 2009 20:09:45 -0800 Subject: [Bioperl-l] problem with install In-Reply-To: References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> Message-ID: <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> So you installed perl-5.10 or using system perl? I'm confused if you actually installed bioperl.pm or not via fink? It seems like since your @INC or $PERL5LIB points to /sw/lib/perl5 which is one of the dirs it would have installed in, but I don't think you actually installed bioperl. you can try and do: $ locate Bio/SearchIO.pm We'll see if any of the other osx/fink gurus are on the list that can help or you can install it via CPAN I guess. 
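
A minimal sketch of the same check in Perl, for cases where `locate` is not available or its database is stale: it walks a list of directories and reports every place a module file is found. The helper name `find_module` is made up for this note, and the module path is simply the one from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# find_module is a hypothetical helper for this note, not a BioPerl function.
# Given a module file path such as 'Bio/SearchIO.pm' and a list of
# directories, it returns every full path at which that file exists.
sub find_module {
    my ($module_file, @dirs) = @_;
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may contain code refs; skip those
        my $path = File::Spec->catfile($dir, split m{/}, $module_file);
        push @found, $path if -e $path;
    }
    return @found;
}

# Search the directories this perl actually uses.
my $module_file = 'Bio/SearchIO.pm';
my @hits = find_module($module_file, @INC);
if (@hits) {
    print "Found $module_file at:\n";
    print "  $_\n" for @hits;
}
else {
    print "$module_file is not in any \@INC directory; ",
          "add the directory that holds Bio/ to PERL5LIB or 'use lib'.\n";
}
```

Running it with each perl on the machine (the system one and the fink one) shows which interpreter, if any, can see the installed modules.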
-jason On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote: > > I actually tried a different blastparser that uses BIO::SearchIO and > got the same message: > > Can't locate Bio/SearchIO.pm in @INC (@INC contains: /sw/lib/perl5/ > darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / > Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ > darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ > Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / > Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at > blastparser.new.pl line 8. > BEGIN failed--compilation aborted at blastparser.new.pl line 8. > > I suspect there is a path problem, but am not savvy enough to know > how to fix it. I am really just a hacker.... I have several scripts > that I use regularly and that I know how to modify, but am lost when > they don't work... > > Thanks for any help, > > Eric > > ----- Original Message ----- > From: Jason Stajich > Date: Sunday, December 13, 2009 8:24 pm > Subject: Re: [Bioperl-l] problem with install > To: eric_donaldson at med.unc.edu > Cc: bioperl-l at bioperl.org > >> Hi Eric - >> >> Bio::Tools::BPlite is no longer supported in Bioperl - it >> was >> deprecated several releases ago. >> It was replaced with Bio::SearchIO >> >> -jason >> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote: >> >>> Hello, >>> >>> Today I downloaded bioperl 1.61 on my new macbook pro using >> fink. I >>> used the >>> >>> fink install bioperl.pm-588 as I could not get it to instal >> using >>> the perl version 5.10. >>> >>> But now I get an error when trying to run a bioperl script. 
>>> >>> Here is the error: >>> >>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: >> /sw/lib/ >>> perl5/darwin-thread-multi-2level /sw/lib/perl5 >> /sw/lib/perl5/darwin / >>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/5.10.0 >> /Library/Perl/5.10.0/ >>> darwin-thread-multi-2level /Library/Perl/5.10.0 >> /Network/Library/ >>> Perl/5.10.0/darwin-thread-multi-2level >> /Network/Library/Perl/5.10.0 / >>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) >> at >>> blastparser.pl line 8. >>> BEGIN failed--compilation aborted at blastparser.pl line 8. >>> >>> >>> I am a novice at unix and bioperl so I do not know how >> to >>> troubleshoot this, would you please hleo me? >>> >>> Thank you, >>> >>> Eric >>> >>> >>> Eric F. Donaldson, Ph.D. >>> Research Assistant Professor, Ralph Baric Lab >>> University of North Carolina >>> Department of Epidemiology >>> >>> >>> >> < >> eric_donaldson.vcf>_______________________________________________> >> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> > > Eric F. Donaldson, Ph.D. > Research Assistant Professor, Ralph Baric Lab > University of North Carolina > Department of Epidemiology > > > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Mon Dec 14 00:10:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 13 Dec 2009 21:10:54 -0800 Subject: [Bioperl-l] problem with install In-Reply-To: References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> Message-ID: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org> Eric - please CC the bioperl list when responding so others can help - I can't be the only answerer. 
But since your @INC message doesn't include /sw/lib/perl5/5.8.8/ you would need to make sure that is added to your PERL5LIB. There are some help docs on the perl sites I expect on how to get your PATHs in order. Or you can just install via CPAN which will put it in the right path - there are docs on the bioperl website about installing via CPAN. -jason On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote: > Hi Jason, > > The fink package did not have support for perl 5.10, so I attempted > to install the perl 5.8.6 package. > > When I attempted: locate Bio/SearchIO.pm > I got: -bash: $: command not found > > So even though I can find SearchIO.pm in sw/lib/perl5/5.8.8/Bio/ > SearchIO.pm I cannot access it. Do I need to use the older version > of perl? > > Would it be better to install with CPAN? If so, can you send me to > a page that has instructions? > > Thank you so much! > > ERic > > > ----- Original Message ----- > From: Jason Stajich > Date: Sunday, December 13, 2009 11:10 pm > Subject: Re: [Bioperl-l] problem with install > To: eric_donaldson at med.unc.edu > Cc: BioPerl List > >> So you installed perl-5.10 or using system perl? I'm >> confused if you >> actually installed bioperl.pm or not via fink? >> >> It seems like since your @INC or $PERL5LIB points to >> /sw/lib/perl5 >> which is one of the dirs it would have installed in, but I don't >> think >> you actually installed bioperl. >> >> you can try and do: >> $ locate Bio/SearchIO.pm >> >> We'll see if any of the other osx/fink gurus are on the list >> that can >> help or you can install it via CPAN I guess. 
>> >> -jason >> On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote: >> >>> >>> I actually tried a different blastparser that uses >> BIO::SearchIO and >>> got the same message: >>> >>> Can't locate Bio/SearchIO.pm in @INC (@INC contains: >> /sw/lib/perl5/ >>> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin >> / >>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/5.10.0 >> /Library/Perl/5.10.0/ >>> darwin-thread-multi-2level /Library/Perl/5.10.0 >> /Network/Library/ >>> Perl/5.10.0/darwin-thread-multi-2level >> /Network/Library/Perl/5.10.0 / >>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) >> at >>> blastparser.new.pl line 8. >>> BEGIN failed--compilation aborted at blastparser.new.pl line 8. >>> >>> I suspect there is a path problem, but am not savvy enough to >> know >>> how to fix it. I am really just a hacker.... I have >> several scripts >>> that I use regularly and that I know how to modify, but am >> lost when >>> they don't work... >>> >>> Thanks for any help, >>> >>> Eric >>> >>> ----- Original Message ----- >>> From: Jason Stajich >>> Date: Sunday, December 13, 2009 8:24 pm >>> Subject: Re: [Bioperl-l] problem with install >>> To: eric_donaldson at med.unc.edu >>> Cc: bioperl-l at bioperl.org >>> >>>> Hi Eric - >>>> >>>> Bio::Tools::BPlite is no longer supported in Bioperl - it >>>> was >>>> deprecated several releases ago. >>>> It was replaced with Bio::SearchIO >>>> >>>> -jason >>>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote: >>>> >>>>> Hello, >>>>> >>>>> Today I downloaded bioperl 1.61 on my new macbook pro using >>>> fink. I >>>>> used the >>>>> >>>>> fink install bioperl.pm-588 as I could not get it to instal >>>> using >>>>> the perl version 5.10. >>>>> >>>>> But now I get an error when trying to run a bioperl script. 
>>>>> >>>>> Here is the error: >>>>> >>>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: >>>> /sw/lib/ >>>>> perl5/darwin-thread-multi-2level /sw/lib/perl5 >>>> /sw/lib/perl5/darwin / >>>>> Library/Perl/Updates/5.10.0 >> /System/Library/Perl/5.10.0/darwin- >>>> >>>>> thread-multi-2level /System/Library/Perl/5.10.0 >>>> /Library/Perl/5.10.0/ >>>>> darwin-thread-multi-2level /Library/Perl/5.10.0 >>>> /Network/Library/ >>>>> Perl/5.10.0/darwin-thread-multi-2level >>>> /Network/Library/Perl/5.10.0 / >>>>> Network/Library/Perl >> /System/Library/Perl/Extras/5.10.0/darwin- >>>> >>>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) >>>> at >>>>> blastparser.pl line 8. >>>>> BEGIN failed--compilation aborted at blastparser.pl line 8. >>>>> >>>>> >>>>> I am a novice at unix and bioperl so I do not know how >>>> to >>>>> troubleshoot this, would you please hleo me? >>>>> >>>>> Thank you, >>>>> >>>>> Eric >>>>> >>>>> >>>>> Eric F. Donaldson, Ph.D. >>>>> Research Assistant Professor, Ralph Baric Lab >>>>> University of North Carolina >>>>> Department of Epidemiology >>>>> >>>>> >>>>> >>>> < >>>> >> eric_donaldson.vcf>_______________________________________________> >>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> Jason Stajich >>>> jason.stajich at gmail.com >>>> jason at bioperl.org >>>> >>>> >>> >>> Eric F. Donaldson, Ph.D. >>> Research Assistant Professor, Ralph Baric Lab >>> University of North Carolina >>> Department of Epidemiology >>> >>> >>> >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> > > Eric F. Donaldson, Ph.D. 
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From awitney at sgul.ac.uk Mon Dec 14 04:36:19 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 14 Dec 2009 09:36:19 +0000
Subject: [Bioperl-l] Bioperl code help
In-Reply-To:
References:
Message-ID: <4B260713.3070402@sgul.ac.uk>

bioperl programs are just perl programs, so you should run them in
exactly the same way as your perl programs, from the command line

HTH

adam

On 12/12/2009 20:04, dhwani gandhi wrote:
> Hi,
> I am very new to Bioperl but I am somewhat familiar with perl though.
>
> I write my perl programs in Notepad++ and run them in cmd.
>
> Now, I want to run Bioperl programs. I just installed bioperl on my
> computer. And I have a program using bioperl modules in Notepad++.
>
> My question is how to run these programs? Can they be run in cmd as well? or
> do I use ppm?
>
> Please help.
>
> Thanks,
> -Dhwani Gandhi.

From umjsm at leeds.ac.uk Mon Dec 14 05:39:32 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 10:39:32 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To:
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID: <1260787172.7359.0.camel@limm-pc1254>

Hi Brian,

I am not calling the method add_chain, I am calling the method chain

http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6

and if I don't pass an object of type Bio::Structure::Chain as the
argument, I get an error like this (the part between the arrows depends
on the argument):

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain, we want a Bio::Structure::Chain or a list of these

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
STACK: read_pdb.pl:11
-----------------------------------------------------------

And if I use a Chain object I get the error that I told you about.

I have tried this code:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
my $struc = $structio->next_structure;
my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id');
my $model = Bio::Structure::Model->new( -id => '0');

for my $chain ($struc->get_chains) {
    if($chain->id eq "A"){
        $new_entry->add_chain($model,$chain);
        last;
    }
}
$new_entry->add_model($model);
my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
$out->write_structure($new_entry);

But I get an empty pdb:

HEADER DEFAULT CLASSIFICATION 24-JAN-70 stru
REMARK 1
TER 1 A 0
MASTER
END

I am trying a lot of combinations, but I can't write a single chain into a file.
I don't know what I am doing wrong. Thanks for helping regards, Joan On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote: > Joan, > > It looks to me like the first argument to the add_chain() method has > to be a Model object, the second is the Chain itself. See Structure/ > Entry.pm, for example. However if you're seeing some documentation > that says something else then tell us where, it needs to be corrected. > > In Bio::Structure an Entry consists of one or Models, each of which > has one or more Chains. This allows you to build macromolecular > complexes (an Entry), which could have more than one defined proteins > or protein complexes (Models). > > Brian O. > > On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote: > > > Hello, > > > > I am trying to do a very easy think but I don't get it. I want to > > write > > in a file a chain of a pdb. I have try a lot of thinks but what I > > think > > that it should work is the next script: > > > > use Bio::Structure::IO; > > use strict; > > > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' > > => > > 'pdb'); > > my $struc = $structio->next_structure; > > > > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > > > > for my $chain ($struc->get_chains) { > > if($chain->id eq "A"){ > > $new_entry->chain($chain); > > last; > > } > > } > > > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > > 'pdb');# > > $out->write_structure($new_entry); > > > > it doesn't. 
I get the next error:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: add_chain: first argument needs to be a Model object ()
> >
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
> > STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
> > STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
> > STACK: read_pdb.pl:10
> > -----------------------------------------------------------
> >
> > As far I understand the documentation, the method chain of the object
> > Bio::Structure::Entry requires an as input an object of type Chain.
> >
> > Any solution will be very welcome.
> >
> > best regards,
> > Joan
>

From fs5 at sanger.ac.uk Mon Dec 14 07:18:17 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 14 Dec 2009 12:18:17 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
Message-ID: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi,

Maybe I'm really missing something here but I can't find how to parse a
file that is basically just the Feature Table from an EMBL file, looking
like this:

FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]

So the file has no header and no actual sequence and it is used simply
to annotate a chromosome in a genome assembly. I've always used GFF for
that purpose but have been given this file now.
Bio::SeqIO->new(-format=>"EMBL") complains about the missing header and
if I stick in a fake ID line, it warns about the missing sequence and
the fact that the features don't fit on the sequence (of length 0).
Of course it's not difficult to write my own parser but I'm sure there
must be a BioPerl way of doing that which I have just overlooked.

Thanks for your help.

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From David.Messina at sbc.su.se Mon Dec 14 09:06:54 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Dec 2009 15:06:54 +0100
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>

Hi Frank,

You will need to look at the feature table parsing code that
Bio::SeqIO::embl itself uses to read those lines, probably the
_read_FTHelper_EMBL method:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12

Since you're trying to parse what is effectively a part of an EMBL
record, and a somewhat complicated part at that, as you might imagine
this could be a little hairy.

It might be easier to go the route you started down: add a fake header
and a (relatively long) fake sequence, and go through Bio::SeqIO in the
normal way.

Dave

PS: I suspect you may already be familiar with it, but for an overview
on how to get at data in feature tables, look at the Feature Annotation
HOWTO:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

From eric_donaldson at med.unc.edu Mon Dec 14 09:22:40 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Mon, 14 Dec 2009 09:22:40 -0500
Subject: [Bioperl-l] problem with install
In-Reply-To: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
Message-ID:

Thank you Jason. I appreciate the help.

Eric

----- Original Message -----
From: Jason Stajich
Date: Monday, December 14, 2009 12:10 am
Subject: Re: [Bioperl-l] problem with install
To: eric_donaldson at med.unc.edu
Cc: BioPerl List

> Eric -
> please CC the bioperl list when responding so others can help - I
> can't be the only answerer.
>
> But since your @INC message doesn't include /sw/lib/perl5/5.8.8/ you
> would need to make sure that is added to your PERL5LIB.
> There are some help docs on the perl sites I expect on how to get your
> PATHs in order.
>
> Or you can just install via CPAN which will put it in the right path -
> there are docs on the bioperl website about installing via CPAN.
>
> -jason
> On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:
>
> > Hi Jason,
> >
> > The fink package did not have support for perl 5.10, so I attempted
> > to install the perl 5.8.6 package.
> >
> > When I attempted: locate Bio/SearchIO.pm
> > I got: -bash: $: command not found
> >
> > So even though I can find SearchIO.pm in sw/lib/perl5/5.8.8/Bio/SearchIO.pm
> > I cannot access it. Do I need to use the older version
> > of perl?
> >
> > Would it be better to install with CPAN? If so, can you send me to
> > a page that has instructions?
> >
> > Thank you so much!
> >
> > ERic
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org

Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology

From umjsm at leeds.ac.uk Mon Dec 14 11:58:03 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 16:58:03 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260787172.7359.0.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254> <1260787172.7359.0.camel@limm-pc1254>
Message-ID: <1260809883.7359.15.camel@limm-pc1254>

Hi again,

To write a single pdb chain to a file, I have had to add it atom by
atom to a new structure.

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
my $struc = $structio->next_structure;
my $new_struct = Bio::Structure::Entry->new( -id => 'structure_id');

for my $model ($struc->get_models){
    $new_struct->add_model($model);
    for my $chain ($struc->get_chains) {
        $new_struct->add_chain($model,$chain);
        if($chain->id eq "A"){
            foreach my $res ($struc->get_residues($chain)){
                $new_struct->add_residue($chain,$res);
                foreach my $atom ($struc->get_atoms($res)){
                    $new_struct->add_atom($res,$atom);
                }
            }
        }
        last;
    }
    last;
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
$out->write_structure($new_struct);

I suppose that there should be a more elegant way to do it. If someone
knows it and can explain it I will be very grateful.
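
One alternative that sidesteps the Bio::Structure object model entirely is to filter the PDB text by chain identifier, which the PDB format stores in column 22 of ATOM, HETATM, ANISOU and TER records. The sketch below is plain Perl, not a BioPerl API; the sub name `filter_chain` and the file names in the usage comment are placeholders, and it assumes a single-model file such as the one used in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# filter_chain is a hypothetical helper, not part of BioPerl.
# It takes a chain ID and a list of PDB lines and returns the coordinate
# records (ATOM, HETATM, ANISOU, TER) that belong to that chain.
# In the PDB format the record name occupies columns 1-6 and the chain ID
# is column 22, i.e. offset 21 when counting from zero.
sub filter_chain {
    my ($chain_id, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        next unless $line =~ /^(?:ATOM  |HETATM|ANISOU|TER   )/;
        next unless length($line) >= 22;
        push @kept, $line if substr($line, 21, 1) eq $chain_id;
    }
    return @kept;
}

# Usage (script and file names are placeholders):
#   perl extract_chain.pl 101m.pdb A > out.pdb
if (@ARGV == 2) {
    my ($pdb_file, $chain_id) = @ARGV;
    open my $in, '<', $pdb_file or die "Cannot open $pdb_file: $!";
    my @lines = <$in>;
    close $in;
    print filter_chain($chain_id, @lines);
    print "END\n";
}
```

This drops header and remark records, so it is only a fit when the coordinates of one chain are all that is needed; keeping the metadata would still call for the Bio::Structure route.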
kind regards, Joan On Mon, 2009-12-14 at 10:39 +0000, Joan Segura Mora wrote: > Hi Brian, > > I am not calling the method add_chain, I am calling the method chain > > http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6 > > and if I don't use as an argument an object of type > > Bio::Structure::Chain > > I get an error like this (-->depends of the argument<--) > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain, > we want a Bio::Structure::Chain or a list of these > > STACK: Error::throw > STACK: > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368 > STACK: > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314 > STACK: read_pdb.pl:11 > ----------------------------------------------------------- > > > And if I use a Chain object I get the error that I told you. > > I have try this code: > > use Bio::Structure::IO; > use strict; > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' => > 'pdb'); > my $struc = $structio->next_structure; > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > my $model = Bio::Structure::Model->new( -id => '0'); > > for my $chain ($struc->get_chains) { > if($chain->id eq "A"){ > $new_entry->add_chain($model,$chain); > > last; > } > } > $new_entry->add_model($model); > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > 'pdb'); > $out->write_structure($new_entry); > > > But I get an empty pdb > > HEADER DEFAULT CLASSIFICATION 24-JAN-70 > stru > REMARK > 1 > TER 1 A > 0 > MASTER > END > > I am trying a lot of combinations, but I can't write a single chain into > a file. I don't know what I am doing wrong. 
> > Thanks for helping > > regards, > Joan > > > On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote: > > Joan, > > > > It looks to me like the first argument to the add_chain() method has > > to be a Model object, the second is the Chain itself. See Structure/ > > Entry.pm, for example. However if you're seeing some documentation > > that says something else then tell us where, it needs to be corrected. > > > > In Bio::Structure an Entry consists of one or Models, each of which > > has one or more Chains. This allows you to build macromolecular > > complexes (an Entry), which could have more than one defined proteins > > or protein complexes (Models). > > > > Brian O. > > > > On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote: > > > > > Hello, > > > > > > I am trying to do a very easy think but I don't get it. I want to > > > write > > > in a file a chain of a pdb. I have try a lot of thinks but what I > > > think > > > that it should work is the next script: > > > > > > use Bio::Structure::IO; > > > use strict; > > > > > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' > > > => > > > 'pdb'); > > > my $struc = $structio->next_structure; > > > > > > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > > > > > > for my $chain ($struc->get_chains) { > > > if($chain->id eq "A"){ > > > $new_entry->chain($chain); > > > last; > > > } > > > } > > > > > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > > > 'pdb');# > > > $out->write_structure($new_entry); > > > > > > it doesn't. 
I get the next error: > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: add_chain: first argument needs to be a Model object () > > > > > > STACK: Error::throw > > > STACK: > > > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: > > > 368 > > > STACK: > > > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:335 > > > STACK: > > > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:391 > > > STACK: > > > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:304 > > > STACK: read_pdb.pl:10 > > > ----------------------------------------------------------- > > > > > > As far I understand the documentation, the method chain of the object > > > Bio::Structure::Entry requires an as input an object of type Chain. > > > > > > Any solution will be very welcome. > > > > > > best regards, > > > Joan > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gowthaman.ramasamy at sbri.org Mon Dec 14 14:16:32 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Mon, 14 Dec 2009 11:16:32 -0800 Subject: [Bioperl-l] GO::Parser / GO::Model::Term In-Reply-To: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com> Message-ID: Hi All, I have a list of GO terms and would like to pull the GO accessions for them. I can easily do the reverse using get_term("GO::00000051"). But can someone tell me how to get the GO accessions from GO term names, e.g. retrieve the GO accession for "citrulline metabolic process"?
Thanks very much, Gowtham From lsbrath at gmail.com Mon Dec 14 14:41:39 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Mon, 14 Dec 2009 14:41:39 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac Message-ID: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Hello, I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get the following error message: Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) at project_example.pl line 4. BEGIN failed--compilation aborted at project_example.pl line 4. I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message. Any ideas? MEB From scott at scottcain.net Mon Dec 14 14:47:05 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 14 Dec 2009 14:47:05 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Message-ID: <4536f7700912141147ld16d67av1a58bbf5c1fc5e9e@mail.gmail.com> Hi Mgavi, I think Jason may have already started helping, but the question is: is SeqIO.pm anywhere in those directories? If not, why not? If so, why can't the perl you are using find it? Do you have more than one instance of perl on your machine (fairly likely if you are using a fink-installed BioPerl)? When you execute your script, which perl are you using? Scott On Mon, Dec 14, 2009 at 2:41 PM, Mgavi Brathwaite wrote: > Hello, > > I have loaded BioPerl -1.6.0 onto my Mac. 
When I run my script I get the > following error message: > > Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 > /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level > /Library/Perl/5.8.8 /Library/Perl > /Network/Library/Perl/5.8.8/darwin-thread-multi-2level > /Network/Library/Perl/5.8.8 /Network/Library/Perl > /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) > at project_example.pl line 4. > BEGIN failed--compilation aborted at project_example.pl line 4. > > I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message. > Any ideas? > > MEB > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From bosborne11 at verizon.net Mon Dec 14 14:45:35 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 14 Dec 2009 14:45:35 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Message-ID: <38104B41-104B-42D7-94FA-30016E110BFD@verizon.net> Mgavi, So there's a directory called /sw/lib/perl5/Bio? Or is it called something else? Brian O. On Dec 14, 2009, at 2:41 PM, Mgavi Brathwaite wrote: > Hello, > > I have loaded BioPerl -1.6.0 onto my Mac. 
When I run my script I get > the > following error message: > > Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 > /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread- > multi-2level > /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread- > multi-2level > /Library/Perl/5.8.8 /Library/Perl > /Network/Library/Perl/5.8.8/darwin-thread-multi-2level > /Network/Library/Perl/5.8.8 /Network/Library/Perl > /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/ > 5.8.1 .) > at project_example.pl line 4. > BEGIN failed--compilation aborted at project_example.pl line 4. > > I moved the BioPerl dir to /sw/lib/perl5 and I still get the error > message. > Any ideas? > > MEB > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Dec 14 16:42:09 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 14 Dec 2009 13:42:09 -0800 Subject: [Bioperl-l] fasta format In-Reply-To: References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> Message-ID: <614B8A2C-3B17-4E3B-AAC5-3210C7435BB5@bioperl.org> you can read the man page from sean Eddy or use it exactly as I showed you sreformat fasta filename > filename.new you can also use the 1st example which is a bioperl solution. -jason On Dec 13, 2009, at 7:02 AM, Jonas Schaer wrote: > Hi Jason, > thank you very much for your answer. > i am sorry to bother u again but i'm afraid i need some help with > that because i don't see how to use sreformat? > i dont get it managed to write a script that works. 
> > thank u again :) > jonas > > > ----- Original Message ----- From: "Jason Stajich" > To: "Jonas Schaer" > Cc: > Sent: Tuesday, December 08, 2009 6:44 PM > Subject: Re: [Bioperl-l] fasta format > > >> you can run >> sreformat (HMMER) or bp_sreformat.pl script in scripts/utilties (or >> that is installed when you install the Bioperl scripts) >> $ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o >> yournewfile.fa >> # rename it back >> $ mv yournewfile.fa yourfile.fa >> >> or >> $ sreformat fasta yourfile.fa > yournewfile.fa >> $ mv yournewfile.fa yourfile.fa >> >> >> -jason >> On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote: >> >>> Hi there, >>> I have a little question concerning bioperl. I have >>> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read >>> in some fasta files. first it worked fine, but now i have some >>> fastafiles in slightly different format (not all lines have the same >>> length!). >>> >>> ------------- EXCEPTION ------------- >>> MSG: Each line of the fasta entry must be the same length except the >>> last. >>> Line above #49 ' >>> ..' is 28 != 101 chars. >>> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ >>> Fasta.pm:771 >>> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm: >>> 681 >>> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491 >>> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513 >>> STACK main::readfasta blast_eval.pm:174 >>> STACK toplevel blast_eval.pm:83 >>> ------------------------------------- >>> >>> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ >>> site/lib/Bio/ >>> DB/Fasta.pm line 1054. >>> >>> >>> Is there any way to use these fasta files with diffrent length of >>> lines with this fasta.pm module or will i have to change the format >>> of my fasta-files(big databases...) ? >>> >>> Thanks in advance for any help! 
>>> >>> Regards, Jonas >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org > > > -------------------------------------------------------------------------------- > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.426 / Virus Database: 270.14.98/2552 - Release Date: > 12/08/09 07:34:00 > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Dec 14 20:23:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 14 Dec 2009 19:23:05 -0600 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes Message-ID: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> All, The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. I have seen two variations of NSE that incorporate strandedness: 1) Stockholm Rfam reverses start and end if the strand == -1 chrY/598-1 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end rice-3(+)/16598648-16600199 The former breaks fewer things within BioPerl, but the latter seems more explicit. Any preferences? Do we want a new method that creates this, and deprecate out simple non-stranded NSE? chris From bernd.web at gmail.com Tue Dec 15 03:37:44 2009 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 15 Dec 2009 09:37:44 +0100 Subject: [Bioperl-l] GO::Parser / GO::Model::Term In-Reply-To: References: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com> Message-ID: <716af09c0912150037k513c6efah442a236cb323e14e@mail.gmail.com> Dear Gowthaman, A non-BioPerl solution: the Ontology Lookup service at EBI. It also provides a web service interface. http://www.ebi.ac.uk/ontology-lookup/ citrulline metabolic process has to be selected from the pull-down list in the interactive page. 
This will return the ID (GO:0000052) and additional info: definition: The chemical reactions and pathways involving citrulline, N5-carbamoyl-L-ornithine, an alpha amino acid not found in proteins; preferred name: citrulline metabolic process; exact synonym: citrulline metabolism; subset: Prokaryotic GO subset; xref_definition: ISBN:209853 "Oxford Dictionary of Biochemistry and Molecular Biology". The web service is described at http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do Regards, Bernd On Mon, Dec 14, 2009 at 8:16 PM, Gowthaman Ramasamy wrote: > > Hi All, > I have a list of GO terms. And would like to pull GO accessions for them. > I can easily do the revere of it using get_term("GO::00000051"). > > But can someone tell me how to get the GO accessions from GO Terms , for eg: retrive GO accession for "citrulline metabolic process". > > > Thanks very much, > Gowtham > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Tue Dec 15 05:38:40 2009 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 15 Dec 2009 10:38:40 +0000 Subject: [Bioperl-l] parse EMBL Feature Table only In-Reply-To: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se> References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk> <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se> Message-ID: <1260873520.17180.215.camel@deskpro15336.dynamic.sanger.ac.uk> Thanks Dave, good to know that I haven't overlooked something bleedingly obvious in Bioperl that already does this :-) No problem, I have already implemented a simple parser to do it, which works fine for my files.
Thanks Frank On Mon, 2009-12-14 at 15:06 +0100, Dave Messina wrote: > Hi Frank, > > You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method: > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12 > > Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy. > > It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way. > > > Dave > > > PS: I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From rmb32 at cornell.edu Tue Dec 15 10:09:43 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 15 Dec 2009 07:09:43 -0800 Subject: [Bioperl-l] AGI's fpc stuff: Bio::Map::Physical, Bio::MapIO::fpc, etc Message-ID: <4B27A6B7.6090709@cornell.edu> Hi all, Recently I caught an interesting thing related to making GFF files out of FPC maps built recently using Bio::MapIO::fpc. All of the coordinates in the resulting GFF3 and the sizes of the contigs and clones seem to be dilated by 4x from where they should be. This didn't happen with some earlier FPC datasets I ran through these modules.
I haven't gone through any of this very thoroughly, but I notice in Bio::Map::Physical::print_gffstyle() at line 765 there's a line like 'my $basepair = 4096', and the routine goes on to use $basepair as a sort of multiplier for converting the native physical map units into basepairs for GFF-style output. This makes me wonder if the newer FPC datasets coming out require a different $basepair value, maybe 1024? Are the original authors of these modules still around on this list? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From tristan.lefebure at gmail.com Tue Dec 15 12:18:26 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Tue, 15 Dec 2009 12:18:26 -0500 Subject: [Bioperl-l] ncurses and bioperl? Message-ID: <200912151218.26357.tristan.lefebure@gmail.com> Hello, (Be careful: the following is a very naive question) Something that I find myself missing is a simple way to look at alignments and trees on remote machines where I don't have access to X. Since, (1) one can make wonderful terminal programs like screen and emacs by using ncurses, (2) that alignment and tree objects are already well handled in bioperl, and (3) that there is a CPAN Curses module; doing 1+2+3, may I dream of a curses/BioPerl Perl program to render alignments and trees? I suppose a plain C program would be much better, but well I am a biologist... Thanks, --Tristan From jason at bioperl.org Tue Dec 15 12:50:52 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 15 Dec 2009 09:50:52 -0800 Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com> References: <200912151218.26357.tristan.lefebure@gmail.com> Message-ID: not to say this isn't a good idea, but currently for curses I would use the treeviewing with retree from PHYLIP and for short read alignments the samtools tview or Gambit (MarthLab) works great or something like ralee for viewing MSA alignments (though targeted for RNA editing) http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/ http://dx.doi.org/10.1093/bioinformatics/bth489 Just that there are prior examples so would be able to learn from them if you still wanted to roll your own here. -jason On Dec 15, 2009, at 9:18 AM, Tristan Lefebure wrote: > Hello, > > (Be careful: the following is a very naive question) > > Something that I find myself missing is a simple way to look > at alignments and trees on remote machines where I don't > have access to X. Since, > (1) one can make wonderful terminal programs like screen > and emacs by using ncurses, > (2) that alignment and tree objects are already well > handled in bioperl, and > (3) that there is a CPAN Curses module; > > doing 1+2+3, may I dream of a curse/bioperl perl program to > render alignment and trees? I suppose a plain C program > would be much better, but well I am a biologist... > > Thanks, > > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From roy.chaudhuri at gmail.com Tue Dec 15 12:47:26 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 15 Dec 2009 17:47:26 +0000 Subject: [Bioperl-l] ncurses and bioperl? 
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com> References: <200912151218.26357.tristan.lefebure@gmail.com> Message-ID: <4B27CBAE.5000303@gmail.com> Hi Tristan, Not a Bioperl solution, but retree from the Phylip package displays trees in a terminal. Roy. On 15/12/2009 17:18, Tristan Lefebure wrote: > Hello, > > (Be careful: the following is a very naive question) > > Something that I find myself missing is a simple way to look > at alignments and trees on remote machines where I don't > have access to X. Since, > (1) one can make wonderful terminal programs like screen > and emacs by using ncurses, > (2) that alignment and tree objects are already well > handled in bioperl, and > (3) that there is a CPAN Curses module; > > doing 1+2+3, may I dream of a curse/bioperl perl program to > render alignment and trees? I suppose a plain C program > would be much better, but well I am a biologist... > > Thanks, > > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nml5566 at gmail.com Tue Dec 15 16:37:30 2009 From: nml5566 at gmail.com (Nathan Liles) Date: Tue, 15 Dec 2009 15:37:30 -0600 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? Message-ID: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Is the Bio::Ontology::OBOEngine module working or being currently maintained? I tried following the documentation in the module: * use Bio::Ontology::OBOEngine; my $parser = Bio::Ontology::OBOEngine->new ( -file => "gene_ontology.obo" ); my $engine = $parser->parse(); *But, it throws an error when I run the file saying 'Can't locate object method "parse" '. Does anyone have any experience getting this module working; or, is there any alternative bioperl module to extract terms and relationships out of sequence ontology files? 
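Editorial sketch, not part of the original thread: independent of which BioPerl class ends up doing the parsing, the question of pulling terms out of an OBO file by name can be illustrated in plain Perl. The helper name index_obo_names is hypothetical and is not a BioPerl method; the sample stanza reuses only the GO term quoted elsewhere in this archive, and real OBO files carry more tag types (and escaped quotes) than this handles.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): build a lookup from lower-cased
# term names and synonyms to term ids, given OBO text as one string.
sub index_obo_names {
    my ($obo_text) = @_;
    my %id_for;
    # Stanzas are separated by blank lines; only [Term] stanzas are indexed.
    for my $stanza (split /\n\s*\n/, $obo_text) {
        next unless $stanza =~ /^\[Term\]/m;
        my ($id) = $stanza =~ /^id:\s*(\S+)/m or next;
        my @labels;
        push @labels, $1 if $stanza =~ /^name:\s*(.+?)\s*$/m;
        # Synonym text is the quoted string after the tag.
        push @labels, $stanza =~ /^synonym:\s*"([^"]+)"/mg;
        $id_for{ lc $_ } = $id for @labels;
    }
    return \%id_for;
}

# Minimal sample input; the [Typedef] stanza shows what gets skipped.
my $sample = <<'OBO';
format-version: 1.2

[Term]
id: GO:0000052
name: citrulline metabolic process
synonym: "citrulline metabolism" EXACT []

[Typedef]
id: part_of
name: part of
OBO

my $index = index_obo_names($sample);
print $index->{ lc 'Citrulline metabolic process' }, "\n";
```

Lower-casing both the keys and the query keeps the lookup case-insensitive; if two terms share a label, the later stanza silently wins, which a fuller version would want to report.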
From hlapp at drycafe.net Tue Dec 15 17:05:10 2009 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 15 Dec 2009 17:05:10 -0500 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? In-Reply-To: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Message-ID: That shouldn't happen I suppose, but you're not supposed really to use the engine directly. Rather it will be used as a backing parser by the Bio::OntologyIO parser you choose. Have you tried that route and found it not to work? -hilmar On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote: > Is the Bio::Ontology::OBOEngine module working or being currently > maintained? I tried following the documentation in the module: > > * use Bio::Ontology::OBOEngine; > > my $parser = Bio::Ontology::OBOEngine->new > ( -file => "gene_ontology.obo" ); > > my $engine = $parser->parse(); > > *But, it throws an error when I run the file saying 'Can't locate > object > method "parse" '. Does anyone have any experience getting this module > working; or, is there any alternative bioperl module to extract > terms and > relationships out of sequence ontology files? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Wed Dec 16 04:58:16 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Dec 2009 10:58:16 +0100 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> Message-ID: I'm inclined more towards option 1 over option 2 because option 2 pollutes the name field. (Although that's not a huge problem if the '(strand)' is always just before the '/'.) It's a question of whether to optimize human-readability over machine-readability: option 2 favors the former over the latter, and option 1 the reverse. Whichever way you go, I think > a new method that creates this, and deprecate[s] out simple non-stranded NSE would be great. Dave From maj at fortinbras.us Wed Dec 16 07:51:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 16 Dec 2009 07:51:24 -0500 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> Message-ID: <6723123C0ABD447190639AE1F5D1A6A7@NewLife> I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way.
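Editorial sketch, not part of the original thread: option 1 (Rfam-style, start and end swapped on the minus strand) can be illustrated with two small plain-Perl helpers. The names format_nse and parse_nse are hypothetical; this is not the Bio::LocatableSeq implementation, only a sketch of the convention under discussion.

```perl
use strict;
use warnings;

# Hypothetical helper: format Name/Start-End, swapping start and end when
# strand is -1 (the Rfam-style convention called option 1 in this thread).
sub format_nse {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand == -1;
    return "$name/$start-$end";
}

# Hypothetical helper: parse such a string back into
# (name, start, end, strand), with start <= end and strand inferred
# from the order of the two coordinates.
sub parse_nse {
    my ($nse) = @_;
    my ($name, $first, $second) = $nse =~ m{^(.+)/(\d+)-(\d+)$}
        or return;
    return $first <= $second
        ? ($name, $first,  $second,  1)
        : ($name, $second, $first,  -1);
}

print format_nse('chrY', 1, 598, -1), "\n";          # the thread's example
print join(' ', parse_nse('chrY/598-1')), "\n";
```

One consequence of this encoding is worth noting: a single-base range (start equal to end) cannot carry strand at all, and every unswapped string from older output reads as plus strand, which is the compatibility concern raised above.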
cheers MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Monday, December 14, 2009 8:23 PM Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes > All, > > The current output for NSE format (Name/Start-End) via > Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. I have > seen two variations of NSE that incorporate strandedness: > > 1) Stockholm Rfam reverses start and end if the strand == -1 > > chrY/598-1 > > 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end > > rice-3(+)/16598648-16600199 > > The former breaks fewer things within BioPerl, but the latter seems more > explicit. Any preferences? Do we want a new method that creates this, and > deprecate out simple non-stranded NSE? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From tuco at pasteur.fr Wed Dec 16 09:14:28 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 16 Dec 2009 15:14:28 +0100 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) Message-ID: <4B28EB44.3080006@pasteur.fr> Hi, I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0. I tried to use my code once again but now the output of my parser is empty. It looks like Annotation from seqfeatures is not filled anymore. Here is the code I used previously: while(my $seq = $streamer->next_seq()){ #We only want to retrieve CDS features... foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){ print $ofh join("#", $feat->annotation()->get_Annotations('locus_tag'), # Acc num $feat->annotation()->get_Annotations('gene') ? 
$feat->annotation()->get_Annotations('gene') # Gene name : $feat->annotation()->get_Annotations('locus_tag'), $feat->annotation()->get_Annotations('product'), # Description ),"\n"; } } $feat is a Bio::SeqFeature::Generic object If I print Dumper($feat->annotation()) here is the output : $VAR1 = bless( { '_typemap' => bless( { '_type' => { 'comment' => 'Bio::Annotation::Comment', 'reference' => 'Bio::Annotation::Reference', 'dblink' => 'Bio::Annotation::DBLink' } }, 'Bio::Annotation::TypeManager' ), '_annotation' => {} }, 'Bio::Annotation::Collection' ); Have some changes been made into the way annotation object is populated? Thanks for any clue and sorry if my question look stupid Regards Emmanuel -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From cjfields at illinois.edu Wed Dec 16 10:09:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 16 Dec 2009 09:09:56 -0600 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) In-Reply-To: <4B28EB44.3080006@pasteur.fr> References: <4B28EB44.3080006@pasteur.fr> Message-ID: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> Emmanuel, The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation. The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default. 
You can get at the data this way (from the Feature/Annotation HOWTO): for my $feat_object ($seq_object->get_SeqFeatures) { print "primary tag: ", $feat_object->primary_tag, "\n"; for my $tag ($feat_object->get_all_tags) { print " tag: ", $tag, "\n"; for my $value ($feat_object->get_tag_values($tag)) { print " value: ", $value, "\n"; } } } You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional. chris On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote: > Hi, > > I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0. > I tried to use my code once again but now the output of my parser is empty. > It looks like Annotation from seqfeatures is not filled anymore. > > Here is the code I used previously: > > while(my $seq = $streamer->next_seq()){ > > #We only want to retrieve CDS features... > foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){ > print $ofh join("#", > $feat->annotation()->get_Annotations('locus_tag'), # Acc num > $feat->annotation()->get_Annotations('gene') > ? $feat->annotation()->get_Annotations('gene') # Gene name > : $feat->annotation()->get_Annotations('locus_tag'), > $feat->annotation()->get_Annotations('product'), # Description > ),"\n"; > } > } > > $feat is a Bio::SeqFeature::Generic object > > If I print Dumper($feat->annotation()) here is the output : > > $VAR1 = bless( { > '_typemap' => bless( { > '_type' => { > 'comment' => 'Bio::Annotation::Comment', > 'reference' => 'Bio::Annotation::Reference', > 'dblink' => 'Bio::Annotation::DBLink' > } > }, 'Bio::Annotation::TypeManager' ), > '_annotation' => {} > }, 'Bio::Annotation::Collection' ); > > Have some changes been made into the way annotation object is populated? 
> > Thanks for any clue and sorry if my question look stupid > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From tuco at pasteur.fr Wed Dec 16 10:37:45 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 16 Dec 2009 16:37:45 +0100 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> References: <4B28EB44.3080006@pasteur.fr> <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> Message-ID: <4B28FEC9.1080509@pasteur.fr> On 12/16/2009 04:09 PM, Chris Fields wrote: > Emmanuel, > > The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation. The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default. You can get at the data this way (from the Feature/Annotation HOWTO): > > for my $feat_object ($seq_object->get_SeqFeatures) { > print "primary tag: ", $feat_object->primary_tag, "\n"; > for my $tag ($feat_object->get_all_tags) { > print " tag: ", $tag, "\n"; > for my $value ($feat_object->get_tag_values($tag)) { > print " value: ", $value, "\n"; > } > } > } > > You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional. > > chris > > Hi Chris Thanks for the info. I have indeed gone back to using $feat->get_tag_values() and it works as it did previously. For my small problem I can keep this solution, which is well suited to it.
Regards Emmanuel -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From sung at bio.cc Wed Dec 16 12:55:16 2009 From: sung at bio.cc (Sungsam Gong) Date: Wed, 16 Dec 2009 17:55:16 +0000 Subject: [Bioperl-l] pdb.pm and annotations Message-ID: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com> Hi, I wanted to get the PubMed identifier from a PDB file using Bio::Structure, so I hacked the code. I know that Bio::Structure::IO::pdb.pm gets the relevant info from either 'JRNL' or 'REMARK 1'. However, I could not see any actual code parsing 'PMID'. From pdb.pm, what I see: sub _read_PDB_jrnl { ... $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH"); $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL"); $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT"); $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF"); $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL"); $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN"); ... } sub _read_PDB_remark_1 { ... $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH"); $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL"); $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT"); $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF"); $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL"); $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN"); ... } From my script, I did: ($struc->annotation->get_Annotations('reference'))[0]->authors ($struc->annotation->get_Annotations('reference'))[0]->title or my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree for my $key (keys %{$hash_ref}) { print $key,": ",$hash_ref->{$key},"\n"; } Is there any plan to include code that extracts 'PMID'? Or did I miss something?
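Editorial sketch, not part of the original message: until the parser handles it, one workaround is to read the PMID sub-record straight from the raw JRNL lines. This assumes the file follows the newer PDB format in which JRNL carries a PMID sub-record; older files may not have one. The helper name pmid_from_pdb_lines is hypothetical and the sample records below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Structure): return the PubMed id
# from the first "JRNL PMID" sub-record among the given lines, or undef
# (empty list in list context) if there is none.
sub pmid_from_pdb_lines {
    for my $line (@_) {
        return $1 if $line =~ /^JRNL\s+PMID\s+(\d+)/;
    }
    return;
}

# Made-up sample records, only to show the expected layout.
my @lines = (
    "HEADER    EXAMPLE ENTRY",
    "JRNL        TITL   AN EXAMPLE TITLE",
    "JRNL        PMID   12345678",
);

my $pmid = pmid_from_pdb_lines(@lines);
print defined $pmid ? "PMID: $pmid\n" : "no PMID sub-record found\n";
```

To use it on a real file, the lines can be read with a plain open/readline loop before or alongside the Bio::Structure::IO parse.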
Cheers,
Sung

From nml5566 at gmail.com Wed Dec 16 14:42:57 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Wed, 16 Dec 2009 13:42:57 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To:
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
Message-ID: <81a20b1e0912161142m77051529se59b4621a0add13b@mail.gmail.com>

Actually, yes, I did find that and it works very well. Now I'm wondering: is it possible to search for similar terms using a string instead of a Bio::Ontology term object? For example, I'd like to search for the synonym "transcription start site" and have it return all similar terms. But it throws an error if I pass in a simple query like that.

-Nathan

On Tue, Dec 15, 2009 at 4:05 PM, Hilmar Lapp wrote:

> That shouldn't happen I suppose, but you're not really supposed to use the
> engine directly. Rather it will be used as a backing parser by the
> Bio::OntologyIO parser you choose. Have you tried that route and found it
> not to work?
>
> -hilmar
>
>
> On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:
>
>> Is the Bio::Ontology::OBOEngine module working or being currently
>> maintained? I tried following the documentation in the module:
>>
>> use Bio::Ontology::OBOEngine;
>>
>> my $parser = Bio::Ontology::OBOEngine->new
>> ( -file => "gene_ontology.obo" );
>>
>> my $engine = $parser->parse();
>>
>> But, it throws an error when I run the file saying 'Can't locate object
>> method "parse" '. Does anyone have any experience getting this module
>> working; or, is there any alternative bioperl module to extract terms and
>> relationships out of sequence ontology files?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>

From cjfields1 at gmail.com Wed Dec 16 19:53:50 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 16:53:50 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
Message-ID:

Howdy from Google Groups

From cjfields1 at gmail.com Wed Dec 16 20:01:38 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 17:01:38 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
Message-ID: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>

I would like to announce (with the tremendous help of Hilmar Lapp) the creation of a mirror for the BioPerl mail list, if the last post didn't already give it away.

http://groups.google.com/group/bioperl-l

One can join the group and submit posts via the Google Groups web interface or via email. Have fun!

chris

From ocarnorsk138 at gmail.com Wed Dec 16 20:12:21 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Wed, 16 Dec 2009 17:12:21 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
In-Reply-To:
References:
Message-ID: <03416808-ec4b-44b3-8269-6743a26b5368@k4g2000yqb.googlegroups.com>

testing back from google group!

On Dec 16, 9:53 pm, Chris Fields wrote:
> Howdy from Google Groups
> _______________________________________________
> Bioperl-l mailing list
> Bioper... at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From p.j.a.cock at googlemail.com Thu Dec 17 05:50:23 2009
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 17 Dec 2009 02:50:23 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
Message-ID:

On Dec 17, 1:01 am, Chris Fields wrote:
> I would like to announce (with the tremendous help of Hilmar Lapp) the
> creation of a mirror for the BioPerl mail list, if the last post
> didn't already give it away.
>
> http://groups.google.com/group/bioperl-l
>
> One can join the group and submit posts via the Google Groups web
> interface or via email. Have fun!
>
> chris

Sounds particularly good in the long run (once there is enough of an archive on Google Groups to make searching there useful).

Does this mean a Google Groups user doesn't have to be subscribed to the mailing list to post (since the mailing list normally only allows subscribers to post)?

Peter

From David.Messina at sbc.su.se Thu Dec 17 07:25:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 17 Dec 2009 13:25:49 +0100
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To:
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
Message-ID: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>

Very nice, Chris and Hilmar! That'll be great.

> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post (since the mailing list normally only
> allows subscribers to post)?

I think that's right. From the Google groups page:

> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.
Dave From cjfields at illinois.edu Thu Dec 17 08:21:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 07:21:46 -0600 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> Message-ID: <209F1321-37DD-4B6C-A153-8A5AA0EF3E0A@illinois.edu> On Dec 17, 2009, at 6:25 AM, Dave Messina wrote: > Very nice, Chris and Hilmar! That'll be great. > > > >> Does this mean a Google Groups user doesn't have to be subscribed >> to the mailing list to post (since the mailing list normally only >> allows subscribers to post)? > > > I think that's right. From the Google groups page: > >> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively. > > > > > Dave It is moderated by user to deal with spam. Hilmar's already a manager/co-owner, and either of us can add more as needed. chris From hlapp at drycafe.net Thu Dec 17 09:52:33 2009 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 17 Dec 2009 09:52:33 -0500 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> Message-ID: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> On Dec 17, 2009, at 5:50 AM, Peter wrote: > Does this mean a Google Groups user doesn't have to be subscribed > to the mailing list to post Yes. They can post through the Google Groups web interface. The email address for mirrored groups is the one of the list being mirrored though, bioperl-l at bioperl.org in this case, and so in order to post by email you still have to be subscribed at the bioperl-l list. At least that's what the docs at Google say. 
I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Thu Dec 17 12:05:24 2009 From: jay at jays.net (Jay Hannah) Date: Thu, 17 Dec 2009 11:05:24 -0600 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> Message-ID: <9BDF08A3-67E0-4F5E-8429-11AE586F6504@jays.net> On Dec 17, 2009, at 8:52 AM, Hilmar Lapp wrote: > I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs. In my experience (and ignoring a brief glitch this summer) moderation of new members works great. Almost zero spam gets through. Not as convenient for the admin as MailMan self-service email verification, but perhaps easier for some users and not too much admin work if you don't have too many new legitimate members every month. Here is the configuration set I recommend: http://clab.ist.unomaha.edu/~jhannah/tmp/google_groups.png Your membership roles will end up with quite a few junk accounts, but those bots can't post, so it's not that big a deal. I purge mine manually once a year or so. 
HTH,

j

http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From robert.bradbury at gmail.com Thu Dec 17 14:42:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 14:42:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
In-Reply-To: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
Message-ID:

Just to close out the issue of bioperl forking (in particular accesses to external databases through get_sequence), which involves individual database sub-modules and not collecting their children.

As it turns out the code does do an explicit fork, apparently so that the child process can read from the database while the parent process manipulates the data as it becomes available. Now, one could argue that a threaded model might be better, since threads are now fairly standard OS tools in current environments.

But I couldn't find any functions which actually wait for the forked process (presumably because they are created for "future" use). Nor is there any indication in the pages I've found in most of the documentation (which is spread across the web) or the Wiki explaining that "creating child processes" is how these functions work and that one *needs* to collect those children after each use, or else zombie processes will accumulate, which on "reasonable" systems with per-user process limits will create problems for proper program functioning. Nor (it would appear) does the parent process set up a SIGCHLD "catcher" which could collect the processes once they exit (which I expect in the case of "get_sequence" would be after closing of the socket which actually fetched the sequence from Genbank).
It can be resolved easily enough by adding a call after each use of these functions: $kid = waitpid(-1, WNOHANG); But typically, as a programmer, I should not be responsible for having to clean up the leftovers of library calls (unless said cleanup requirements are clearly documented). But to a "newbie" using the functions, coming from a functional background (C), not an OO background (which at least I would tend to view as a wart on the otherwise robust Perl language), there are two problems 1. The lack of documentation and examples explaining how the functions work and how they must be handled at a higher level (by executing explicit wait system calls). 2. The lack of code in the BioPerl functions to deal with the forked processes which they create. Functional programmers have a perspective -- if you create it -- you have to clean it up. It would appear that in the transition to OO programming (or perhaps simply for expediency) that detail was left out of both (either/and) the documentation and the code. From this standpoint one could view garbage collectors as being fundamentally evil -- because they gloss over the fact that programmers should know what they are doing and when they are doing it. So, everywhere in the documentation where there is a get_sequence call (or anything which accesses an external database which causes a fork to occur) there should be a modification as I have outlined above -- or else the code should be corrected so orphaned children are always collected and not allowed to accumulate. 
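
For anyone applying the workaround above: WNOHANG has to be imported from the POSIX module, and one waitpid call collects at most one child, so a loop is the safer form. A minimal sketch in plain Perl, making no assumptions about BioPerl internals; the helper name reap_children is made up for illustration, and behaviour on platforms without a real fork is not covered here.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";   # exports WNOHANG

# Hypothetical helper: collect every child that has already exited,
# without blocking, and return how many were reaped.
sub reap_children {
    my $reaped = 0;
    # waitpid(-1, WNOHANG) returns the PID of a reaped child, 0 if
    # children exist but none has exited yet, and -1 if there are none.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Intended use, after each call that forks behind the scenes, e.g.:
#   my $seq = get_sequence('genbank', $accession);
#   reap_children();
```

Because the call never blocks, a child that has not exited yet is simply picked up on the next call, so the zombie count stays bounded rather than growing with every fetch.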
From robert.bradbury at gmail.com Thu Dec 17 15:23:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 15:23:38 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
In-Reply-To:
References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
Message-ID:

Oh, yes, in case it was not clear, the fork call which fails is in DB/WebDBSeqI.pm: line 722

    defined(my $pid = fork)
        or $self->throw("'Couldn't fork: $!");

And of course that is because Linux has reached the process limit for the user (due to accumulated background processes which are uncollected).

And this could be resolved by simply executing a waitpid call for prior orphaned children before forking [1]. But such a succinct solution would violate "functional" programming rules -- clean up what you create -- instead it would tend to fall into the OO camp -- "Oh don't worry the garbage collector will take care of it". Green programming is a little less cavalier.

Robert

1. IMO, a very very real problem with programming today is that there is no connection between programmers and the cost of their programs. How many programmers know the instruction cycle time of their computers, what an instruction costs in terms of W consumed, W wasted (heat generation), fruitless scanning over uncollected zombie processes, etc.? It may be that only programmers who grew up in the era when CPU cycles were expensive (300 ns/cycle), and who know what each instruction required in terms of cycles, consider these perspectives. Now things (cpu use, processor use, etc) tend to be swept under the rug, and it appears that that is the case with the standard implementation of bioperl. The documentation does not clearly state that additional sub-processes may be created and need to be collected. You are providing a utility that only works "this much". And guess what -- I happen to have run into the "this".
From cjfields at illinois.edu Thu Dec 17 15:25:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 14:25:56 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: Robert, I have previously outlined specifically why you are seeing the fork issue, and a possible solution. IIRC it primarily has to do with you trying to do something more advanced using the (very basic) Bio::Perl procedural interface, something along the lines of pulling a sequence and using RemoteBlast. Retrieving a sequence from a remote database is a forked process on most OS's (I think Win is the sole exception) and occurs internally in Bio::Perl via Bio::DB::GenBank. Setting up your own pipeline, using Bio::DB::GenBank (set to use temp files), followed by Bio::Tools::Run::RemoteBlast or Bio::Perl, are options in the meantime. Trying to catch signals can be notoriously flaky cross-platform and cross perl versions; I recall running into problems with CygWin and OS X. We can modify Bio::Perl to use a temp file instead, which avoids the whole use of forks altogether, and is probably the best long-term solution. My last bit: I don't usually say this, primarily b/c it's misconstrued by some, but 'patches are always welcome'. What doesn't work is just telling us to arbitrarily change code w/o indicating exactly where to do so. The tone you use, which comes off a tad condescending, can be abrasive and may not garner any response (or at least will get you one you don't expect). Please keep that in mind. chris On Dec 17, 2009, at 1:42 PM, Robert Bradbury wrote: > Just to close out the issue of bioperl forking (in particular accesses to > external databases through get_sequence) which involves individual database > sub-modules and not collecting its children. 
> > As it turns out the code does do an explicit fork, it looks like so the > child process can read from the database while the parent process > manipulates the data as it becomes available. Now, one could argue that a > threaded model might be better since now threads are fairly standard OS > tools in current environments. > > But I couldn't find any functions which actually wait for the forked process > (presumably because they are created for "future" use). But nor is there > any indication in the pages I've found in most of the documentation (which > is spread across the web) or Wiki that explain that "creating child > processes" is how these functions work and one *needs* to collect those > children after each use or else zombie processes will accumulate, which on > "reasonable" systems with per-user process limits will create problems for > proper program functioning. Nor (it would appear) does the parent process > setup a SIGCHLD "catcher" which could collect the processes once they exit > (which I expect in the case of "get_sequence" would be after closing of the > socket which actually fetched the sequence from Genbank. > > It can be resolved easily enough by adding a call after each use of these > functions: > $kid = waitpid(-1, WNOHANG); > But typically, as a programmer, I should not be responsible for having to > clean up the leftovers of library calls (unless said cleanup requirements > are clearly documented). > > > But to a "newbie" using the functions, coming from a functional background > (C), not an OO background (which at least I would tend to view as a wart on > the otherwise robust Perl language), there are two problems > 1. The lack of documentation and examples explaining how the functions work > and how they must be handled at a higher level (by executing explicit wait > system calls). > 2. The lack of code in the BioPerl functions to deal with the forked > processes which they create. 
Functional programmers have a perspective -- > if you create it -- you have to clean it up. It would appear that in the > transition to OO programming (or perhaps simply for expediency) that detail > was left out of both (either/and) the documentation and the code. From this > standpoint one could view garbage collectors as being fundamentally evil -- > because they gloss over the fact that programmers should know what they are > doing and when they are doing it. > > So, everywhere in the documentation where there is a get_sequence call (or > anything which accesses an external database which causes a fork to occur) > there should be a modification as I have outlined above -- or else the code > should be corrected so orphaned children are always collected and not > allowed to accumulate. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Dec 17 15:29:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 14:29:10 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: On Dec 17, 2009, at 2:23 PM, Robert Bradbury wrote: > Oh, yes, in case it was not clear, the fork calls which fails is in > DB/WebDBSeqI.pm: line 722 > defined(my $pid = fork) > or $self->throw("'Couldn't fork: $!"); Okay, that's a bit more helpful. > And of course that is because Linux has reached the process limits for the > user (due to accumulated background processes which are uncollected). Right, but again, we need to check this in a cross-platform compatible way. 
> And they could be resolved by simply executing a simple waitpid call for > prior orphaned children before forking [1] But such a succinct solution > would violate "functional" programming rules -- clean up what you create -- > instead they would tend to fall into the OO camp -- "Oh don't worry the > garbage collector will take care of it". Green programming is a little less > cavalier. > > Robert > > 1. IMO, a very very real problem with programming today is that there is no > connection between programmers and the cost of their programs. How many > programmers know the instruction cycle time of their computers, what does an > instruction cost in terms of W consumed, W wasted (heat generation), > fruitless scanning over uncollected zombie processes, etc. It may be that > only that programmers who grew up in the era when CPU cycles were expensive > (300 ns/cycle) who know what each instruction required in terms of cycles > consider these perspectives. Now things (cpu use, processor use, etc) tend > to be swept under the rug and it appears that that is the case with the > standard implementation of bioper. The documentation does not clearly state > that additional sub-processes may be created and need to be collected. You > are providing a utility that only works "this much". And guess what -- I > happen to have run into the "this". Um, yeah. Okay. chris From robfsouza at gmail.com Fri Dec 18 13:07:34 2009 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Fri, 18 Dec 2009 13:07:34 -0500 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: Hi, I've been dealing with an apparent bug in the output of NCBI's BLAST programs (blastall, blastpgp) which sometimes produces output like the one below. I think I've managed to produce a work around for Bioperl blast.pm parser and would like to contribute it to Bioperl. The fix is based on blast.pm from the CVS tree (downloaded some months ago...) and is attached to this message. 
Best,
Robson

PS: what happened to the bioperl-bugs mailing list? It does not seem to be working...

>gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
           hypothetical protein [Nasonia vitripennis]
          Length = 1774

 Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
 Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)

Query: 0   -

Sbjct: 328 P                                                            328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
            P PP +   + P       KTK+      K+P  K         +
Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376

Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
           ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432

Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
           LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491

Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
              DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548

Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
           +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602

Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
           ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661

Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
                  +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast_patched.pm
Type: application/octet-stream
Size: 91820 bytes
Desc: not available
URL:

From cjfields at illinois.edu Fri Dec 18 13:33:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 18 Dec 2009 12:33:44 -0600
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To:
References:
Message-ID:

Robson,

Any chance you could check this against SVN? We haven't used the CVS tree for a few years (had a number of releases along the way as well).

Not sure about bioperl-bugs, we have bugzilla still running though:

http://bugzilla.open-bio.org/

chris

On Dec 18, 2009, at 12:07 PM, Robson Francisco de Souza wrote:

> Hi,
>
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson
>
> PS: what happened to the bioperl-bugs mailing list? It does not seem
> to be working...
> >> gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved > hypothetical protein [Nasonia vitripennis] > Length = 1774 > > Score = 75.9 bits (185), Expect = 1e-11, Method: Compositional matrix adjust. > Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%) > > Query: 0 - > > Sbjct: 328 P 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG 654 > P PP + + P KTK+ K+P K + > Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA 376 > > Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713 > ++ N + W + +++ + N NN D +E PT ++ > Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432 > > Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773 > LD K S + + L + + +I + + D ++ + + L + PE D+ + ++SF > Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491 > > Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833 > DG +L +K F + +P K R + F ++ +EP I S+ A +++ > Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548 > > Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893 > + KSLQ ++ ++++ NFLN + G KL+ L KL +I++ N+ MN L > Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602 > > Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951 > ++ + ++ +LL + + + ++ + +L E L+ +K I+++++ E > Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661 > > Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995 > +Q+ +F Q A+ EM ++ + E+L+ + + +A+FF E > Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701 > 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Fri Dec 18 18:00:47 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 18 Dec 2009 23:00:47 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza wrote: > Hi, > > I've been dealing with an apparent bug in the output of NCBI's BLAST > programs (blastall, blastpgp) which sometimes produces output like the > one below. > I think I've managed to produce a work around for Bioperl blast.pm > parser and would like to contribute it to Bioperl. > The fix is based on blast.pm from the CVS tree (downloaded some months > ago...) and is attached to this message. > Best, > Robson Do you have a complete example of this kind of funny output? This problem has also been reported with blastpgp for the Biopython parser. I'd love an example for our unit tests (probably worth doing in BioPerl too). Could you upload a test case here?: http://bugzilla.open-bio.org/show_bug.cgi?id=2927 Thanks! Peter @ Biopython From biopython at maubp.freeserve.co.uk Sat Dec 19 06:19:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 19 Dec 2009 11:19:53 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: <320fb6e00912190319s75a0eb75m94dfbd7946a310e5@mail.gmail.com> On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote: > > Hi Peter, > > I just upload my example. I also reported this bug to the NCBI > developers and I hope they can fix it, since it is easy to reproduce. 
> I just forgot to mention the blastpgp version: 2.2.18 > Best, > Robson Thank you, Peter From maj at fortinbras.us Sat Dec 19 14:52:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 19 Dec 2009 14:52:45 -0500 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment Message-ID: Hi All, Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus, is at beta in the bioperl-run trunk. It wraps all the programs of the NCBI's new blast+-2.2.22 suite ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and integrates them, allowing you to create, mask, and query databases from within a single factory object. See the HOWTO http://www.bioperl.org/wiki/HOWTO:BlastPlus for the usual usage and implementation details. Happy coding-- MAJ From David.Messina at sbc.su.se Sat Dec 19 15:34:10 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 19 Dec 2009 21:34:10 +0100 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment In-Reply-To: References: Message-ID: <8F67673F-E71E-46A1-BD7C-6465C4D13398@sbc.su.se> Sweet! Thanks, Mark. Dave From cjfields at illinois.edu Sat Dec 19 17:44:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 19 Dec 2009 16:44:46 -0600 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment In-Reply-To: References: Message-ID: <3DC558C9-DD64-45F9-8A6F-EA4238D22EA5@illinois.edu> Very nice! We'll definitely give it a try here (along with the requisite feedback, of course). chris On Dec 19, 2009, at 1:52 PM, Mark A. Jensen wrote: > Hi All, > > Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus, > is at beta in the bioperl-run trunk. It wraps all the programs of the > NCBI's new blast+-2.2.22 suite > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > and integrates them, allowing you to create, mask, and query > databases from within a single factory object. 
See the HOWTO > http://www.bioperl.org/wiki/HOWTO:BlastPlus > for the usual usage and implementation details. > > Happy coding-- > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Dec 19 23:59:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 19 Dec 2009 22:59:38 -0600 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <6723123C0ABD447190639AE1F5D1A6A7@NewLife> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> <6723123C0ABD447190639AE1F5D1A6A7@NewLife> Message-ID: <97DC7C2B-2433-4B8D-A16C-DF0507A29B22@illinois.edu> I think option 1 is cleaner as well; very easily added, so committed to main trunk as I consider this a bug, as one can potentially lose strand information when round-tripping data (original data with a -1 strand would be converted to +1). I'll work out the test fails on trunk along the way (ensure they're due to erroneous test data and not something else). chris On Dec 16, 2009, at 6:51 AM, Mark A. Jensen wrote: > I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. cheers MAJ > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, December 14, 2009 8:23 PM > Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes > > >> All, >> >> The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. 
I have seen two variations of NSE that incorporate strandedness: >> >> 1) Stockholm Rfam reverses start and end if the strand == -1 >> >> chrY/598-1 >> >> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end >> >> rice-3(+)/16598648-16600199 >> >> The former breaks fewer things within BioPerl, but the latter seems more explicit. Any preferences? Do we want a new method that creates this, and deprecate out simple non-stranded NSE? >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.osimo at gmail.com Sun Dec 20 13:19:37 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Sun, 20 Dec 2009 19:19:37 +0100 Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes Message-ID: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> Hello everyone, I have a very particular problem: I'd like to draw in a single track different SNPs with a glyph that allows me to see graphically their importance. For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the first depicted small, and the last one big, with the ones in between with according sizes. I'd be satisfied also with a color gradient. What I cannot do is to set the option -height , for example, instead than in the add_track section, in the Bio::SeqFeature::Generic->new that I use for each of my objects. If I set it in the add_track section, all the glyphs are then of the same size (or color). If, otherwise, I add a different track for each object, my picture becomes too big. Please, help! 
Thanks Emanuele From ajmackey at gmail.com Sun Dec 20 13:41:14 2009 From: ajmackey at gmail.com (Aaron Mackey) Date: Sun, 20 Dec 2009 13:41:14 -0500 Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes In-Reply-To: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> References: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> Message-ID: <24c96eca0912201041i37c32845k9e261414588b9bf4@mail.gmail.com> You can set the height as a callback sub, rather than a constant -- the callback will get passed the feature about to be drawn, from which you can calculate the "importance", and return the desired height, dynamically. -Aaron On Sun, Dec 20, 2009 at 1:19 PM, Emanuele Osimo wrote: > Hello everyone, > I have a very particular problem: I'd like to draw in a single track > different SNPs with a glyph that allows me to see graphically their > importance. > For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the > first depicted small, and the last one big, with the ones in between with > according sizes. > I'd be satisfied also with a color gradient. > What I cannot do is to set the option -height , for example, instead than > in > the add_track section, in the Bio::SeqFeature::Generic->new that I use for > each of my objects. > If I set it in the add_track section, all the glyphs are then of the same > size (or color). > If, otherwise, I add a different track for each object, my picture becomes > too big. > > Please, help! 
> Thanks > Emanuele > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From robfsouza at gmail.com Sat Dec 19 06:06:16 2009 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Sat, 19 Dec 2009 06:06:16 -0500 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: Hi Peter, I just upload my example. I also reported this bug to the NCBI developers and I hope they can fix it, since it is easy to reproduce. I just forgot to mention the blastpgp version: 2.2.18 Best, Robson On Fri, Dec 18, 2009 at 6:00 PM, Peter wrote: > On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza > wrote: >> Hi, >> >> I've been dealing with an apparent bug in the output of NCBI's BLAST >> programs (blastall, blastpgp) which sometimes produces output like the >> one below. >> I think I've managed to produce a work around for Bioperl blast.pm >> parser and would like to contribute it to Bioperl. >> The fix is based on blast.pm from the CVS tree (downloaded some months >> ago...) and is attached to this message. >> Best, >> Robson > > Do you have a complete example of this kind of funny output? > This problem has also been reported with blastpgp for the > Biopython parser. I'd love an example for our unit tests > (probably worth doing in BioPerl too). Could you upload a > test case here?: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2927 > > Thanks! 
> > Peter @ Biopython > From biopython at maubp.freeserve.co.uk Mon Dec 21 10:27:47 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 21 Dec 2009 15:27:47 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: <320fb6e00912210727m522d2039if78891ab32fe0983@mail.gmail.com> On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote: > > Hi Peter, > > I just upload my example. I also reported this bug to the NCBI > developers and I hope they can fix it, since it is easy to reproduce. > I just forgot to mention the blastpgp version: 2.2.18 > Best, > Robson Hi again Robson, Having a reproducible example to investigate this issue is incredibly helpful - thank you! I've been looking at the output, and while I can make sense of it "by hand", it would be very tricky to try and parse as a special case. It really does look like a bug in BLAST to me. The alignment includes an initial pair, a leading gap in the query (with a coordinate of zero), plus a residue from the match sequence (with a sensible coordinate). The alignment statistics include this (extra) pair in the alignment length. You said you were using blastpgp version 2.2.18, so I tried this with the latest (final?) version of the "legacy" BLAST suite, blastpgp 2.2.22, which I already had installed. It looks like my copy of NR is more recent (bigger), but the same odd output was produced: blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000 I also tried what I think would be the equivalent command line on the new BLAST+ suite, using psiblast 2.2.22+ like this: psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast -num_threads 8 -parse_deflines -num_alignments 10000 This was much faster, and seems to output sensible alignments. I might therefore expect the NCBI to say "yes, this is a bug in the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance release of the "legacy" BLAST suite and fix this in blastpgp. Have you had any reply from the NCBI? Admittedly it is almost Christmas/New Year so we may not expect an answer until Jan. Peter From maj at fortinbras.us Mon Dec 21 13:52:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 13:52:01 -0500 Subject: [Bioperl-l] test fail Message-ID: <5614E9FF133A47A694EF892D38A1717A@NewLife> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. # got: '1..4' # expected: 'complement(5..8)' t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. # got: 'complement(5..8)' # expected: '1..4' # Looks like you failed 2 tests of 51. MAJ From cjfields at illinois.edu Mon Dec 21 14:20:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 13:20:32 -0600 Subject: [Bioperl-l] test fail In-Reply-To: <5614E9FF133A47A694EF892D38A1717A@NewLife> References: <5614E9FF133A47A694EF892D38A1717A@NewLife> Message-ID: Saw that from the other day (LocatableSeq commit). I'll check it out. chris On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote: > fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) > > t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. > # got: '1..4' > # expected: 'complement(5..8)' > > t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. > # got: 'complement(5..8)' > # expected: '1..4' > # Looks like you failed 2 tests of 51. 
> > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Mon Dec 21 15:02:20 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 21 Dec 2009 15:02:20 -0500 Subject: [Bioperl-l] Bio::Graphics documentation Message-ID: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> Hi All, Today it was pointed out to me that the Bio::Graphics documentation links on the BioPerl wiki are broken, no doubt because Bio::Graphics is no longer part of bioperl-core (is that how it should be referred to?). Anyway, the question is: what is the right way to rectify this problem? Since other things may get broken out in the future, I suppose we should get some sort of standard established. Can a release of Bio::Graphics be placed somewhere on the BioPerl wiki server to be processed? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Mon Dec 21 15:22:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 14:22:39 -0600 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> Message-ID: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> We can come up with some standard wiki template for those modules no longer in svn, maybe with just CPAN links. Shouldn't be too hard to do. chris On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > Hi All, > > Today it was pointed out to me that the Bio::Graphics documentation > links on the BioPerl wiki are broken, no doubt because Bio::Graphics > is no longer part of bioperl-core (is that how it should be referred > to?). 
Anyway, the question is: what is the right way to rectify this > problem? Since other things may get broken out in the future, I > suppose we should get some sort of standard established. Can a > release of Bio::Graphics be placed somewhere on the BioPerl wiki > server to be processed? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Dec 21 16:12:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 15:12:45 -0600 Subject: [Bioperl-l] test fail In-Reply-To: References: <5614E9FF133A47A694EF892D38A1717A@NewLife> Message-ID: T'was a bad test call. I basically changed the test to pull each feature directly by the primary tag, check it against the original sf prior to revcom, then check that the location was revcomp'ed correctly. chris On Dec 21, 2009, at 1:20 PM, Chris Fields wrote: > Saw that from the other day (LocatableSeq commit). I'll check it out. > > chris > > On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote: > >> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) >> >> t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. >> # got: '1..4' >> # expected: 'complement(5..8)' >> >> t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. >> # got: 'complement(5..8)' >> # expected: '1..4' >> # Looks like you failed 2 tests of 51. 
>> >> MAJ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Dec 21 16:27:25 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 16:27:25 -0500 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> Message-ID: <1F54D94CE87E4238BC2C6128002FBC6A@NewLife> I've modified Template:Doclink ; if you now do {{Doclink|Bio::Graphics|cpan}} you'll get a page with only the cpan link. {{Doclink|Bio::SeqIO}} etc. works as usual. MAJ ----- Original Message ----- From: "Chris Fields" To: "Scott Cain" Cc: "BioPerl List" Sent: Monday, December 21, 2009 3:22 PM Subject: Re: [Bioperl-l] Bio::Graphics documentation > We can come up with some standard wiki template for those modules no longer in > svn, maybe with just CPAN links. Shouldn't be too hard to do. > > chris > > On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > >> Hi All, >> >> Today it was pointed out to me that the Bio::Graphics documentation >> links on the BioPerl wiki are broken, no doubt because Bio::Graphics >> is no longer part of bioperl-core (is that how it should be referred >> to?). Anyway, the question is: what is the right way to rectify this >> problem? Since other things may get broken out in the future, I >> suppose we should get some sort of standard established. Can a >> release of Bio::Graphics be placed somewhere on the BioPerl wiki >> server to be processed? >> >> Thanks, >> Scott >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot >> net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Mon Dec 21 16:34:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 16:34:40 -0500 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> Message-ID: <5081DC24D9AE46FF95075559898B2574@NewLife> Also, applied the new Doclink to Bio::Graphics on wiki. ----- Original Message ----- From: "Chris Fields" To: "Scott Cain" Cc: "BioPerl List" Sent: Monday, December 21, 2009 3:22 PM Subject: Re: [Bioperl-l] Bio::Graphics documentation > We can come up with some standard wiki template for those modules no longer in > svn, maybe with just CPAN links. Shouldn't be too hard to do. > > chris > > On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > >> Hi All, >> >> Today it was pointed out to me that the Bio::Graphics documentation >> links on the BioPerl wiki are broken, no doubt because Bio::Graphics >> is no longer part of bioperl-core (is that how it should be referred >> to?). Anyway, the question is: what is the right way to rectify this >> problem? Since other things may get broken out in the future, I >> suppose we should get some sort of standard established. Can a >> release of Bio::Graphics be placed somewhere on the BioPerl wiki >> server to be processed? >> >> Thanks, >> Scott >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot
>> net
>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>> Ontario Institute for Cancer Research
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From maj at fortinbras.us Mon Dec 21 21:51:32 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 21:51:32 -0500
Subject: [Bioperl-l] pdb.pm and annotations
In-Reply-To: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
References: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
Message-ID: <6292EDA0F05B48578AF7B7E5864C8707@NewLife>

Hi Sung--

We didn't plan it, but we added it anyway: see revision 16559 of bioperl-live/trunk. You can then do

$pmid = ($struct->annotation->get_Annotations('reference'))[0]->pubmed;

and even

$doi = ($struct->annotation->get_Annotations('reference'))[0]->doi;

Thanks for the heads-up!
cheers, MAJ

----- Original Message -----
From: "Sungsam Gong"
To:
Sent: Wednesday, December 16, 2009 12:55 PM
Subject: [Bioperl-l] pdb.pm and annotations

> Hi,
>
> Wanted to get pubmed identifier from a PDB file using Bio::Structure,
> so hacked the code.
> Knew that Bio::Structure::IO::pdb.pm get relevant info from either
> 'JRNL' or 'REMARK 1'.
> However could not see any actual code parsing 'PMID'.
>
> From pdb.pm, what I see:
>
> sub _read_PDB_jrnl {
> ...
> $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
> $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
> $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
> $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
> $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
> $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> sub _read_PDB_remark_1 {
> ...
> $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
> $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
> $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
> $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
> $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
> $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> From my script, I did:
>
> ($struc->annotation->get_Annotations('reference'))[0]->authors
> ($struc->annotation->get_Annotations('reference'))[0]->title
>
> or
>
> my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree
> for my $key (keys %{$hash_ref}) {
> print $key,": ",$hash_ref->{$key},"\n";
> }
>
> Any plan to include a code chopping 'PMID' out?
> Or did I miss something?
>
> Cheers,
> Sung
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From dan.kortschak at adelaide.edu.au Mon Dec 21 22:24:04 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 13:54:04 +1030
Subject: [Bioperl-l] call for help and comments on module
Message-ID: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>

Hi,

I've been working on a Bio::Tools::Run module to handle the bowtie rapid alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in bioperl-run tree).

I have 90% of what I want included in the module and would like some advice from more experienced bioperlers. Feedback on approach is also welcomed (this is my first significant wrapper, and after a long gap from writing modules, so I am rusty). The module has ended up being significantly more complicated than I had hoped.

There are a few issues I'm having, so I apologise for the list:

1. Informal tests run correctly (outside the t/ tree and Test harness), but formal Test harness tests fail for reasons I cannot understand. (The module is still lacking a lot of tests, but since things were failing in the harness I have placed them as a lower priority and have been working to my micro-script tests - yes, bad form.)
2. I am having a big problem with IPC::Run for one of the executables (the module can call 5 different executables for 7 commands), bowtie-maptool (module command 'map'). All the other commands tested (this excludes bowtie-maqconvert [convert command]) work fine, but maptool fails with an illegal seek - presumably due to the redirection handling? I have no idea how to resolve this, so help would be greatly appreciated (a small script that demonstrates the use that results in the failure is below).

There will be provision for returning a Bio::Assembly::IO object through samtools in the finished module, but currently the Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.

Thanks for any help.
Dan

#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Bowtie;

# These files are in the bioperl-run t/data/ tree
my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';

my $bowtiefac = Bio::Tools::Run::Bowtie->new(
    -command => 'single',
    -max_seed_mismatches => 2,
    -seed_length => 28,
    -max_qual_mismatch => 70,
    -sam_format => 0
);

my $align = $bowtiefac->run($rdq,$refseq); # this runs fine

my $bowtiemap = Bio::Tools::Run::Bowtie->new(
    -command => 'map'
);

my $map = $bowtiemap->run($align); # throws Illegal seek

print "$map\n";

open (IN,$map);
my $lines =(my @lines)= <IN>;
print @lines;
print "\n\n$lines\n";
close IN;

From maj at fortinbras.us Tue Dec 22 00:19:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 22 Dec 2009 00:19:35 -0500
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID:

Hey Dan,
It looks like if the outfile isn't specified on the commandline for maptool, then the align is written to stdout. So, you could try this workaround in Bowtie/Config.pm:

our %command_files = (
    'single' => [qw( ind seq #out )],
    'paired' => [qw( ind seq seq2 #out )],
    'crossbow' => [qw( ind seq #out )],
    'build' => [qw( ref out )],
    'inspect' => [qw( ind >#out )],
    'convert' => [qw( bwt out bfa )],
-   'map' => [qw( bwt #out )]
+   'map' => [qw( bwt >#out )]
);

which should be transparent to the user. If this works, then there is probably something funky going on with IPC::Run + maptool; if it doesn't, then the funkiness is prob. in my code.

I notice, however, that both bowtie-maptool and bowtie-maqconvert have been removed from the 0.12.0-beta release (http://bowtie-bio.sourceforge.net/index.shtml)...
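
For what it's worth, here is a rough sketch of what the '#out' versus '>#out' difference in the Config.pm entry above amounts to. It is only an illustration and not the wrapper's actual code: split_filespecs is a made-up helper, the file names are placeholders, and it assumes that a leading '>' on a filespec means "capture the program's stdout in this file" and a leading '#' means "this file is optional".

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::Tools::Run::Bowtie): pair each
# filespec with the file supplied for it and sort the files into
# command-line arguments and a stdout redirect.
# Assumed conventions: leading '>' = redirect stdout to the file,
# leading '#' = the file is optional.
sub split_filespecs {
    my ($specs, $files) = @_;
    my (@args, %redirect);
    for my $i (0 .. $#$specs) {
        my $spec = $specs->[$i];
        my $file = $files->[$i];
        my $to_stdout = ($spec =~ s/^>//) ? 1 : 0;   # strip and remember '>'
        my $optional  = ($spec =~ s/^#//) ? 1 : 0;   # strip and remember '#'
        if (!defined $file) {
            next if $optional;                       # optional file left out
            die "required file '$spec' not supplied\n";
        }
        if ($to_stdout) { $redirect{'>'} = $file }   # goes to the redirect
        else            { push @args, $file }        # goes on the command line
    }
    return (\@args, \%redirect);
}

# Old 'map' entry: the output file is handed to the program as an argument.
my ($args_old, $redir_old) = split_filespecs(['bwt', '#out'],  ['aln.bwt', 'aln.map']);

# Patched 'map' entry: the program writes to stdout, which is captured in the file.
my ($args_new, $redir_new) = split_filespecs(['bwt', '>#out'], ['aln.bwt', 'aln.map']);

print "old: @$args_old\n";                          # old: aln.bwt aln.map
print "new: @$args_new > $redir_new->{'>'}\n";      # new: aln.bwt > aln.map
```

So with the patched entry the executable would see only the input file, and whatever runs the command would be responsible for sending its stdout to the output file.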

cheers MAJ

----- Original Message -----
From: "Dan Kortschak"
To:
Sent: Monday, December 21, 2009 10:24 PM
Subject: [Bioperl-l] call for help and comments on module

> Hi,
>
> I've been working on a Bio::Tools::Run module to handle the bowtie rapid
> alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
> bioperl-run tree).
>
> I have 90% of what I want included in the module and would like some
> advice from more experienced bioperlers. Feedback on approach is also
> welcomed (this is my first significant wrapper, and after a long gap
> from writing modules, so I am rusty). The module has ended up being
> significantly more complicated than I had hoped.
>
> There are a few issues I'm having, so I apologise for the list:
>
> 1. Informal tests run correctly (outside the t/ tree and Test
> harness), but formal Test harness tests fail for reasons I
> cannot understand. (The module is still lacking a lot of tests,
> but since things were failing in the harness I have placed them
> as a lower priority and have been working to my micro-script
> tests - yes, bad form.)
> 2. I am having a big problem with IPC::Run for one of the
> executables (the module can call 5 different executables for 7
> commands), bowtie-maptool (module command 'map'). All the other
> commands tested (this excludes bowtie-maqconvert [convert
> command]) work fine, but maptool fails with an illegal seek -
> presumably due to the redirection handling? I have no idea how
> to resolve this, so help would be greatly appreciated (a small
> script that demonstrates the use that results in the failure is
> below).
>
> There will be provision for returning a Bio::Assembly::IO object through
> samtools in the finished module, but currently the
> Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.
>
> Thanks for any help.
> Dan
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::Tools::Run::Bowtie;
>
> # These files are in the bioperl-run t/data/ tree
> my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
> my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';
>
> my $bowtiefac = Bio::Tools::Run::Bowtie->new(
> -command => 'single',
> -max_seed_mismatches => 2,
> -seed_length => 28,
> -max_qual_mismatch => 70,
> -sam_format => 0
> );
>
> my $align = $bowtiefac->run($rdq,$refseq); # this runs fine
>
> my $bowtiemap = Bio::Tools::Run::Bowtie->new(
> -command => 'map'
> );
>
> my $map = $bowtiemap->run($align); # throws Illegal seek
>
> print "$map\n";
>
> open (IN,$map);
> my $lines =(my @lines)= <IN>;
> print @lines;
> print "\n\n$lines\n";
> close IN;
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From dan.kortschak at adelaide.edu.au Tue Dec 22 00:51:30 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 16:21:30 +1030
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To:
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1261461090.4411.13.camel@epistle>

Hi Mark,

maptool either outputs to stdout or a specified file - I chose to use a specified file and run it that way, but I've tried the redirect as you suggest, with the same failure result. I think it's a strangeness of maptool (which may well be a reason for it being dropped - also note the maptool output doesn't seem reasonable for the test data provided even when run from the command line). It's probably a result of difficult interaction between IPC::Run and maptool.
Any funkiness in your code is not likely to be a cause as I've deeply analysed what is being passed to IPC::Run, and I've quite extensively modified the IPC run handling method from your code to take into account the differences between a single executable with many commands, as the base code managed, and a cluster of executables each taking a small subset of different filespecs, as bowtie needs. My funkiness will undoubtedly swamp yours.

Resolution: Will drop bowtie-maptool from module. (Should test maqconvert - if it fails, this will be dropped also unless someone asks otherwise). When the module copes with 0.11.* properly I'll start thinking about 0.12.* which has colourspace handling to deal with.

cheers
Dan

On Tue, 2009-12-22 at 00:19 -0500, Mark A. Jensen wrote:
> Hey Dan,
> It looks like if the outfile isn't specified on the commandline for
> maptool, then the align is written to stdout. So, you could
> try this workaround in Bowtie/Config.pm:
>
> our %command_files = (
> 'single' => [qw( ind seq #out )],
> 'paired' => [qw( ind seq seq2 #out )],
> 'crossbow' => [qw( ind seq #out )],
> 'build' => [qw( ref out )],
> 'inspect' => [qw( ind >#out )],
> 'convert' => [qw( bwt out bfa )],
> - 'map' => [qw( bwt #out )]
> + 'map' => [qw( bwt >#out )]
> );
>
> which should be transparent to the user. If this works, then
> there is probably something funky going on with IPC::Run
> + maptool; if it doesn't, then the funkiness is prob. in my code.
>
> I notice, however, that both bowtie-maptool and bowtie-maqconvert
> have been removed from the 0.12.0-beta release
> (http://bowtie-bio.sourceforge.net/index.shtml)...
> > cheers MAJ From lovebaby39 at gmail.com Wed Dec 23 05:48:55 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Wed, 23 Dec 2009 18:48:55 +0800 Subject: [Bioperl-l] About bioperl issue: get string In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> Message-ID: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC> Dear all I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how to get "P.pastoris DNA for pPIC9K expression vector". while (my $result_u = $blast_report_u-> next_result ) { while (my $hit_u = $result_u->next_hit()){ while (my $hsp_u = $hit_u->next_hsp()){ $hit_u->name; $hsp_u->evalue; $hsp_u->score; } } } I will appreciate if you could tell me how to do it. P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download link?) The flow is BLAST result: ------------------------------------------------------------------------------------------------------------------------------------- BLASTN 2.2.16 [Mar-25-2007] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= (458 letters) Database: UniVec (build 4.0) 2416 sequences; 597,480 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve... 
26 3.1 gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo 26 3.1 gnl|uv|U13843.1:1887-9923 pBPV cloning vector 26 3.1 >gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector Length = 2781 Score = 26.3 bits (13), Expect = 3.1 Identities = 13/13 (100%) Strand = Plus / Plus Query: 352 tactaccgccatt 364 ||||||||||||| Sbjct: 2209 tactaccgccatt 2221 ------------------------------------------------------------------------------------------------------------------------------------- Reginald Hsueh From hrh at fmi.ch Wed Dec 23 10:14:06 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Wed, 23 Dec 2009 16:14:06 +0100 Subject: [Bioperl-l] About bioperl issue: get string In-Reply-To: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC> Message-ID: Hi Assuming you are using "SearchIO", try: $hit_u->description for more details see: http://www.bioperl.org/wiki/HOWTO:SearchIO Regards, Hans On 12/23/09 11:48 AM, "Hsueh" wrote: > Dear all > > I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how > to get "P.pastoris DNA for pPIC9K expression vector". > > while (my $result_u = $blast_report_u-> next_result ) { > while (my $hit_u = $result_u->next_hit()){ > while (my $hsp_u = $hit_u->next_hsp()){ > $hit_u->name; > $hsp_u->evalue; > $hsp_u->score; > } > } > } > > I will appreciate if you could tell me how to do it. > > P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download > link?) > > > > The flow is BLAST result: > ------------------------------------------------------------------------------ > ------------------------------------------------------- > BLASTN 2.2.16 [Mar-25-2007] > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. 
> Query= > (458 letters) > > Database: UniVec (build 4.0) > 2416 sequences; 597,480 total letters > Searching..................................................done > > Score E > Sequences producing significant alignments: > (bits) Value > > gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve... > 26 3.1 > gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo > 26 3.1 > gnl|uv|U13843.1:1887-9923 pBPV cloning vector > 26 3.1 > >> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector > Length = 2781 > > Score = 26.3 bits (13), Expect = 3.1 > Identities = 13/13 (100%) > Strand = Plus / Plus > > Query: 352 tactaccgccatt 364 > ||||||||||||| > Sbjct: 2209 tactaccgccatt 2221 > ------------------------------------------------------------------------------ > ------------------------------------------------------- > > Reginald Hsueh > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pkuonline at gmail.com Wed Dec 23 13:36:49 2009 From: pkuonline at gmail.com (pkuonline) Date: Wed, 23 Dec 2009 12:36:49 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 Message-ID: <200912231236490784820@gmail.com> Hi Everyone, I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. I attached my CODEML outputs here to see whether you guys have some idea. Many thanks ahead! 
Best regards, ------------------------------------------------------------- Yong Zhang Ph.D, Research Scholar Manyuan Long's Lab University of Chicago -------------- next part -------------- A non-text attachment was scrubbed... Name: rst4.1 Type: application/octet-stream Size: 60616 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mlc4.1 Type: application/octet-stream Size: 11635 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mlc4.3b Type: application/octet-stream Size: 11330 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rst4.3b Type: application/octet-stream Size: 60616 bytes Desc: not available URL: From cjfields at illinois.edu Wed Dec 23 16:19:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 23 Dec 2009 15:19:48 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231236490784820@gmail.com> References: <200912231236490784820@gmail.com> Message-ID: Well, not completely unexpected, but very frustrating nonetheless. Changes to PAML output have broken in just about every PAML parser revision. Not sure when this will be addressed unfortunately, my hope is sooner than later. Can you file a bioperl bug report for this? It's the best place to keep track. http://bugzilla.open-bio.org/ chris On Dec 23, 2009, at 12:36 PM, pkuonline wrote: > Hi Everyone, > > I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. 
> > I attached my CODEML outputs here to see whether you guys have some idea. > > Many thanks ahead! > > Best regards, > ------------------------------------------------------------- > Yong Zhang > Ph.D, Research Scholar > Manyuan Long's Lab > University of Chicago_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pkuonline at gmail.com Wed Dec 23 17:45:54 2009 From: pkuonline at gmail.com (pkuonline) Date: Wed, 23 Dec 2009 16:45:54 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 References: <200912231236490784820@gmail.com>, Message-ID: <200912231645536094087@gmail.com> Hi Chris, Thanks for your reply and I just submitted this bug to bugzilla. Have a nice holiday! ------------------------------------------------------------- Yong Zhang Ph.D, Research Scholar Manyuan Long's Lab University of Chicago >------------------------------------------------------------- >From: Chris Fields >Time: 2009-12-23 15:19:50 >To: pkuonline bioperl-l >Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 >Well, not completely unexpected, but very frustrating nonetheless. Changes to PAML output have broken in just about every PAML parser revision. Not sure when this will be addressed unfortunately, my hope is sooner than later. > >Can you file a bioperl bug report for this? It's the best place to keep track. > >http://bugzilla.open-bio.org/ > >chris > >On Dec 23, 2009, at 12:36 PM, pkuonline wrote: > >> Hi Everyone, >> >> I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. 
More strangely, I tested it on the old PAML 4.1 result and also failed. >> >> I attached my CODEML outputs here to see whether you guys have some idea. >> >> Many thanks ahead! >> >> Best regards, >> ------------------------------------------------------------- >> Yong Zhang >> Ph.D, Research Scholar >> Manyuan Long's Lab >> University of Chicago_______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Wed Dec 23 18:23:44 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 24 Dec 2009 00:23:44 +0100 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231645536094087@gmail.com> References: <200912231236490784820@gmail.com>, <200912231645536094087@gmail.com> Message-ID: <08E748F4-1398-4543-AB77-0640441BC323@sbc.su.se> Hi Yong, Could you attach your codeml output to the bug report, too? I'll take a look at this as soon as I can. Dave From maj at fortinbras.us Thu Dec 24 00:47:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 24 Dec 2009 00:47:10 -0500 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231645536094087@gmail.com> References: <200912231236490784820@gmail.com>, <200912231645536094087@gmail.com> Message-ID: <2DF45CDC2BE44A85ADCD865A98CD13D6@NewLife> Yong-- say 'ni hao' to Manyuan for me --- cheers MAJ ----- Original Message ----- From: "pkuonline" To: "Chris Fields" Cc: "bioperl-l" Sent: Wednesday, December 23, 2009 5:45 PM Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 > Hi Chris, > > Thanks for your reply and I just submitted this bug to bugzilla. > > Have a nice holiday! 
> ------------------------------------------------------------- > Yong Zhang > Ph.D, Research Scholar > Manyuan Long's Lab > University of Chicago > >>------------------------------------------------------------- >>From: Chris Fields >>Time: 2009-12-23 15:19:50 >>To: pkuonline bioperl-l >>Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 > >>Well, not completely unexpected, but very frustrating nonetheless. Changes to >>PAML output have broken in just about every PAML parser revision. Not sure >>when this will be addressed unfortunately, my hope is sooner than later. >> >>Can you file a bioperl bug report for this? It's the best place to keep >>track. >> >>http://bugzilla.open-bio.org/ >> >>chris >> >>On Dec 23, 2009, at 12:36 PM, pkuonline wrote: >> >>> Hi Everyone, >>> >>> I used the latest Bioperl build, >>> http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to >>> parse CODEML result. I searched the mail list and found current PAML parser >>> is compatible with PAML 4.3a, >>> http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. >>> However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser >>> does not work. More strangely, I tested it on the old PAML 4.1 result and >>> also failed. >>> >>> I attached my CODEML outputs here to see whether you guys have some idea. >>> >>> Many thanks ahead! 
>>> >>> Best regards, >>> ------------------------------------------------------------- >>> Yong Zhang >>> Ph.D, Research Scholar >>> Manyuan Long's Lab >>> University of >>> Chicago_______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bhakti.dwivedi at gmail.com Fri Dec 25 21:46:51 2009 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Fri, 25 Dec 2009 21:46:51 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? Message-ID: Hi, Does anyone know how to retrieve the "Source" or the "Species name" given the accession number using Bioperl. I have these 30,000 accession numbers for which I need to get the source organisms. Any kind of help will be appreciated. Thanks BD From maj at fortinbras.us Fri Dec 25 22:52:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 25 Dec 2009 22:52:10 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? 
In-Reply-To:
References:
Message-ID: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>

Bhakti,
The following example (using EUtilities) may serve your purpose:

    use Bio::DB::EUtilities;

    my (%taxa, @taxa);
    my (%names, %idmap);

    # these are protein ids; nuc ids will work by changing
    # -dbfrom => 'nucleotide', (probably)
    my @ids = qw(1621261 89318838 68536103 20807972 730439);

    my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                           -db => 'taxonomy',
                                           -dbfrom => 'protein',
                                           -correspondence => 1,
                                           -id => \@ids);

    # iterate through the LinkSet objects
    while (my $ds = $factory->next_LinkSet) {
        $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
    }

    @taxa = @taxa{@ids};

    $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                        -db => 'taxonomy',
                                        -id => \@taxa );

    while (local $_ = $factory->next_DocSum) {
        $names{($_->get_contents_by_name('TaxId'))[0]} =
            ($_->get_contents_by_name('ScientificName'))[0];
    }

    foreach (@ids) {
        $idmap{$_} = $names{$taxa{$_}};
    }

    # %idmap is
    # 1621261  => 'Mycobacterium tuberculosis H37Rv'
    # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
    # 68536103 => 'Corynebacterium jeikeium K411'
    # 730439   => 'Bacillus caldolyticus'
    # 89318838 => undef (this record has been removed from the db)

    1;

You probably will need to break up your 30000 into chunks
(say, 1000-3000 each), and do the above on each chunk with a

    sleep 3;

or so separating the queries.
MAJ

----- Original Message -----
From: "Bhakti Dwivedi"
To:
Sent: Friday, December 25, 2009 9:46 PM
Subject: [Bioperl-l] how to retrieve organism name from accession number?

> Hi,
>
> Does anyone know how to retrieve the "Source" or the "Species name" given
> the accession number using Bioperl. I have these 30,000 accession numbers
> for which I need to get the source organisms. Any kind of help will be
> appreciated.
>
> Thanks
>
> BD
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Sat Dec 26 06:47:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 26 Dec 2009 05:47:29 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID:

On Dec 25, 2009, at 9:52 PM, Mark A. Jensen wrote:

> Bhakti,
> The following example (using EUtilities) may serve your purpose:
>
> use Bio::DB::EUtilities;
>
> ...
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
>
> sleep 3;
>
> or so separating the queries.
> MAJ

The 'sleep 3' is built-in and now (on main trunk) matches NCBI's current spec of 3 queries/sec.

chris

From arpm9 at charter.net Sun Dec 27 16:42:09 2009
From: arpm9 at charter.net (arpm9)
Date: Sun, 27 Dec 2009 16:42:09 -0500
Subject: [Bioperl-l] Should Bio::Tools::BPlite be deprecated?
In-Reply-To: 4533A8D3.90709@sendu.me.uk
Message-ID: <867A36FEE0244EF2950108C42BD2BE58@paulb0d5af35b9>

hi chris, I was trying to make sense of this backpacking lite and just simply wanted to view the light...and got nowhere and very frustrated...please help if you can...or whoever can...thanks Pm

From pengyu.ut at gmail.com Tue Dec 29 11:08:09 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 10:08:09 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
Message-ID: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>

May I ask somebody who is versatile in both bioperl and biopython to comment on the pros and cons of bioperl and biopython? I'm sending this email to both bioperl and biopython mailing lists. But I hope that it will not result in any contention.
I assume that the functionality of bioperl and biopython is the same, i.e., tasks that can be done in bioperl can be done in biopython and vice versa, as both libraries have been out there for over 10 years. Please correct me if my understanding is not true.

Given a task that can be done with either bioperl or biopython, I, in particular, want to know how long it will take to write the code for the task in bioperl and biopython, with the same readability requirement (see below) and the assumption that users have the same fluency in perl and python.

python is claimed to be good for maintainability. But perl is criticized for there-are-many-ways-for-a-given-task. Since there are multiple ways in perl, let us assume that we always use perl in a readable way.

From jason at bioperl.org Tue Dec 29 11:49:20 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 08:49:20 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>

Are you asking for the purposes of choosing a toolkit for your work or just curious about the advantages/disadvantages of language choice?

-jason

On Dec 29, 2009, at 8:08 AM, Peng Yu wrote:

> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython and
> vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.
>
> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
>
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From ak at ebi.ac.uk Tue Dec 29 11:57:18 2009
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Tue, 29 Dec 2009 16:57:18 +0000
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <20091229165718.GB30356@quux.windows.ebi.ac.uk>

On Tue, Dec 29, 2009 at 10:08:09AM -0600, Peng Yu wrote:
> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython and
> vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.
>
> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
>
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

Assuming, as you do, that the functionality of BioPerl and BioPython is the same: Which of the two programming languages are you (or your team) most proficient in? Use that language.

Regards,
Andreas

--
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom

From sdavis2 at mail.nih.gov Tue Dec 29 12:03:40 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 12:03:40 -0500
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote:
> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython and
> vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.
The two projects have similar goals, but saying that the functionality is the same would be an extreme oversimplification. You will need to define what you want to do and then check to see what the two projects have to offer. This will, in general, require perusing the websites for both projects as well as the relevant documentation. > Given that a task that can be done with either bioperl or biopython, > I, in particularly, want to know how long it will take to write the > code for the task in bioperl and biopython, with the same readability > requirement (see below) and the assumption that users have the same > fluency in perl and python. Again, you will want to define the task(s) to be accomplished and then weigh the pros and cons of each project combined with local expertise. If you don't know what you want to do, then you can certainly read some examples on the websites and see which project strikes you as a "winner" for you. > python is claimed to be good for maintainability. But perl is > criticized for there-are-many-ways-for-a-given-task. Since there are > multiple ways in perl, let us assume that we always use perl in a > readable way. These two statements are generalizations that provide little insight into the strengths or weaknesses of the languages. In other words, one can write good or bad code in both languages. Hope that helps. Sean From wenzhiwang1983 at yahoo.com.cn Tue Dec 29 13:30:02 2009 From: wenzhiwang1983 at yahoo.com.cn (WangWenzhi) Date: Wed, 30 Dec 2009 02:30:02 +0800 (CST) Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> Message-ID: <658770.25534.qm@web15204.mail.cnb.yahoo.com> Dear Jason, Plink is a very useful program in the population genetics, especially in the Genome-Wide SNP scan era. Is there any plan to add the Plink (ped or tped) format to Bio::PopGen::IO? Thanks. 
Wenzhi Wang
State Key Laboratory of Genetic Resources and Evolution
Kunming Institute of Zoology, Chinese Academy of Sciences
Kunming, Yunnan 650223 P. R. China
Tel: 86 871 5198 993
Fax: 86 871 5195 430
E-mail: wenzhiwang1983 at yahoo.com.cn

From pengyu.ut at gmail.com Tue Dec 29 13:58:59 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 12:58:59 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <366c6f340912291058t6c601e57re0c35e69fe81e09d@mail.gmail.com>

To choose a toolkit for my work.

On Tue, Dec 29, 2009 at 10:49 AM, Jason Stajich wrote:
> Are you asking for the purposes of choosing a toolkit for your work or just
> curious about the advantages/disadvantages of language choice?
>
> -jason
> On Dec 29, 2009, at 8:08 AM, Peng Yu wrote:
>
>> May I ask somebody who is versatile in both bioperl and biopython
>> to comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality of bioperl and biopython is the
>> same, i.e., tasks that can be done in bioperl can be done in biopython and
>> vice versa, as both libraries have been out there for over 10 years.
>> Please correct me if my understanding is not true.
>>
>> Given a task that can be done with either bioperl or biopython,
>> I, in particular, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>

From pengyu.ut at gmail.com Tue Dec 29 14:15:14 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 13:15:14 -0600
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython?
In-Reply-To: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
Message-ID: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote:
> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote:
>> May I ask somebody who is versatile in both bioperl and biopython
>> to comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality of bioperl and biopython is the
>> same, i.e., tasks that can be done in bioperl can be done in biopython and
>> vice versa, as both libraries have been out there for over 10 years.
>> Please correct me if my understanding is not true.
>
> The two projects have similar goals, but saying that the functionality
> is the same would be an extreme oversimplification. You will need to
> define what you want to do and then check to see what the two projects
> have to offer. This will, in general, require perusing the websites
> for both projects as well as the relevant documentation.

According to your experience, are there some tasks that are easier with one than with another?

>> Given a task that can be done with either bioperl or biopython,
>> I, in particular, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>
> Again, you will want to define the task(s) to be accomplished and then
> weigh the pros and cons of each project combined with local expertise.
> If you don't know what you want to do, then you can certainly read
> some examples on the websites and see which project strikes you as a
> "winner" for you.
>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>
> These two statements are generalizations that provide little insight
> into the strengths or weaknesses of the languages. In other words,
> one can write good or bad code in both languages.
>
> Hope that helps.
>
> Sean
>

From alperyilmaz at gmail.com Tue Dec 29 14:36:03 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Tue, 29 Dec 2009 14:36:03 -0500
Subject: [Bioperl-l] Bio::TreeIO, Bio::Tree::Draw::Cladogram and phyloxml issues..
Message-ID:

Hello,

I have a tree in phyloxml format, and am trying to draw a subtree by using a specific node as the root. I used Bio::Tree::Draw::Cladogram for drawing and encountered some problems.

When I use the whole tree and draw it, everything is fine; but, when I pick a particular node and construct the subtree from that node's ancestor by using "my $subtree = Bio::Tree::Tree->new(-root => $new_root, -nodelete => 1);", Bio::Tree::Draw::Cladogram creates a faulty EPS file, which contains extra lines added in the middle of the file. For instance:

.
.
.
72.0820393261372 126 moveto
(OsIBCD006509) show
30 81.25 moveto
81.25 lineto
lineto
48.5410196630686 120 moveto
30 120 lineto
.
.
.

Should read:

72.0820393261372 126 moveto
(OsIBCD006509) show
48.5410196630686 120 moveto
30 120 lineto

Also, I tried to write the subtree into a new phyloxml file first, then draw it. The code is as follows:

    my $savefile = "save.phyloxml";
    my $treeout = Bio::TreeIO->new(-format =>'phyloxml', -file => ">$savefile");
    $treeout->write_tree($subtree);
    my $tree2 = Bio::TreeIO->new(-format =>'phyloxml', -file => "save.phyloxml");
    my $t1 = $tree2->next_tree;
    my $image_output = "test.eps";
    my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $t1, -top => 10, -bottom => 10,);
    $obj1->print(-file => $image_output);

The generated phyloxml file, which is named save.phyloxml, has an additional newline between "" and "" at the end of the file. This additional newline leads to an error when parsing (opening the file and drawing the EPS). After I removed the newline manually, Bio::Tree::Draw::Cladogram produced the EPS file successfully.

Does anyone know how to fix these problems?
1- faulty eps file generation
2- additional newline character in phyloxml output

Is the problem in the way I create the subtree?
The phyloxml file I used can be downloaded from: http://grassius.org/download/HSF.phyloxml Run this code with the phyloxml file to see newline character problem: http://pastebin.com/f87ee1ee Run this code with the phyloxml file to see faulty eps file problem: http://pastebin.com/fc4715a1 Alper Yilmaz Post-doctoral Researcher Plant Biotechnology Center The Ohio State University 1060 Carmack Rd Columbus, OH 43210 (614)688-4954 From pengyu.ut at gmail.com Tue Dec 29 16:32:17 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 15:32:17 -0600 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html Message-ID: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> http://bioperl.org/Core/Latest/modules.html Many links if not all are broken on the above pages. Could somebody fix it? For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, I see the following error. There is currently no text in this page. You can search for this page title in other pages, search the related logs, or edit this page. From jason at bioperl.org Tue Dec 29 16:49:00 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 13:49:00 -0800 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: That is an outdated URL I am not sure where you are linking it from. We can probably now disable all old '/Core' URLs. All documentation links are in the /wiki/ The beginner's howto is here for example http://bioperl.org/wiki/HOWTO:Beginners > http://www.bioperl.org/wiki/HOWTOs On Dec 29, 2009, at 1:32 PM, Peng Yu wrote: > http://bioperl.org/Core/Latest/modules.html > > Many links if not all are broken on the above pages. Could somebody > fix it? > > For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, > I see the following error. 
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From jason at bioperl.org Tue Dec 29 16:50:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:50:26 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
References: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
Message-ID:

yep - be great if someone were to write it. This being a volunteer project we welcome your contribution. No I don't specifically have plans to do it, but maybe you can give it a try or another population genetics interested bioperl user/developer?

-jason

On Dec 29, 2009, at 10:30 AM, WangWenzhi wrote:

> Dear Jason,
>
> Plink is a very useful program in the population genetics,
> especially in the Genome-Wide SNP scan era. Is there any plan to add
> the Plink (ped or tped) format to Bio::PopGen::IO?
>
> Thanks.
>
> Wenzhi Wang
> State Key Laboratory of Genetic Resources and Evolution
> Kunming Institute of Zoology, Chinese Academy of Sciences
> Kunming, Yunnan 650223 P. R. China
> Tel: 86 871 5198 993
> Fax: 86 871 5195 430
> E-mail: wenzhiwang1983 at yahoo.com.cn
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From jason at bioperl.org Tue Dec 29 16:57:49 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:57:49 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> Message-ID: <02851B8A-E74E-453E-9725-6FA8F3995F82@bioperl.org> On Dec 29, 2009, at 11:15 AM, Peng Yu wrote: > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis > wrote: >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu >> wrote: >>> May I ask somebody who are versitile in both bioperl and biopython >>> comment on the pros and cons of bioperl and biopython? I'm sending >>> this email to both bioperl and biopython mailing lists. But I hope >>> that it will not result in any contention. >>> >>> I assume that the functionality between bioperl or biopython is the >>> same, i.e., tasks can be done in bioperl can be done biopython and >>> vice versa, as both libraries have been out there over 10 years. >>> Please correct me if my understanding is not true. >> >> The two projects have similar goals, but saying that the >> functionality >> is the same would be an extreme oversimplification. You will need to >> define what you want to do and then check to see what the two >> projects >> have to offer. This will, in general, require perusing the websites >> for both projects as well as the relevant documentation. > > According to your experience, are there some tasks that are easier > with one than with another? As you have still failed to give much insight into the 'tasks' it is hard to give you a better answer. If there is a module or set of routines already written then yes one might be easier than the other. Otherwise it just depends on your strengths in the programming language. We discussed the strengths of the different toolkits briefly on the podcast last month. http://twit.tv/floss96 I echo Sean. Use whichever language you are a better programmer in. 
BioPerl is more mature in some facets than is BioPython, but BioPython has some components that are more heavily developed and supported than BioPerl (structures being one of those and interfacing that to pyMol would be a strength). I personally think the Gbrowse, Bio-Graphics, and Bio::DB::GFF/Bio::DB::SeqFeature::Store interface to Sequence databases and Features is a critical aspect of mining genomic data and features and use these heavily in my work, making BioPerl easy and powerful for my tasks. That and sequence and alignment parsing and reformatting. But there are comparable tools written in python with and without BioPython that you can also use so mainly it is about building up an expertise in a toolkit and going forward. The BioPerl faithful will probably say it is more useful toolkit to us, but we are of course a biased sample. Both projects can benefit from more users and developers contributing code and documentation so I would just jump in and give it a try if you are unsure which will be easier for you. > >>> Given that a task that can be done with either bioperl or biopython, >>> I, in particularly, want to know how long it will take to write the >>> code for the task in bioperl and biopython, with the same >>> readability >>> requirement (see below) and the assumption that users have the same >>> fluency in perl and python. >> >> Again, you will want to define the task(s) to be accomplished and >> then >> weigh the pros and cons of each project combined with local >> expertise. >> If you don't know what you want to do, then you can certainly read >> some examples on the websites and see which project strikes you as a >> "winner" for you. >> >>> python is claimed to be good for maintainability. But perl is >>> criticized for there-are-many-ways-for-a-given-task. Since there are >>> multiple ways in perl, let us assume that we always use perl in a >>> readable way. 
>> >> These two statements are generalizations that provide little insight >> into the strengths or weaknesses of the languages. In other words, >> one can write good or bad code in both languages. >> >> Hope that helps. >> >> Sean >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From pengyu.ut at gmail.com Tue Dec 29 17:01:05 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Wed, 30 Dec 2009 16:01:05 +1800 Subject: [Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID? Message-ID: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> I see the following example. But it is not clear to me how to get the exon sequences. I also want to get the exon boundaries and associated CDS boundaries. Although, I can get the boundary information from ucsc table browser, but it would be convenient if I can get it in bioperl along with the sequence. Could somebody let me know how do it? http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html From sdavis2 at mail.nih.gov Tue Dec 29 17:13:30 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 29 Dec 2009 17:13:30 -0500 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: <264855a00912291413r7ce37e2h673dec7c2624db6@mail.gmail.com> On Tue, Dec 29, 2009 at 4:32 PM, Peng Yu wrote: > http://bioperl.org/Core/Latest/modules.html > > Many links if not all are broken on the above pages. Could somebody fix it? > > For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, > I see the following error. > > There is currently no text in this page. 
You can search for this page > title in other pages, search the related logs, or edit this page. It is unfortunate that the links are broken on that page. However, I believe that page is somewhat outdated, anyway. Here are the HOWTO pages: http://www.bioperl.org/wiki/HOWTOs Sean From pengyu.ut at gmail.com Tue Dec 29 17:21:16 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Wed, 30 Dec 2009 16:21:16 +1800 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: <366c6f340912291421m38bb8348oe6b224f29208f9f4@mail.gmail.com> On Wed, Dec 30, 2009 at 3:49 PM, Jason Stajich wrote: > That is an outdated URL I am not sure where you are linking it from. We can > probably now disable all old '/Core' URLs. I'm linked from here. http://www.bioperl.org/wiki/BioPerl_Tutorial Since those URLs are outdated. Could you please fix the links on the above link? > All documentation links are in the /wiki/ > > The beginner's howto is here for example > ?http://bioperl.org/wiki/HOWTO:Beginners > >> http://www.bioperl.org/wiki/HOWTOs > > > On Dec 29, 2009, at 1:32 PM, Peng Yu wrote: > >> http://bioperl.org/Core/Latest/modules.html >> >> Many links if not all are broken on the above pages. Could somebody fix >> it? >> >> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, >> I see the following error. >> >> There is currently no text in this page. You can search for this page >> title in other pages, search the related logs, or edit this page. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From sdavis2 at mail.nih.gov Tue Dec 29 18:06:17 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 29 Dec 2009 18:06:17 -0500 Subject: [Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID? In-Reply-To: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> References: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> Message-ID: <264855a00912291506s13c32d5dg7b46f0cc34c20f94@mail.gmail.com> On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu wrote: > I see the following example. But it is not clear to me how to get the > exon sequences. I also want to get the exon boundaries and associated > CDS boundaries. Although, I can get the boundary information from ucsc > table browser, but it would be convenient if I can get it in bioperl > along with the sequence. > > Could somebody let me know how do it? > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html Hi, Peng. There may be some confusion, as the UCSC database aligns RefSeq sequence to a genome to generate exon start and end coordinates. However, the RefSeq records retrieved by Bio::DB::RefSeq are not in genomic context and so do not have start and end locations on the genome. That is, if you want the starts and ends along the genome, that information is not available from the RefSeq record itself, I don't think. If that is what you need (genomic coordinates), you can download the information directly from UCSC, download flat files from NCBI mapview, or even from ensembl (using biomart, for instance). If you are looking for a bioperl-compliant way of doing this, look at the Ensembl Perl API. 
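A minimal sketch of what can be read from the RefSeq record itself, assuming the record carries exon and CDS features (many NM_ entries do, but not all). The helper name exon_cds_table is made up for illustration. The coordinates it reports are positions on the transcript sequence, not genomic positions, which is the distinction drawn above.

```perl
use strict;
use warnings;

# Collect exon and CDS features from any Bio::SeqI object.
# start/end are relative to the record's own sequence (the transcript),
# NOT to a genome assembly.
sub exon_cds_table {
    my ($seq) = @_;
    my @rows;
    for my $feat ( $seq->get_SeqFeatures ) {
        my $tag = $feat->primary_tag;
        next unless $tag eq 'exon' || $tag eq 'CDS';
        push @rows, {
            type  => $tag,
            start => $feat->start,
            end   => $feat->end,
            # spliced_seq also copes with split (join) locations
            seq   => $feat->spliced_seq->seq,
        };
    }
    return @rows;
}

# Hypothetical usage (needs network access; the accession is only an example):
#   use Bio::DB::RefSeq;
#   my $db  = Bio::DB::RefSeq->new;
#   my $seq = $db->get_Seq_by_acc('NM_006732');
#   printf "%s\t%d\t%d\n", @{$_}{qw(type start end)} for exon_cds_table($seq);
```

For genomic coordinates this does not help; one of the sources named above (UCSC, NCBI, Ensembl) is still needed.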
Sean From jkhilmer at gmail.com Tue Dec 29 14:55:18 2009 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Tue, 29 Dec 2009 12:55:18 -0700 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> Message-ID: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> Personally, I think that the differences between Python and Perl (although substantial) are not large enough to make the language itself the deciding factor. Instead, consider the larger community of software. I haven't yet found a situation in which Python cannot be applied: it can be used with R (statistics); lower-level code C or fortran; visualization software such as PyMol, Chimera, Blender, VTK; plotting with matplotlib; and scipy/numpy or sage, which provide innumerable benefits for computation, data-processing, etc. Although I don't claim to have a great deal of experience with Perl, I haven't seen the same integration with that language: I'm assuming it can be used with R and VTK (not sure about C or fortran?). For this reason, unless your work is highly targeted and you have no use programming language integration with other software, I would recommend Python. For perl experts, I would truly appreciate any corrections you could offer to these observations of mine, since I wouldn't mind using perl if it offers benefits either in general or for specific applications. Jonathan On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu wrote: > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote: >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: >>> May I ask somebody who are versitile in both bioperl and biopython >>> comment on the pros and cons of bioperl and biopython? 
I'm sending >>> this email to both bioperl and biopython mailing lists. But I hope >>> that it will not result in any contention. >>> >>> I assume that the functionality between bioperl or biopython is the >>> same, i.e., tasks can be done in bioperl can be done biopython and >>> vice versa, as both libraries have been out there over 10 years. >>> Please correct me if my understanding is not true. >> >> The two projects have similar goals, but saying that the functionality >> is the same would be an extreme oversimplification. You will need to >> define what you want to do and then check to see what the two projects >> have to offer. This will, in general, require perusing the websites >> for both projects as well as the relevant documentation. > > According to your experience, are there some tasks that are easier > with one than with another? > >>> Given that a task that can be done with either bioperl or biopython, >>> I, in particularly, want to know how long it will take to write the >>> code for the task in bioperl and biopython, with the same readability >>> requirement (see below) and the assumption that users have the same >>> fluency in perl and python. >> >> Again, you will want to define the task(s) to be accomplished and then >> weigh the pros and cons of each project combined with local expertise. >> If you don't know what you want to do, then you can certainly read >> some examples on the websites and see which project strikes you as a >> "winner" for you. >> >>> python is claimed to be good for maintainability. But perl is >>> criticized for there-are-many-ways-for-a-given-task. Since there are >>> multiple ways in perl, let us assume that we always use perl in a >>> readable way. >> >> These two statements are generalizations that provide little insight >> into the strengths or weaknesses of the languages. In other words, >> one can write good or bad code in both languages. >> >> Hope that helps.
>> >> Sean >> > > _______________________________________________ > Biopython mailing list ?- ?Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From wgheath at gmail.com Tue Dec 29 15:16:39 2009 From: wgheath at gmail.com (William Heath) Date: Tue, 29 Dec 2009 12:16:39 -0800 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> Message-ID: The biggest reason to go with python is the ease of use. Biologists are not programmers and the learning curve for python is much smaller than that of perl. I like perl but choose python because of this issue. Perl 6 does address some of these issues however but this has not been fully implemented as of yet. -Tim P.S. I love, love, love cpan though which is only for perl right now :( On Tue, Dec 29, 2009 at 11:55 AM, Jonathan Hilmer wrote: > Personally, I think that the differences between Python and Perl > (although substantial) are not large enough to make the language > itself the deciding factor. > > Instead, consider the larger community of software. I haven't yet > found a situation in which Python cannot be applied: it can be used > with R (statistics); lower-level code C or fortran; visualization > software such as PyMol, Chimera, Blender, VTK; plotting with > matplotlib; and scipy/numpy or sage, which provide innumerable > benefits for computation, data-processing, etc. > > Although I don't claim to have a great deal of experience with Perl, I > haven't seen the same integration with that language: I'm assuming it > can be used with R and VTK (not sure about C or fortran?). 
For this > reason, unless your work is highly targeted and you have no use > programming language integration with other software, I would > recommend Python. > > For perl experts, I would truly appreciate any corrections you could > offer to these observations of mine, since I wouldn't mind using perl > if it offers benefits either in general or for specific applications. > > > Jonathan > > On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu wrote: > > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis > wrote: > >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: > >>> May I ask somebody who are versitile in both bioperl and biopython > >>> comment on the pros and cons of bioperl and biopython? I'm sending > >>> this email to both bioperl and biopython mailing lists. But I hope > >>> that it will not result in any contention. > >>> > >>> I assume that the functionality between bioperl or biopython is the > >>> same, i.e., tasks can be done in bioperl can be done biopython and > >>> vice versa, as both libraries have been out there over 10 years. > >>> Please correct me if my understanding is not true. > >> > >> The two projects have similar goals, but saying that the functionality > >> is the same would be an extreme oversimplification. You will need to > >> define what you want to do and then check to see what the two projects > >> have to offer. This will, in general, require perusing the websites > >> for both projects as well as the relevant documentation. > > > > According to your experience, are there some tasks that are easier > > with one than with another? > > > >>> Given that a task that can be done with either bioperl or biopython, > >>> I, in particularly, want to know how long it will take to write the > >>> code for the task in bioperl and biopython, with the same readability > >>> requirement (see below) and the assumption that users have the same > >>> fluency in perl and python. 
> >> > >> Again, you will want to define the task(s) to be accomplished and then > >> weigh the pros and cons of each project combined with local expertise. > >> If you don't know what you want to do, then you can certainly read > >> some examples on the websites and see which project strikes you as a > >> "winner" for you. > >> > >>> python is claimed to be good for maintainability. But perl is > >>> criticized for there-are-many-ways-for-a-given-task. Since there are > >>> multiple ways in perl, let us assume that we always use perl in a > >>> readable way. > >> > >> These two statements are generalizations that provide little insight > >> into the strengths or weaknesses of the languages. In other words, > >> one can write good or bad code in both languages. > >> > >> Hope that helps. > >> > >> Sean > >> > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From pengyu.ut at gmail.com Wed Dec 30 12:26:45 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Thu, 31 Dec 2009 11:26:45 +1800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? Message-ID: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> With Bio::SeqIO, I can only read in the records in a fasta file one by one. This is preferable if there are many records in a file. But I also want to read all the records in. I could use a while loop to read all records in. But could somebody let me know if there is a function in bioperl that can read in all the record at once and return me an object? 
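The while-loop approach mentioned above can be sketched as follows. This is only an illustration, not a built-in BioPerl call: load_fasta is a hypothetical helper name, and keying the hash on display_id assumes the IDs in the file are unique.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Read every record of a FASTA file into memory.
# Returns an array ref (records in file order) and a hash ref keyed by display_id.
sub load_fasta {
    my ($file) = @_;
    my $in = Bio::SeqIO->new( -file => $file, -format => 'fasta' );
    my ( @seqs, %by_id );
    while ( my $seq = $in->next_seq ) {
        push @seqs, $seq;
        $by_id{ $seq->display_id } = $seq;    # a repeated ID overwrites the earlier record
    }
    return ( \@seqs, \%by_id );
}
```

Everything stays in memory as Bio::Seq objects, so this is reasonable for small and medium files; for very large files the record-by-record loop remains the safer choice.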
http://www.bioperl.org/wiki/HOWTO:SeqIO From sdavis2 at mail.nih.gov Wed Dec 30 13:04:53 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 30 Dec 2009 13:04:53 -0500 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> Message-ID: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object? In perl, you can use an array to store the records. You could also use a hash if you have reasonable keys for the entries. Sean > http://www.bioperl.org/wiki/HOWTO:SeqIO > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Wed Dec 30 14:58:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 30 Dec 2009 11:58:54 -0800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B@bioperl.org> or use a database object so you can retrieve sequences that have a particular id. See Bio::DB::Fasta On Dec 30, 2009, at 10:04 AM, Sean Davis wrote: > On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >> With Bio::SeqIO, I can only read in the records in a fasta file one >> by >> one. 
This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and >> return >> me an object? > > In perl, you can use an array to store the records. You could also > use a hash if you have reasonable keys for the entries. > > Sean > > >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From maj at fortinbras.us Wed Dec 30 16:20:31 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 30 Dec 2009 16:20:31 -0500 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> Message-ID: <2646F627E6D14AADB412A6E6B51E24DA@NewLife> I think you might want Bio::AlignIO:

    $alnio = Bio::AlignIO->new(-file=> 'my.fas' );
    $aln = $alnio->next_aln;
    @seqs = $aln->each_seq;

MAJ ----- Original Message ----- From: "Peng Yu" To: Sent: Wednesday, December 30, 2009 12:26 PM Subject: [Bioperl-l] How to read in the whole fasta file in the memory? > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object?
> > http://www.bioperl.org/wiki/HOWTO:SeqIO > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Thu Dec 31 05:55:32 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 31 Dec 2009 11:55:32 +0100 Subject: [Bioperl-l] question about a PAML module In-Reply-To: <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> Message-ID: Hi Rui and Sandra, Could you file this as a bug report at http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl ? Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report: - sample input files (a sequence file and a tree file, probably) - a script which reproduces the problem - the output (error messages) like you show below When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this. There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon. Dave
From e.osimo at gmail.com Tue Dec 1 13:05:48 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Tue, 1 Dec 2009 19:05:48 +0100 Subject: [Bioperl-l] Statistics: how to obtain the p value of a T test Message-ID: <2ac05d0f0912011005n6140869aoc634ad08cdf10ca4@mail.gmail.com> Hello everyone, I'm trying to get the p value of a statistic made with Statistics::TTest I cannot find this function: I can find if the null hypothesis is rejected at a certain confidence level, but I cannot make the script show me the actual p value. Do you know other scripts that can do that?
Thanks Emanuele From cjfields at illinois.edu Tue Dec 1 14:25:03 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 1 Dec 2009 13:25:03 -0600 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility Policy Change References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov> Message-ID: <964687F9-989B-4F11-B74B-977912A922EB@illinois.edu> I'll be adjusting the requisite parameters as indicated below. I'm reluctant to include a time-based limit on submissions (NCBI wants a max of 100 requests at peak hours), but it may become necessary if they request it. chris Begin forwarded message: > From: > Date: December 1, 2009 12:59:34 PM CST > To: > Subject: [Utilities-announce] NCBI E-Utility Policy Change > Reply-To: utilities-announce at ncbi.nlm.nih.gov > > As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests. > > The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request. > > The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request. > > NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described athttp://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. 
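For illustration only, a request URL carrying both required parameters might be assembled as below. This is a sketch under assumptions, not NCBI's or BioPerl's code: eutils_url and uri_escape_simple are hypothetical helper names, the base URL is assumed to be the usual eutils.ncbi.nlm.nih.gov endpoint, and the escaping assumes plain ASCII values.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes byte/ASCII input; wide characters would need UTF-8 encoding first.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an E-utility URL and refuse to do so without tool and email.
sub eutils_url {
    my ( $util, %param ) = @_;
    for my $required (qw(tool email)) {
        die "missing required parameter '$required'\n"
            unless defined $param{$required} && length $param{$required};
    }
    # Assumed base URL; check it against the current NCBI documentation.
    my $base  = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/$util.fcgi";
    my $query = join '&',
        map { $_ . '=' . uri_escape_simple( $param{$_} ) } sort keys %param;
    return "$base?$query";
}
```

A call such as eutils_url('esearch', db => 'pubmed', term => 'BRCA1', tool => 'my_script', email => 'me@example.org') returns the URL as a string and leaves the actual HTTP request to whatever client is already in use.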
These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities. > > NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov. > > Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service. > > _______________________________________________ > Utilities-announce mailing list > http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From maj at fortinbras.us Tue Dec 1 21:27:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 1 Dec 2009 21:27:06 -0500 Subject: [Bioperl-l] test test test Message-ID: <95142B0024EC48928CB56A69A17A8559@NewLife> MAJ From ocarnorsk138 at gmail.com Tue Dec 1 21:59:48 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Tue, 1 Dec 2009 23:59:48 -0300 Subject: [Bioperl-l] test test test In-Reply-To: <95142B0024EC48928CB56A69A17A8559@NewLife> References: <95142B0024EC48928CB56A69A17A8559@NewLife> Message-ID: test test test test back O'car Campos C. Bioinformatics Engineering Student. University of Talca. Chile. 2009/12/1 Mark A. Jensen > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Tue Dec 1 22:08:23 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 1 Dec 2009 22:08:23 -0500 Subject: [Bioperl-l] test test test In-Reply-To: References: <95142B0024EC48928CB56A69A17A8559@NewLife> Message-ID: I love when people are paying attention! ----- Original Message ----- From: Ocar Campos To: Mark A. Jensen ; Bioperl Mailing List. Sent: Tuesday, December 01, 2009 9:59 PM Subject: Re: [Bioperl-l] test test test test test test test back O'car Campos C. Bioinformatics Engineering Student. University of Talca. Chile. 2009/12/1 Mark A. Jensen MAJ _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From rtbio.2009 at gmail.com Wed Dec 2 07:07:08 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Wed, 2 Dec 2009 13:07:08 +0100 Subject: [Bioperl-l] Remote blast Message-ID: Hello everyone, I have a problem. I am new to Bioperl. I am working on RNAi tool wherein a cgi script was written which connects to NCBI blast using remote blast program,i.e., The input sequence given in the html page is taken as input and Remote blast is performed on this based on the code for Remote blast.But,I have a problem in the Remote blast code. 
My code goes like this:

@compseqs=blastcode($in{'Inputseq'});

sub blastcode
{
    $input1= $_[0];

    open(NUC,'>',$nuc);
    print NUC $input1;
    close(NUC);

    my $prog = 'blastn';
    my $db = 'refseq_rna';
    my $e_val= '1e-10';
    my $organism= 'Trypanosoma Brucei';

    $gb = new Bio::DB::GenBank;

    my @params = ( '-prog' => $prog,
                   '-data' => $db,
                   '-expect' => $e_val,
                   '-readmethod' => 'SearchIO',
                   '-Organism' => $organism );

    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

    #change a parameter
    $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]';

    my $v = 1;
    #$v is just to turn on and off the messages

    my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' );

    while (my $input = $str->next_seq())
    {
        #Blast a sequence against a database:
        #Alternatively, you could pass in a file with many
        #sequences rather than loop through sequence one at a time
        #Remove the loop starting 'while (my $input = $str->next_seq())'
        #and swap the two lines below for an example of that.
        my $r = $factory->submit_blast($input);

        print STDERR "waiting...." if($v>0);
        while ( my @rids = $factory->each_rid ) {
            foreach my $rid ( @rids ) {
                my $rc = $factory->retrieve_blast($rid);
                if( !ref($rc) )
                {
                    if( $rc < 0 )
                    {
                        $factory->remove_rid($rid);
                    }
                    print STDERR "." if ( $v > 0 );
                    sleep 5;
                }
                else
                {
                    my $result = $rc->next_result();
                    #save the output
                    my $filename = $result->query_name()."\.out";
                    $factory->save_output($filename);
                    $factory->remove_rid($rid);
                    # open(BLASTDEBUGFILE,'>',$blastdebugfile);
                    # print BLASTDEBUGFILE "Test1 $result";
                    # close(BLASTDEBUGFILE);
                    open(OUTFILE,'>',$outfile);
                    print OUTFILE "Test2 $result->database_name()";
                    close(OUTFILE);
                    while ( my $hit = $result->next_hit ) {
                        next unless ( $v > 0);
                        # open(OUTFILE,'>',$outfile);
                        # print OUTFILE "in while hits";
                        #close(OUTFILE);
                        my $sequ = $gb->get_Seq_by_version($hit->name);
                        my $dna = $sequ->seq(); # get the sequence as a string
                        push(@seqs,$dna);
                    }
                }
            }
        }
    }
    # open(OUTFILE,'>',$outfile);
    #print OUTFILE $seqs[0];
    # close(OUTFILE);
    return(@seqs);
}

In the code above, my program gets as far as the 'else' branch and writes the output file, i.e., this step:

my $filename = $result->query_name()."\.out";

But the program never enters the next while loop, where I should get the hits, i.e., it does not enter this:

while ( my $hit = $result->next_hit ) {
    next unless ( $v > 0);

Hence I am unable to get any hits for my query.

Ex: if the query's accession number is Tb11.02.2210, I just get a file named Tb11.02.2210.out, and only the file name is displayed in the browser.

Please help me solve this problem, and mail me if anything is unclear.

Regards,
Roopa.

From ashvip at gmail.com Wed Dec 2 00:24:09 2009
From: ashvip at gmail.com (Vipin Singh)
Date: Wed, 2 Dec 2009 10:54:09 +0530
Subject: [Bioperl-l] Problems with installation
Message-ID: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>

Dear Sir/Madam,
I have not been able to install BioPerl on my 32-bit Windows machine despite repeated attempts. I have tried both ActivePerl and Strawberry Perl, but neither seems to work.
I have followed the instructions given at
-- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Please guide.
Thanks,
Vipin.
Vipin Singh, Senior Research Fellow, Centre for Cellular and Molecular Biology, Hyderabad - 500007 India. contact - 91-040-27192778 From scott at scottcain.net Wed Dec 2 09:18:37 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 2 Dec 2009 09:18:37 -0500 Subject: [Bioperl-l] Problems with installation In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> Message-ID: <4536f7700912020618y31f8fa15i6e01ce9614a87341@mail.gmail.com> Hello Vipin, "do not seem to work" doesn't give us much to go on; can you tell us what happened? Scott On Wed, Dec 2, 2009 at 12:24 AM, Vipin Singh wrote: > Dear Sir/Madam, > I have not been able to install bioperl on my Windows 32 machine despite > repeated attempts. I have tried both Active Perl and Strwaberry perl but > both do not seem to work. > I have followed the instruction given at > -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > Please guide. > Thanks, > Vipin. > Vipin Singh, > Senior Research Fellow, > Centre for Cellular and Molecular Biology, > Hyderabad - 500007 > India. > contact - 91-040-27192778 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From maj at fortinbras.us Wed Dec 2 09:18:31 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Wed, 2 Dec 2009 09:18:31 -0500
Subject: [Bioperl-l] Problems with installation
In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>
Message-ID: <4A3B25FFC79F43E1AF65E56FD1630F44@NewLife>

Hi Vipin--
We need some more information: the commands you ran and the error messages you received.
Thanks,
Mark

----- Original Message -----
From: "Vipin Singh"
To:
Sent: Wednesday, December 02, 2009 12:24 AM
Subject: [Bioperl-l] Problems with installation

> Dear Sir/Madam,
> I have not been able to install bioperl on my Windows 32 machine despite
> repeated attempts. I have tried both Active Perl and Strwaberry perl but
> both do not seem to work.
> I have followed the instruction given at
> -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> Please guide.
> Thanks,
> Vipin.
> Vipin Singh,
> Senior Research Fellow,
> Centre for Cellular and Molecular Biology,
> Hyderabad - 500007
> India.
> contact - 91-040-27192778
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bcantarel at som.umaryland.edu Wed Dec 2 13:36:27 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 13:36:27 -0500
Subject: [Bioperl-l] Parsing Genbank
Message-ID:

Hi all,
I am not sure if this is normal, but when I use SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.

For example, I have a sequence that has a CDS on the minus strand, and it is from 911 to 974. The sequence is 974 nt.

x $cds->start
1
x $cds->end
64

How can I get the original coordinates? Is there a command for that or will I have to just do the math?

Feature or Bug?
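
For reference, "doing the math" here would be a one-line conversion. The following is only a minimal sketch, under the assumption (not established anywhere in this thread) that the reported 1..64 are positions counted on the reverse complement of the 974 nt sequence. flip_coords is a hypothetical helper written for illustration, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): map a 1-based, inclusive
# interval between forward-strand numbering and reverse-complement
# numbering of a sequence of length $len. The mapping is its own inverse.
sub flip_coords {
    my ($start, $end, $len) = @_;
    die "bad interval\n" if $start < 1 || $end > $len || $start > $end;
    return ($len - $end + 1, $len - $start + 1);
}

# Values from the example in this message: 1..64 on a 974 nt sequence.
my ($s, $e) = flip_coords(1, 64, 974);
print "$s..$e\n";    # prints 911..974
```

Applying the same function to 911..974 gives back 1..64, so it works in either direction.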
~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

From maj at fortinbras.us Wed Dec 2 14:09:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 2 Dec 2009 14:09:11 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To:
References:
Message-ID:

Hi Brandi-
If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
Can you elaborate by posting your code?
cheers,
MAJ

----- Original Message -----
From: "Brandi Cantarel"
To:
Sent: Wednesday, December 02, 2009 1:36 PM
Subject: [Bioperl-l] Parsing Genbank

> Hi all,
> I am not sure if this is normal, but when I use SEQIO to parse genbank files,
> it changes the coordinates of things on the minus strand.
>
>
> For example, I have a sequence that has a CDS on the minus strand at it is
> from 911 to 974. The sequence is 974 nt.
>
> x $cds->start
> 1
> x $cds->end
> 64
>
> How can I get the original coordinates? Is there a command for that or will I
> have to just do the math?
>
> Feature or Bug?
>
>
> ~~~~~~~~~~~~~~~~~~~~
> Brandi Cantarel, PhD
> Bioinformatics Analyst
> Institute for Genome Sciences
> School of Medicine
> University of Maryland, Baltimore
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bcantarel at som.umaryland.edu Wed Dec 2 14:29:56 2009
From: bcantarel at som.umaryland.edu (Brandi Cantarel)
Date: Wed, 2 Dec 2009 14:29:56 -0500
Subject: [Bioperl-l] Parsing Genbank
In-Reply-To:
References:
Message-ID: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu>

Here is some of my code; the real code actually enters the data into a database.

$in = Bio::SeqIO->new(-file => $gbkfile,
                      '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
    my @feats = $seq->get_all_SeqFeatures();
    my $j = 0;
    F1:foreach $cds (@feats) {
        next F1 unless ($cds->primary_tag() eq 'CDS');
        #do something with the cds start and cds end
    }
}

LOCUS       subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence?.

From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.

~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

> Hi Brandi-
> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal.
> Can you elaborate by posting your code?
> cheers,
> MAJ
> ----- Original Message ----- From: "Brandi Cantarel"
> To:
> Sent: Wednesday, December 02, 2009 1:36 PM
> Subject: [Bioperl-l] Parsing Genbank
>
>
>> Hi all,
>> I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand.
>>
>>
>> For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt.
>>
>> x $cds->start
>> 1
>> x $cds->end
>> 64
>>
>> How can I get the original coordinates? Is there a command for that or will I have to just do the math?
>>
>> Feature or Bug?
>> >> >> ~~~~~~~~~~~~~~~~~~~~ >> Brandi Cantarel, PhD >> Bioinformatics Analyst >> Institute for Genome Sciences >> School of Medicine >> University of Maryland, Baltimore >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From maj at fortinbras.us Wed Dec 2 14:48:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 2 Dec 2009 14:48:44 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> Message-ID: <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> with fake seq data and that header, I don't get a problem: DB<2> x $cds->location 0 Bio::Location::Simple=HASH(0x37b1df4) '_end' => 974 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'subjpool12_contig3' '_start' => 911 '_strand' => '-1' Are you using the latest BioPerl (1.6.1 or the trunk) ? MAJ ----- Original Message ----- From: "Brandi Cantarel" Cc: Sent: Wednesday, December 02, 2009 2:29 PM Subject: Re: [Bioperl-l] Parsing Genbank Here is some of my code, the real code actually enters the data into a database. $in = Bio::SeqIO->new(-file => $gbkfile, '-format' => 'genbank'); W1:while (my $seq = $in->next_seq()) { my @feats = $seq->get_all_SeqFeatures(); my $j = 0; F1:foreach $cds (@feats) { next F1 unless ($cds->primary_tag() eq 'CDS'); ###>> debugger stops here for above output #do something with the cds start and cds end } } LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 ACCESSION subjpool12_contig3 KEYWORDS . SOURCE human metagenome ORGANISM human metagenome unclassified sequences; organismal metagenomes,metagenomes. 
FEATURES Location/Qualifiers source 1..974 /mol_type="genomic DNA" /isolation_source="Homo sapiens" /organism="human metagenome" /collection_date="19-Nov-2009" CDS complement(911..974) /locus_tag="subjpool12_contig3|metagene|gene_2" /translation="IRIMTVELINPYIRHVEHST" /score="2.52804" /product="hypothetical protein" /note="score=2.52804" /note="score=2.52804" /note="frame=1" ORIGIN #some sequence?. >From this example, I would like to get the coordinates 911 and 974, rather than >1 and 64. ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > Hi Brandi- > If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an > ordinary Bio::Seq, that's normal. > Can you elaborate by posting your code? > cheers, > MAJ > ----- Original Message ----- From: "Brandi Cantarel" > > To: > Sent: Wednesday, December 02, 2009 1:36 PM > Subject: [Bioperl-l] Parsing Genbank > > >> Hi all, >> I am not sure if this is normal, but when I use SEQIO to parse genbank files, >> it changes the coordinates of things on the minus strand. >> >> >> For example, I have a sequence that has a CDS on the minus strand at it is >> from 911 to 974. The sequence is 974 nt. >> >> x $cds->start >> 1 >> x $cds->end >> 64 >> >> How can I get the original coordinates? Is there a command for that or will >> I have to just do the math? >> >> Feature or Bug? 
>> >> >> ~~~~~~~~~~~~~~~~~~~~ >> Brandi Cantarel, PhD >> Bioinformatics Analyst >> Institute for Genome Sciences >> School of Medicine >> University of Maryland, Baltimore >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Dec 2 14:39:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 2 Dec 2009 13:39:40 -0600 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> Message-ID: <0E82A338-9D28-4685-A7DA-5019060D96F5@illinois.edu> That one's odd; the coordinates should relate back to the original sequence. Any chance you could pass on the sequence file so we can confirm it? you can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem). chris On Dec 2, 2009, at 1:29 PM, Brandi Cantarel wrote: > Here is some of my code, the real code actually enters the data into a database. > > > $in = Bio::SeqIO->new(-file => $gbkfile, > '-format' => 'genbank'); > > W1:while (my $seq = $in->next_seq()) { > my @feats = $seq->get_all_SeqFeatures(); > my $j = 0; > F1:foreach $cds (@feats) { > next F1 unless ($cds->primary_tag() eq 'CDS'); > #do something with the cds start and cds end > } > } > > > LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 > ACCESSION subjpool12_contig3 > KEYWORDS . > SOURCE human metagenome > ORGANISM human metagenome > unclassified sequences; organismal metagenomes,metagenomes. 
> FEATURES Location/Qualifiers > source 1..974 > /mol_type="genomic DNA" > /isolation_source="Homo sapiens" > /organism="human metagenome" > /collection_date="19-Nov-2009" > CDS complement(911..974) > /locus_tag="subjpool12_contig3|metagene|gene_2" > /translation="IRIMTVELINPYIRHVEHST" > /score="2.52804" > /product="hypothetical protein" > /note="score=2.52804" > /note="score=2.52804" > /note="frame=1" > ORIGIN > #some sequence?. > > > > >> From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. > > > > ~~~~~~~~~~~~~~~~~~~~ > Brandi Cantarel, PhD > Bioinformatics Analyst > Institute for Genome Sciences > School of Medicine > University of Maryland, Baltimore > > On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > >> Hi Brandi- >> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. >> Can you elaborate by posting your code? >> cheers, >> MAJ >> ----- Original Message ----- From: "Brandi Cantarel" >> To: >> Sent: Wednesday, December 02, 2009 1:36 PM >> Subject: [Bioperl-l] Parsing Genbank >> >> >>> Hi all, >>> I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. >>> >>> >>> For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. >>> >>> x $cds->start >>> 1 >>> x $cds->end >>> 64 >>> >>> How can I get the original coordinates? Is there a command for that or will I have to just do the math? >>> >>> Feature or Bug? 
>>> >>> >>> ~~~~~~~~~~~~~~~~~~~~ >>> Brandi Cantarel, PhD >>> Bioinformatics Analyst >>> Institute for Genome Sciences >>> School of Medicine >>> University of Maryland, Baltimore >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Dec 2 15:52:28 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 2 Dec 2009 15:52:28 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> Message-ID: <07332179362A4D53ACAA9A72AD208049@NewLife> Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds as if there is a bug. If you can provide data that can reproduce it, as Chris suggests, we can get onto it. thanks MAJ ----- Original Message ----- From: Brandi Cantarel To: Mark A. Jensen Sent: Wednesday, December 02, 2009 3:38 PM Subject: Re: [Bioperl-l] Parsing Genbank How can I tell what version I am using?When I use the command from the website: perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' I get 1.006, but the bioperl lib was updated in July, so probably 1.6.0 version since that was the last stable release?. Brandi On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote: with fake seq data and that header, I don't get a problem: DB<2> x $cds->location 0 Bio::Location::Simple=HASH(0x37b1df4) '_end' => 974 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'subjpool12_contig3' '_start' => 911 '_strand' => '-1' Are you using the latest BioPerl (1.6.1 or the trunk) ? 
MAJ ----- Original Message ----- From: "Brandi Cantarel" Cc: Sent: Wednesday, December 02, 2009 2:29 PM Subject: Re: [Bioperl-l] Parsing Genbank Here is some of my code, the real code actually enters the data into a database. $in = Bio::SeqIO->new(-file => $gbkfile, '-format' => 'genbank'); W1:while (my $seq = $in->next_seq()) { my @feats = $seq->get_all_SeqFeatures(); my $j = 0; F1:foreach $cds (@feats) { next F1 unless ($cds->primary_tag() eq 'CDS'); ###>> debugger stops here for above output #do something with the cds start and cds end } } LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 ACCESSION subjpool12_contig3 KEYWORDS . SOURCE human metagenome ORGANISM human metagenome unclassified sequences; organismal metagenomes,metagenomes. FEATURES Location/Qualifiers source 1..974 /mol_type="genomic DNA" /isolation_source="Homo sapiens" /organism="human metagenome" /collection_date="19-Nov-2009" CDS complement(911..974) /locus_tag="subjpool12_contig3|metagene|gene_2" /translation="IRIMTVELINPYIRHVEHST" /score="2.52804" /product="hypothetical protein" /note="score=2.52804" /note="score=2.52804" /note="frame=1" ORIGIN #some sequence?. From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: Hi Brandi- If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. Can you elaborate by posting your code? cheers, MAJ ----- Original Message ----- From: "Brandi Cantarel" To: Sent: Wednesday, December 02, 2009 1:36 PM Subject: [Bioperl-l] Parsing Genbank Hi all, I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. 
The sequence is 974 nt. x $cds->start 1 x $cds->end 64 How can I get the original coordinates? Is there a command for that or will I have to just do the math? Feature or Bug? ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Dec 2 16:07:58 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 2 Dec 2009 15:07:58 -0600 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <07332179362A4D53ACAA9A72AD208049@NewLife> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> <07332179362A4D53ACAA9A72AD208049@NewLife> Message-ID: <23AE9399-B370-4DB3-94AA-AC8021AF321E@illinois.edu> One never knows, but I would be very surprised if this somehow snuck by the test suite we have, particularly since Gbrowse extensively uses SeqFeatures (any changes should have popped out along the way). Not much we can do unless we have something to help confirm the problem. Also might help to know the source of the genbank file itself. chris On Dec 2, 2009, at 2:52 PM, Mark A. Jensen wrote: > Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds > as if there is a bug. If you can provide data that can reproduce > it, as Chris suggests, we can get onto it. > thanks MAJ > ----- Original Message ----- > From: Brandi Cantarel > To: Mark A. 
Jensen > Sent: Wednesday, December 02, 2009 3:38 PM > Subject: Re: [Bioperl-l] Parsing Genbank > > > How can I tell what version I am using?When I use the command from the website: > > > perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' > > > I get 1.006, but the bioperl lib was updated in July, so probably 1.6.0 version since that was the last stable release?. > > > Brandi > > > > > On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote: > > > with fake seq data and that header, I don't get a problem: > > DB<2> x $cds->location > 0 Bio::Location::Simple=HASH(0x37b1df4) > '_end' => 974 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'subjpool12_contig3' > '_start' => 911 > '_strand' => '-1' > > Are you using the latest BioPerl (1.6.1 or the trunk) ? > MAJ > ----- Original Message ----- From: "Brandi Cantarel" > Cc: > Sent: Wednesday, December 02, 2009 2:29 PM > Subject: Re: [Bioperl-l] Parsing Genbank > > > Here is some of my code, the real code actually enters the data into a database. > > > $in = Bio::SeqIO->new(-file => $gbkfile, > '-format' => 'genbank'); > > W1:while (my $seq = $in->next_seq()) { > my @feats = $seq->get_all_SeqFeatures(); > my $j = 0; > F1:foreach $cds (@feats) { > next F1 unless ($cds->primary_tag() eq 'CDS'); > ###>> debugger stops here for above output > > #do something with the cds start and cds end > } > } > > > LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 > ACCESSION subjpool12_contig3 > KEYWORDS . > SOURCE human metagenome > ORGANISM human metagenome > unclassified sequences; organismal metagenomes,metagenomes. 
> FEATURES Location/Qualifiers > source 1..974 > /mol_type="genomic DNA" > /isolation_source="Homo sapiens" > /organism="human metagenome" > /collection_date="19-Nov-2009" > CDS complement(911..974) > /locus_tag="subjpool12_contig3|metagene|gene_2" > /translation="IRIMTVELINPYIRHVEHST" > /score="2.52804" > /product="hypothetical protein" > /note="score=2.52804" > /note="score=2.52804" > /note="frame=1" > ORIGIN > #some sequence?. > > > > > > From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. > > > > > ~~~~~~~~~~~~~~~~~~~~ > Brandi Cantarel, PhD > Bioinformatics Analyst > Institute for Genome Sciences > School of Medicine > University of Maryland, Baltimore > > On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > > > Hi Brandi- > > If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. > > Can you elaborate by posting your code? > > cheers, > > MAJ > > ----- Original Message ----- From: "Brandi Cantarel" > > To: > > Sent: Wednesday, December 02, 2009 1:36 PM > > Subject: [Bioperl-l] Parsing Genbank > > > > > > Hi all, > > I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. > > > > > > For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. > > > > x $cds->start > > 1 > > x $cds->end > > 64 > > > > How can I get the original coordinates? Is there a command for that or will I have to just do the math? > > > > Feature or Bug? 
> > > > > > ~~~~~~~~~~~~~~~~~~~~ > > Brandi Cantarel, PhD > > Bioinformatics Analyst > > Institute for Genome Sciences > > School of Medicine > > University of Maryland, Baltimore > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Thu Dec 3 05:31:31 2009 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 3 Dec 2009 05:31:31 -0500 Subject: [Bioperl-l] modENCODE seeking data managers Message-ID: <6dce9a0b0912030231p740d0ecbj4a7e79a6ab71801d@mail.gmail.com> Hi All, My apologies for spamming the list, but this announcement may be of interest: The modENCODE Data Coordinating Center (Model Organism Encylopedia of DNA Elements; www.modencode.org) is seeking data managers to gather and curate large scale functional genomics data sets in fly and worm. For details, see http://blog.modencode.org/?p=350. Lincoln -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From dan.bolser at gmail.com Thu Dec 3 06:44:40 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 11:44:40 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? Message-ID: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> Hi, can someone test the script here on zero length fasta / qual files? 
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ

It seems the output has an extra newline in the sequence part of the output, which throws off scripts that rely on the 'four lines per record' structure of the fastq (although I'm not sure if it's illegal fastq).

Here is what I see

BEGIN

$ head one.fna
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2

$ head one.qual
>FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2

$ createFastq.plx one.fna one.qual
@FVF7ZWH02PFOVG


+FVF7ZWH02PFOVG

END

Currently I just put in a clause in the script to skip any zero length sequences, but I think the Qual shouldn't output an extra newline like this.

Cheers,
Dan.

--
JHB: Bioinformatics is Biology and Biology is Bioinformatics.

From biopython at maubp.freeserve.co.uk Thu Dec 3 07:12:15 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 12:12:15 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com>
Message-ID: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com>

On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote:
> Hi, can someone test the script here on zero length fasta / qual files?
>
> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ
>
> It seems the output has an extra newline in the sequence part of the
> output (which throws off scripts that rely on the 'four lines per
> record' structure of the fastq (although I'm not sure if it's illegal
> fastq).

Hi Dan,

The OBF consensus was that FASTQ records with a zero length sequence might be useful, and should be output as exactly four lines (one blank sequence line, one blank quality line). However for parsing, any number of blank lines should be OK.
http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html I can confirm the perl script currently outputs a FASTQ file with TWO blank lines for the sequence, giving five lines in total for the zero length record. That does suggest a bug. What version of BioPerl are you running? Peter P.S. The script is throwing away any description after the identifier. From dan.bolser at gmail.com Thu Dec 3 08:07:27 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 13:07:27 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> Message-ID: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> 2009/12/3 Peter : > On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: >> Hi, can someone test the script here on zero length fasta / qual files? >> >> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ >> >> It seems the output has an extra newline in the sequence part of the >> output (which throws off scripts that rely on the 'four lines per >> record' structure of the fastq (although I'm not sure if it's illegal >> fastq). > > Hi Dan, > > The OBF consensus was FASTQ records with a zero length > sequence might be useful, and should be output as exactly > four lines (one blank sequence line, one blank quality line). > However for parsing, any number of blank lines should be OK. > http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html > > I can confirm the perl script currently outputs a FASTQ file > with TWO blank lines for the sequence, giving five lines in > total for the zero length record. That does suggest a bug. > What version of BioPerl are you running? 
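
The four-line convention described in the quoted text above can be sketched as follows. This is only an illustration and not the wiki script: fastq_record is a hypothetical helper, and Sanger-style quality encoding (Phred score plus an ASCII offset of 33) is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: format one FASTQ record as exactly four lines,
# even when the sequence is empty. Qualities are Phred scores encoded
# with an ASCII offset of 33 (Sanger style; an assumption here).
sub fastq_record {
    my ($id, $seq, @quals) = @_;
    die "sequence/quality length mismatch\n"
        unless length($seq) == scalar @quals;
    my $qual = join '', map { chr($_ + 33) } @quals;
    # Line 1: @id, line 2: sequence, line 3: bare '+', line 4: qualities.
    return "\@$id\n$seq\n+\n$qual\n";
}

# Zero-length read: blank sequence line and blank quality line.
print fastq_record('FVF7ZWH02PFOVG', '');

# Ordinary read with four bases and four Phred scores.
print fastq_record('read2', 'ACGT', 40, 40, 30, 2);
```

Each newline is written in exactly one place, so an empty sequence cannot produce a fifth line.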
Hi Peter, Basically, I'm not running the 'latest' version of BP, which is why I asked this question of the list rather than filing a bug report. What version are you running? ;-) Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks for the info). > Peter > > P.S. The script is throwing away any description after the > identifier. That's probably bad. Feel free to edit the script on the wiki. Sadly, MediaWiki's diff features are less than optimal, so developing scripts on the wiki isn't ideal. Anyone know how to plug git-hub into a script apparently hosted on a wiki? Or is git-hub basically designed to be 'wiki for code'? I'm wondering, because with the FlaggedRevs extension you could basically build a whole release in the wiki. Which would be fun if nothing else! -- JHP: Biology is bioinformatics and bioinformatics is biology. From heyne at informatik.uni-freiburg.de Thu Dec 3 08:19:51 2009 From: heyne at informatik.uni-freiburg.de (Steffen Heyne) Date: Thu, 03 Dec 2009 14:19:51 +0100 Subject: [Bioperl-l] problem with alignments and sequence locations In-Reply-To: References: <4AF962AA.7060908@informatik.uni-freiburg.de> Message-ID: <4B17BAF7.2050604@informatik.uni-freiburg.de> Hello, so I tried to fix the problem with the location. Currently it works for me with the following changes: LocatableSeq.pm sub get_nse{ ... my $ret; if ($self->strand() >= 0) { $ret = $id . $v. $char1 . $st . $char2 . $end ; } else { $ret = $id . $v. $char1 . $end . $char2 . $st ; } return $ret; } Then I recognized during the usage of $aln->remove_seq() that it cannot remove a seq as it uses a wrong NSE to lookup sequences. I changed the following: SimpleAlign.pm sub remove_seq { ... $id = $seq->id(); $start = $seq->start(); $end = $seq->end(); ## changed code: my $v = $seq->version ? '.'.$seq->version : ''; if ($seq->strand >=0){ $name = sprintf("%s%s/%d-%d",$id,$v,$start,$end); } elsif ($seq->strand == -1){ $name = sprintf("%s%s/%d-%d",$id,$v,$end,$start); } ... 
} The above code in LocatableSeq.pm worked when I read an alignment in stockholm format and wrote it out in clustalw format. But if I read an alignment in clustalw and write it out as stockholm (or something else) it didn't work, as the strand is not correctly set in ClustalW::next_aln. It works with the following changes: ClustalW.pm sub next_aln{ ... my ( $sname, $start, $end, $strand ); ## strand added $strand = 0; ## new, standard = 0??? foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) { if ( $name =~ /(\S+):(\d+)-(\d+)/ ) { ( $sname, $start, $end ) = ( $1, $2, $3 ); $strand = 1; ## new if ($start > $end) { ## new ($start, $end, $strand) = ($end, $start, -1); ##new } ## new } else { ( $sname, $start ) = ( $name, 1 ); my $str = $alignments{$name}; $str =~ s/[^A-Za-z]//g; $end = length($str); } my $seq = Bio::LocatableSeq->new( -seq => $alignments{$name}, -id => $sname, -start => $start, -end => $end, -strand=> $strand ## new ); ... } So I don't know if I changed things in the correct places. And I found them only because I used certain functions. I don't know how broad the effect of a changed NSE in LocatableSeq.pm is on other modules and functions. But I'm happy with my changes (so far :-)...). Will you change this to your proposed way in bioperl trunk? Thanks! steffen Chris Fields wrote: > On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: > >> Hi, >> >> I'm using Bioperl for my research and it is very useful! Thank you! >> >> Currently I have a problem with locations tags of sequences. I read in >> seed alignments of Rfam (in stockholm format, but I think it is >> similar to other formats).
>> >> If the location is like: >> >> AB194432.1/908-846 >> >> the start/end values are changed to >> >> $seq->start = 846 >> $seq->end = 908 >> >> and therefore the new location (e.g.$seq->get_nse) is: >> >> AB194432.1/846-908 >> >> The $seq->strand tag is correctly set to -1 in this case, but if the >> alignment is written out again (clustal, stockholm,...) this strand >> info is lost and the sequences have this "wrong" location. But this >> information is important in respect to the sequence accession number. >> >> Is there a way to set the location back to the original one or is this >> behavior desired? Any manually setting with $seq->start($val) failed >> due to automatic checking. >> >> I'm using bioperl 1.6.1 >> >> Thanks! >> >> steffen > > This is a definite bug. We recently discussed amending the NSE format > due to this (the subject came up over the last few months or so); it's > fallen through the cracks. Fortunately it is very easy to fix (the > relevant method is in LocatableSeq). > > Does anyone have a problem with me adding this in? It will change > output for only those instances where the strand is -1, so > > AB194432.1/908-846 > > would be start = 846, end = 908, strand = -1 > > AB194432.1/846-908 > > would be start = 846, end = 908, strand = 1 > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- --- Steffen Heyne, Dipl.-Bioinf. Lehrstuhl für Bioinformatik Institut für Informatik Albert-Ludwigs-Universität Freiburg Georges-Köhler-Allee 106 79110 Freiburg, Germany Tel: (+49) 761 203 7465 Fax: (+49) 761 203 7462 Mail: heyne at informatik.uni-freiburg.de From cjfields at illinois.edu Thu Dec 3 08:47:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 07:47:32 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> Message-ID: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> Dan, On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: > 2009/12/3 Peter : >> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: >>> Hi, can someone test the script here on zero length fasta / qual files? >>> >>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ >>> >>> It seems the output has an extra newline in the sequence part of the >>> output (which throws off scripts that rely on the 'four lines per >>> record' structure of the fastq (although I'm not sure if it's illegal >>> fastq). >> >> Hi Dan, >> >> The OBF consensus was FASTQ records with a zero length >> sequence might be useful, and should be output as exactly >> four lines (one blank sequence line, one blank quality line). >> However for parsing, any number of blank lines should be OK. >> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html >> >> I can confirm the perl script currently outputs a FASTQ file >> with TWO blank lines for the sequence, giving five lines in >> total for the zero length record. That does suggest a bug. >> What version of BioPerl are you running? > > Hi Peter, > > Basically, I'm not running the 'latest' version of BP, which is why I > asked this question of the list rather than filing a bug report. What > version are you running? ;-) > > Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks > for the info). FASTQ parsing had undergone a major revision prior to 1.6.1 (the latest release in CPAN). Basically, it now parses all three FASTQ variants. However, Peter indicates there may still be a problem, and it's likely he's running 1.6.1. Peter can you confirm that? >> Peter >> >> P.S. 
The script is throwing away any description after the >> identifier. > > That's probably bad. Feel free to edit the script on the wiki. Sadly, > MediaWiki's diff features are less than optimal, so developing scripts > on the wiki isn't ideal. Anyone know how to plug git-hub into a script > apparently hosted on a wiki? > > Or is git-hub basically designed to be 'wiki for code'? It's more an integrated solution for hosting code via git, with a wiki, bug queue, etc. Think SourceForge, but a lot nicer and with no ads ;> BitBucket/Hg is another (very nice) solution along the same lines, developed in Python (GitHub is Ruby-centric). > I'm wondering, because with the FlaggedRevs extension you could > basically build a whole release in the wiki. Which would be fun if > nothing else! I'm not following you there. Could you elaborate on why that would be beneficial? I could see ( chris From biopython at maubp.freeserve.co.uk Thu Dec 3 09:20:32 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 14:20:32 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> Message-ID: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields wrote: > > FASTQ parsing had undergone a major revision prior to > 1.6.1 (the latest release in CPAN). Basically, it now parses > all three FASTQ variants. However, Peter indicates there > may still be a problem, and it's likely he's running 1.6.1. > Peter can you confirm that?
I had BioPerl from SVN circa 1.6.1 (not sure if this was before or after the release of 1.6.1 now): $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' 1.0069 $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"' 1.0069 If the tuples mean anything to you: $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' 49.46.48.48.54.57 $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION' 49.46.48.48.54.57 I just updated to revision 16435, and retested. I get the same BioPerl version numbers, and the same extra blank line in the sequence FASTQ output as Dan reported. Peter From cjfields at illinois.edu Thu Dec 3 09:39:35 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 08:39:35 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> Message-ID: On Dec 3, 2009, at 8:20 AM, Peter wrote: > On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields wrote: >> >> FASTQ parsing had undergone a major revision prior to >> 1.6.1 (the latest release in CPAN). Basically, it now parses >> all three FASTQ variants. However, Peter indicates there >> may still be a problem, and it's likely he's running 1.6.1. >> Peter can you confirm that? 
> > I had BioPerl from SVN circa 1.6.1 (not sure if this was before > or after the release of 1.6.1 now): > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"' > 1.0069 > > If the tuples mean anything to you: > > $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' > 49.46.48.48.54.57 > $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION' > 49.46.48.48.54.57 > > I just updated to revision 16435, and retested. I get the same > BioPerl version numbers, and the same extra blank line in the > sequence FASTQ output as Dan reported. > > Peter Okay I will try to look into it today (it should be an easy fix). There are two issues, correct? 1) extra blank line. 2) missing description Dan, could you go ahead and submit this as a bug, just in case (so we don't lose track)? Otherwise it might get lost on the mail list or wiki. chris From biopython at maubp.freeserve.co.uk Thu Dec 3 09:56:39 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 14:56:39 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> Message-ID: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com> On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields wrote: > Okay I will try to look into it today (it should be an easy fix). ?There are two issues, correct? > > 1) extra blank line. Which seems to be a bug in BioPerl SeqIO itself. 
> 2) missing description This is just a trivial bug/omission in the wiki example, http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ You just need to replace this: my $bsq_obj = Bio::Seq::Quality-> new( -id => $seq_obj->id, -seq => $seq_obj->seq, -qual => $qual_obj->qual, ); With: my $bsq_obj = Bio::Seq::Quality-> new( -id => $seq_obj->id, -description => $seq_obj->description, -seq => $seq_obj->seq, -qual => $qual_obj->qual, ); Look - I seem to be learning Perl by osmosis ;) Peter From dan.bolser at gmail.com Thu Dec 3 11:29:11 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 16:29:11 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com> Message-ID: <2c8757af0912030829t54e87a4bmf166370ca10e966a@mail.gmail.com> 2009/12/3 Peter : > On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields wrote: >> Okay I will try to look into it today (it should be an easy fix). ?There are two issues, correct? ... >> 2) missing description > > This is just a trivial bug/omission in the wiki example, ... > Look - I seem to be learning Perl by osmosis ;) Yay! From dan.bolser at gmail.com Thu Dec 3 11:30:44 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 16:30:44 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? 
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> Message-ID: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> 2009/12/3 Chris Fields : > Dan, > > On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: ... >> I'm wondering, because with the FlaggedRevs extension you could >> basically build a whole release in the wiki. Which would be fun if >> nothing else! > > I'm not following you there. Could you elaborate on why that would be beneficial? I could see ( I never said it would be beneficial, only that it would be fun. http://www.mediawiki.org/wiki/Flaggedrevs From florent.angly at gmail.com Thu Dec 3 13:26:57 2009 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 03 Dec 2009 10:26:57 -0800 Subject: [Bioperl-l] problem with alignments and sequence locations In-Reply-To: <4B17BAF7.2050604@informatik.uni-freiburg.de> References: <4AF962AA.7060908@informatik.uni-freiburg.de> <4B17BAF7.2050604@informatik.uni-freiburg.de> Message-ID: <4B1802F1.1040304@gmail.com> Hi all, Like Steffen, I've had a few burning questions too regarding LocatableSeq lately. I've had an occasional issue with LocatableSeq. Most assembly-related modules use LocatableSeq objects. They specify the sequence start but not the sequence end. This works in most cases, but I've recently encountered very occasional error messages related to not having explicitly set the end of the sequence. I've been unable to put together a small test case to reproduce the bug easily. My question is: if the start of the sequence is set, is it mandatory to set the end of the sequence? If so, then maybe the documentation needs to be explicit about it and maybe there needs to be a check that enforces that the end is set.
In fact, it seems like if I provide a sequence and its start position, the LocatableSeq code should be able to automatically calculate its end, no? Florent Steffen Heyne wrote: > Hello, > > so I tried to fix the problem with the location. Currently it works for > me with the following changes: > > LocatableSeq.pm > > sub get_nse{ > > ... > > my $ret; > if ($self->strand() >= 0) { > $ret = $id . $v. $char1 . $st . $char2 . $end ; > } else { > $ret = $id . $v. $char1 . $end . $char2 . $st ; > } > return $ret; > } > > Then I recognized during the usage of $aln->remove_seq() that it cannot > remove a seq as it uses a wrong NSE to lookup sequences. I changed the > following: > > SimpleAlign.pm > > sub remove_seq { > > ... > $id = $seq->id(); > $start = $seq->start(); > $end = $seq->end(); > > ## changed code: > > my $v = $seq->version ? '.'.$seq->version : ''; > if ($seq->strand >=0){ > $name = sprintf("%s%s/%d-%d",$id,$v,$start,$end); > } elsif ($seq->strand == -1){ > $name = sprintf("%s%s/%d-%d",$id,$v,$end,$start); > } > ... > > } > > The above code in LocatableSeq.pm worked in the case if I read an > alignment in stockholm format and write it out in clustalw format. But > if I read an alignment in clustalw and write it out as stockholm (or > something else) it didn't worked, as the strand is not correctly set in > ClustalW::next_aln. It works with the following changes: > > ClustalW.pm > > sub next_aln{ > > ... > > my ( $sname, $start, $end, $strand ); ## strand added > $strand = 0; ## new, standard = 0??? 
> foreach my $name ( sort { $order{$a} <=> $order{$b} } keys > %alignments ) { > if ( $name =~ /(\S+):(\d+)-(\d+)/ ) { > ( $sname, $start, $end ) = ( $1, $2, $3 ); > $strand = 1; ## new > if ($start > $end) { ## new > ($start, $end, $strand) = ($end, $start, -1); ##new > } ## new > > } > else { > ( $sname, $start ) = ( $name, 1 ); > my $str = $alignments{$name}; > $str =~ s/[^A-Za-z]//g; > $end = length($str); > } > > my $seq = Bio::LocatableSeq->new( > -seq => $alignments{$name}, > -id => $sname, > -start => $start, > -end => $end, > -strand=> $strand ## new > ); > > ... > > } > > So I don't know if I changed things at their correct position. And I > found them only because I used certain functions. I dont know how broad > the effect of a changed NSE in LocatableSeq.pm is to other Modules and > functions. But I'm happy with my changes (so far :-)...). > > Do you will change this to your proposed way in bioperl trunk? > > Thanks! > > steffen > > > Chris Fields schrieb: > >> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: >> >> >>> Hi, >>> >>> I'm using Bioperl for my research and it is very useful! Thank you! >>> >>> Currently I have a problem with locations tags of sequences. I read in >>> seed alignments of Rfam (in stockholm format, but I think it is >>> similar to other formats). >>> >>> If the location is like: >>> >>> AB194432.1/908-846 >>> >>> the start/end values are changed to >>> >>> $seq->start = 846 >>> $seq->end = 908 >>> >>> and therefore the new location (e.g.$seq->get_nse) is: >>> >>> AB194432.1/846-908 >>> >>> The $seq->strand tag is correctly set to -1 in this case, but if the >>> alignment is written out again (clustal, stockholm,...) this strand >>> info is lost and the sequences have this "wrong" location. But this >>> information is important in respect to the sequence accession number. >>> >>> Is there a way to set the location back to the original one or is this >>> behavior desired? 
Any manually setting with $seq->start($val) failed >>> due to automatic checking. >>> >>> I'm using bioperl 1.6.1 >>> >>> Thanks! >>> >>> steffen >>> >> This is a definite bug. We recently discussed amending the NSE format >> due to this (the subject came up over the last few months or so); it's >> fallen through the cracks. Fortunaely it is very easy to fix (the >> relevant method is in LocatableSeq). >> >> Does anyone have a problem with me adding this in? It will change >> output for only those instances where the strand is -1, so >> >> AB194432.1/908-846 >> >> would be start = 846, end = 908, strand = -1 >> >> AB194432.1/846-908 >> >> would be start = 846, end = 908, strand = 1 >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > From cjfields at illinois.edu Thu Dec 3 23:16:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 22:16:48 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> Message-ID: <37058F8C-419E-4E88-AC4F-543FF9B563E1@illinois.edu> On Dec 3, 2009, at 10:30 AM, Dan Bolser wrote: > 2009/12/3 Chris Fields : >> Dan, >> >> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: > > ... > >>> I'm wondering, because with the FlaggedRevs extension you could >>> basically build a whole release in the wiki. Which would be fun if >>> nothing else! >> >> I'm not following you there. Could you elaborate on why that would be beneficial? 
I could see ( > > I never said it would be beneficial, only that it would be fun. > > http://www.mediawiki.org/wiki/Flaggedrevs Ah, okay, that makes some sense. Just to stay on subject, committed a fix (r16439) to bioperl-live that addresses the additional newline issue. chris From rtbio.2009 at gmail.com Fri Dec 4 08:57:21 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 4 Dec 2009 14:57:21 +0100 Subject: [Bioperl-l] Regarding Organism based search in Remote blast Message-ID: Hello all, I am working on Remote blast.Here,I am trying to get 2 parameters into the remote blast code.They are 1.The input sequence that has to be sent to blast 2.Organism (The organism which has to be searched for ex:-Trypanasoma brucei etc.,) When I tried to take the organism parameter as an input from the user,through a web page,the Remote blast was not giving any results i.e., it says that there are no alignments found. But,when I hard coded the organism in the code,it gives me the results i.e., 3hits. I could not understand this problem.Could any body please help me in this regard? 
My code is sub blastcode { $input1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $input1; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$debugfile); print OUTFILE @params; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , '-Organism' => $organism ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. my $r = $factory->submit_blast($input); # my $r = $factory->submit_blast('amino.fa'); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE $result->next_hit(); # close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); while ( my $hit = $result->next_hit ) { next unless ( $v > 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string push(@seqs,$dna); } } } } } #open(OUTFILE,'>',$debugfile); #print OUTFILE $seqs[0]; #close(OUTFILE); return(@seqs); } Regards, Roopa. From cjfields at illinois.edu Fri Dec 4 09:59:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 4 Dec 2009 08:59:17 -0600 Subject: [Bioperl-l] Regarding Organism based search in Remote blast In-Reply-To: References: Message-ID: <77EDAB6B-68B5-460C-AD9F-EB45B9C3AFF7@illinois.edu> Roopa, At one point a couple of parameters differed between NCBI's web interface and our RemoteBlast-based BLAST interface to URLAPI (this should be indicated in your BLAST reports). See here: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14155 Also, are the returned hits specific for the genome? 
You should double-check; in some cases you have to set both HEADER and RETRIEVALHEADER to get the expected results (not sure why): http://article.gmane.org/gmane.comp.lang.perl.bio.general/18737/match=remoteblast+ncbi chris On Dec 4, 2009, at 7:57 AM, Roopa Raghuveer wrote: > Hello all, > > I am working on Remote blast.Here,I am trying to get 2 parameters into the > remote blast code.They are > > 1.The input sequence that has to be sent to blast > > 2.Organism (The organism which has to be searched for ex:-Trypanasoma brucei > etc.,) > > When I tried to take the organism parameter as an input from the > user,through a web page,the Remote blast was not giving any results i.e., it > says that there are no alignments found. > > But,when I hard coded the organism in the code,it gives me the results i.e., > 3hits. > > I could not understand this problem.Could any body please help me in this > regard? > > My code is > > sub blastcode > { > > $input1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $input1; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$debugfile); > print OUTFILE @params; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]'; > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , > '-Organism' => $organism ); > > while (my $input = $str->next_seq()) > > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence
one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > my $r = $factory->submit_blast($input); > > # my $r = $factory->submit_blast('amino.fa'); > > print STDERR "waiting...." if($v>0); > > while ( my @rids = $factory->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE $result->next_hit(); > # close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > } > > Regards, > Roopa. 
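One detail in the quoted code is worth a look alongside the HEADER/RETRIEVALHEADER point: the ENTREZ_QUERY value is written in single quotes, and Perl does not interpolate variables inside single quotes. The sketch below uses only core Perl and a made-up organism value to show the difference. Whether this accounts for the whole "no alignments found" result is an assumption, not something the thread confirms.

```perl
use strict;
use warnings;

# Stand-in for the value taken from the web form (hypothetical input).
my $organism = 'Trypanosoma brucei';

# Single quotes do not interpolate: this is the literal text
# $organism[ORGN], whatever $organism holds.
my $literal = '$organism[ORGN]';

# Concatenation sidesteps interpolation altogether. A double-quoted
# "$organism[ORGN]" would not help, because Perl would read it as an
# element of an array named @organism.
my $query = $organism . '[ORGN]';

print "single-quoted: $literal\n";
print "concatenated:  $query\n";
```

Printing the string that is about to be assigned to ENTREZ_QUERY, as the debug files in the quoted code already do for $organism, would show which of the two forms is actually being sent.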
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From robert.bradbury at gmail.com Fri Dec 4 13:27:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 4 Dec 2009 13:27:38 -0500
Subject: [Bioperl-l] Gene critical region analysis -- visual display
Message-ID:

Background: I have been involved in aging research off and on for ~16 years. My initial focus was on the eventual decline of the "program" (because DNA has no ECC and only limited redundancy); my initial work (in the early 1990s) therefore focused on DNA repair genes (of which there are about 150 in the human genome) [1,2]. Most recently I have focused on the DNA double strand break repair processes (NHEJ) as a fundamental cause of aging, because they may fundamentally corrupt the genomes of individual cells. (And as most programmers would agree -- break the code and you break the program.) Michael Lieber at UCLA has estimated that by the time a human is ~70, on the order of several hundred genes in one's cells have been corrupted (which may have an indeterminate effect on the cells' functioning).

Problem: Just looking at the GenBank output for the human Artemis (DCLRE1C) gene, there are on the order of 18 SNPs and 8 possible phosphorylation sites (not to mention other potential modification sites) -- this combined with the fact that Methionine and Tryptophan and, to a lesser extent, Cysteine are more susceptible to single base mutations (due to the alteration of the codon->amino acid coding even involving single base mutations/repairs). There are various programs to analyze such proteins for the critical sites -- SIFT and the various programs pointed to by their sites.

Now it seems to me that one could attack this problem by integrating SNPs, mutations, etc. at the critical sites (where "critical" may or may not be at normal SNPs -- which presumably are primarily at non-critical sites -- and those proteins where, if you change the coding sequence to non-synonymous amino acids, you potentially break the protein (the real interpretation of which will not be understood until population studies are done)).

So, in the process of looking at the DCLRE1C protein I asked myself, "Why is there not a BioPerl function which simply enables a visual interpretation of the critical sites of the protein?" I.e. some color-coded representation of the protein (which presumably has some augmented functionality to determine things like probability or statistical information). I.e. hand the function a .fasta file and it will give you a visual (colored) analysis of the critical nature of specific a.a. -- i.e. something which could be used by genomic or SNP analysis (such as I presume is being done by 23andme -- as well as other organizations) to begin to separate out the variations in the human genome (e.g. SNPs) from the mutations which may affect individuals.

I have the C programming and, to a lesser extent, Perl experience to contribute to this -- I lack the BioPerl wisdom to make it generally available. If anyone has some suggestions as to what functions/modules might be of use (in providing a "single-look" view of gene a.a. whose mutations may be more or less detrimental) I would appreciate hearing from them.

Robert Bradbury

1. "DNA Repair and Mutagenesis", E.C. Friedberg et al, 2nd Ed., ASM Press (2006)
2. "Aging of the Genome", J. Vijg, Oxford University Press (2007)

From maj at fortinbras.us Sun Dec 6 17:54:00 2009
From: maj at fortinbras.us (Mark A.
Jensen) Date: Sun, 6 Dec 2009 17:54:00 -0500 Subject: [Bioperl-l] bioperl-mode new feature: base class browsing Message-ID: <59494F4102D84535B3A5D05B595ACBF7@NewLife> Hi All, You can now browse pod of the base/parent classes of bioperl modules with one keystroke using the latest update of bioperl-mode. See http://bioperl.org/wiki/Emacs_bioperl-mode Press "B" or "P" while in pod view to get a completion list of the parent classes for the module whose pod you're viewing. cheers, MAJ From mmokrejs at ribosome.natur.cuni.cz Mon Dec 7 15:33:48 2009 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Mon, 07 Dec 2009 21:33:48 +0100 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: <4B1D66AC.4080804@ribosome.natur.cuni.cz> Hi, I just stumbled across this older posting ... maybe you want to exploit SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has remote API available. Martin Robert Bradbury wrote: > I would like to know whether or not anyone has attempted to create a > "generalized" reciprocal blast component for BioPerl? > > One sees papers all the time where they discuss running reciprocal blasts to > compare a new species to an old "standard" species or a set of species or > running an all-to-all set of comparisons to match up all of the "known" > proteins from species and determine which are outliers (and therefore > "novel"). There are also accumulating merged sets in NCBI HomoloGene (which > seems to be a some strict subset (perhaps a dozen) "well sequenced" genomes) > and Ensembl (which seems to be working with a much larger set of 40-50 > genomes some of which may be somewhat incomplete and are certainly poorly > "explored". > > I have, I believe, seen code "fragments" from various authors, perhaps some > on the BioPerl list, which perform some major subset of a typical > "reciprocal blast". 
> > Now what I am looking for is a relatively generalizable some-to-some > reciprocal blast utility. I want to be able to specify the genes (or gene > family), e.g. some of the ~150 known DNA repair genes. It would be helpful > to also specify how "tolerant" the blast "true reciprocal" criteria are. > There are some genes where there is a very strict 1-to-1 relationship across > many genomes. But for genes which involve relatively standard domains, e.g. > "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for > example its more like 5-to-5 and it would be really nice to be able to > specify the strictness or quality level [1] for "matching" genes (and even > which genes are to be excluded because they are known to be false > homologues). > > Then to top this off I want to be able to combine known public e.g. > (HomoloGene / Uniigene / Ensembl) databases with perhaps local private > databases or database subsets (e.g. emerging or specialized genomes). > > The goal here of course to determine the precise phylogenetic relationships > between all of the DNA repair genes and how there may be gain / loss / > evolution of function that can be related to species characteristics (size, > longevity, etc.). > > Is there a generalized reciprocal blast component in BioPerl? Or is it a > "build-it-yourself" situation (that I have to believe has been built > probably a few dozen times by various researchers / organizations / > companies)? > > Thanks, > Robert Bradbury > > 1. This would be handled in BioPerl with a customizable user function which > could be tailored to handle specific cases -- for example a function which > when handed a set of 100 potential "matches" could go through those 100 > matches, identify common domains, and then "re-rate" matches based on > considerations such as the type and number of common domains, domains being > in the same order, etc. I.e. 
criteria which may be difficult to completely
> generalize across entire genomes but are fairly obvious if you are looking
> at a graphical replication of a gene set in HomoloGene.

From robert.bradbury at gmail.com Mon Dec 7 15:41:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 7 Dec 2009 15:41:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
Message-ID:

This comment could also have a subject line: "Why does Bioperl/get_sequence> fork at all! Why are not all operations sequential? And if this is a "default" mode that I'm unaware of -- How do I ever write a reliable BioPerl script if I have little or no capability of what the program uses when it runs? I may have days so I can bear the burden of relatively slow results (and so can use sequential processing rather than parallel).

I've got a perl script that uses remote blast to blast a sequence against a subset of the NCBI sequences. It "mostly" works, in that it returns a seemingly complete .bls result file but when attempting to look at the sequences (so it can more accurately summarize the information from the results than a standard blast report allows) it terminates prematurely with errors.

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Couldn't fork: Resource temporarily unavailable
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::_open_pipe /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
STACK: Bio::Perl::get_sequence /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
STACK: /home/bradbury/Genomes/bin/RB.pl:155
-----------------------------------------------------------

The precise line (in my code) which appears to be generating the error is:
$seq = get_sequence('GenBank', $accsn);

Now this can be a problem if NCBI/Genbank fails due to load conditions, but this specific failure (which is repeatable) is most likely due to hitting the user process limit: the small blast results work fine, and it's only when the Blast has returned several hundred hits that it runs into this problem.

Now what it sounds like to me is an attempt to do multiple asynchronous NCBI queries (to get a sequence) with complete disregard of the environment (process limits, NCBI limits, etc.). But I do not know enough about how this works to point a finger at some specific function. As a result get_sequence process results are accumulated, summarized, etc. without ever having issued the respective "wait-variant()" calls to collect former children. [This IMO would clearly be a bug.]

It could be adjusted by allowing the BioPerl library to run in 3 modes.
(1) completely synchronous -- if you fork you wait until its done -- and you collect "it" and any fork fails then one either collects the process or switches to the non-conservative mode. Robert From cjfields at illinois.edu Mon Dec 7 16:08:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 7 Dec 2009 15:08:40 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: Message-ID: Robert, If you use the relative components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not. All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline you will need to use those tools directly. See the POD for those specific modules for more information. chris On Dec 7, 2009, at 2:41 PM, Robert Bradbury wrote: > This comment could also have a subject line: "Why does Bioperl/get_sequence> > fork at all! Why are not all operations sequential? And if this is a > "default" mode that I'm unaware of -- How to I ever write a reliable BioPerl > script if I have little or no capability of what the program uses when it > runs? I may have days so I can bear the burden of relatively slow results > (and so can use sequential processing rather than parallel). > > I've got a perl script that uses remote blast to blast a sequence against a > subset of the NCBI sequences. It "mostly" works, in that it returns a > seemingly complete .bls result file but when attempting to look at the > sequences (so it can more accurately summarize the information from the > results than a standard blast report allows) it terminates prematurely with > errors. 
> (1) completely synchronous -- if you fork you wait until its done -- and > you collect "it" and any fork fails then one either collects the process or > switches to the non-conservative mode. > > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Dec 7 16:24:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 7 Dec 2009 13:24:54 -0800 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: Message-ID: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Robert - You seem to be mixing the blast remote and the sequence query retrieval problems. These messages are related to the remote retrieval of sequences. It is hard to tell from your message specifically which modules you are using or how you are querying NCBI - there are several ways to do this either with the NCBI tools or the Bio::DB::GenBank. If you are using Bio::DB::Query::GenBank that allows for async access and has built in controls to adhere to the wait variant that NCBI requests but I don't think Bio::DB::GenBank get_Seq_by_acc method does any sort of thing (at least when it was originally written). I always advocate if you want highly available and reliable access to sequences you should download the nr or whichever DB and use the local indexing tools for the retrieval. Once you start doing hundreds of queries I don't see any good reason to be doing the query against NCBI directly given unreliabilities of the web and services. Local databases are faster and more reliable for most people so I urge you take advantage of the tools which provide local database access with the same APIs. I would like to comment that the tone of your posts to the list are not particularly helpful. I wonder if you are actually asking for help or just interested in complaining about when things don't work as you expect? 
This is a collaborative and volunteer-only project, with the principles of working together to make useful toolkit. We encourage you to build programs and applications from this base that suit your needs, but not all things will be directly implemented in the toolkit if they aren't generic enough (at least that is my feeling, the other Core devs help with these decisions). If there is a useful, generic, and reusable part we would like that to be part of the API. Otherwise we suggest the new application that fits a developer's vision. We encourage you to write (and publish) that application separately, but certainly encourage bug (and fixes) submissions and also code contributions for new features where they can be seen as generally useful. -jason On Dec 7, 2009, at 12:41 PM, Robert Bradbury wrote: > This comment could also have a subject line: "Why does Bioperl/ > get_sequence> > fork at all! Why are not all operations sequential? And if this is a > "default" mode that I'm unaware of -- How to I ever write a reliable > BioPerl > script if I have little or no capability of what the program uses > when it > runs? I may have days so I can bear the burden of relatively slow > results > (and so can use sequential processing rather than parallel). > > I've got a perl script that uses remote blast to blast a sequence > against a > subset of the NCBI sequences. It "mostly" works, in that it returns a > seemingly complete .bls result file but when attempting to look at the > sequences (so it can more accurately summarize the information from > the > results than a standard blast report allows) it terminates > prematurely with > errors. 
>
> (1) completely synchronous -- if you fork you wait until its done --
> and
> you collect "it" and any fork fails then one either collects the
> process or
> switches to the non-conservative mode.
>
> Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From Jonas_Schaer at gmx.de Tue Dec 8 10:21:58 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Tue, 8 Dec 2009 16:21:58 +0100
Subject: [Bioperl-l] fasta format
Message-ID: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>

Hi there,
I have a little question concerning BioPerl. I have BioPerl-1.6.1.tar.gz installed and I use the fasta.pm module to read in some FASTA files. At first it worked fine, but now I have some FASTA files in a slightly different format (not all lines have the same length!).

------------- EXCEPTION -------------
MSG: Each line of the fasta entry must be the same length except the last.
Line above #49 '
..' is 28 != 101 chars.
STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
STACK main::readfasta blast_eval.pm:174
STACK toplevel blast_eval.pm:83
-------------------------------------

indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/DB/Fasta.pm line 1054.

Is there any way to use these FASTA files with different line lengths with this fasta.pm module, or will I have to change the format of my FASTA files (big databases...)?

Thanks in advance for any help!

Regards, Jonas

From awitney at sgul.ac.uk Tue Dec 8 12:01:58 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 8 Dec 2009 17:01:58 +0000
Subject: [Bioperl-l] package to associate genes with branches on trees?
Message-ID:

Hi,

I have been generating some trees with Phylip (pars) and then processing them with Bioperl. These trees are generated by comparing multiple strains of a bacterial organism by presence/absence (0/1) calls for each gene.

I was wondering if there was any package in Bioperl to try to determine if any specific genes were associated with specific branches of the trees? Or if anyone knew of another tool that can do this?

thanks for any help

adam

From jason at bioperl.org Tue Dec 8 12:44:43 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 8 Dec 2009 09:44:43 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID:

You can run sreformat (HMMER) or the bp_sreformat.pl script in scripts/utilities (which is also installed when you install the Bioperl scripts):

$ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
# rename it back
$ mv yournewfile.fa yourfile.fa

or

$ sreformat fasta yourfile.fa > yournewfile.fa
$ mv yournewfile.fa yourfile.fa

-jason

On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:

> Hi there,
> I have a little question concerning bioperl. I have
> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read
> in some fasta files. first it worked fine, but now i have some
> fastafiles in slightly different format (not all lines have the same
> length!).
>
> ------------- EXCEPTION -------------
> MSG: Each line of the fasta entry must be the same length except the
> last.
> Line above #49 '
> ..' is 28 != 101 chars.
> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ > Fasta.pm:771 > STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681 > STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491 > STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513 > STACK main::readfasta blast_eval.pm:174 > STACK toplevel blast_eval.pm:83 > ------------------------------------- > > indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ > site/lib/Bio/ > DB/Fasta.pm line 1054. > > > Is there any way to use these fasta files with diffrent length of > lines with this fasta.pm module or will i have to change the format > of my fasta-files(big databases...) ? > > Thanks in advance for any help! > > Regards, Jonas > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Tue Dec 8 23:30:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 8 Dec 2009 22:30:26 -0600 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl Meeting at the GMOD Conference Message-ID: <1BC089CD-75C3-437E-86A5-22220D724DF6@illinois.edu> All, For those interested, we will be holding a general BioPerl meeting, tentatively scheduled for January 13, 2010, just prior to the GMOD Community Meeting from Jan 14-15 in San Diego. This will be just following the Plant and Animal Genome (PAG) conference Jan 9-13. The exact day and time is somewhat flexible depending on attendees' schedules. For those interested, sign up here: http://www.bioperl.org/wiki/GMOD_2010_Meeting For those interested in attending the GMOD meeting or PAG: http://gmod.org/wiki/January_2010_GMOD_Meeting I can envision the following items popping up: * Refactoring of Alignment and GFF3/FeatureIO * Addressing BioPerl's monolithic nature * Moose and Perl 6 * Documentation Any others? 
chris From akarger at CGR.Harvard.edu Wed Dec 9 10:01:45 2009 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Wed, 9 Dec 2009 10:01:45 -0500 Subject: [Bioperl-l] fasta format In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> Message-ID: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> > Is there any way to use these fasta files with diffrent length of > lines with this fasta.pm module or will i have to change the format > of my fasta-files(big databases...) ? > Jonas, It's not Bioperl, but for a quick fix you can use the Scriptome. Use the change_fasta_to_tab script (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) to change your FASTA into a tab-delimited file. Then use the next tool (change_tab_to_fasta) to change your files back. To use a tool: change the input and output file names on the website, then cut and paste the Perl script from the green box into a CMD window. The script works one sequence at a time, so it doesn't need a lot of memory. (As long as you have enough disk space to store the tab-delimited copy). The recreated FASTAs will be 60 characters per line (although you can hand-edit the line after you paste it to be whatever number of characters you'd like). Let me know if you have a problem. -Amir Karger Life Sciences Research Computing, FAS IT Harvard University From Kevin.M.Brown at asu.edu Wed Dec 9 10:26:22 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 9 Dec 2009 08:26:22 -0700 Subject: [Bioperl-l] fasta format In-Reply-To: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> Message-ID: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> Even easier to accomplish in one step. 
Read in the fasta file and output it right to another fasta file with SeqIO:

use Bio::SeqIO;

# $file holds the path to the original FASTA file
my $in  = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
# next_seq returns one sequence per call and undef at end of input
while (my $seq = $in->next_seq){$out->write_seq($seq);}

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
> Sent: Wednesday, December 09, 2009 8:02 AM
> To: Jonas Schaer; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] fasta format
>
> > Is there any way to use these fasta files with diffrent length of
> > lines with this fasta.pm module or will i have to change the format
> > of my fasta-files(big databases...) ?
> >
>
> Jonas,
>
> It's not Bioperl, but for a quick fix you can use the
> Scriptome. Use the change_fasta_to_tab script
> (http://sysbio.harvard.edu/csb/resources/computational/scripto
> me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
> format__change_fasta_to_tab_) to change your FASTA into a
> tab-delimited file. Then use the next tool
> (change_tab_to_fasta) to change your files back.
>
> To use a tool: change the input and output file names on the
> website, then cut and paste the Perl script from the green
> box into a CMD window. The script works one sequence at a
> time, so it doesn't need a lot of memory. (As long as you
> have enough disk space to store the tab-delimited copy).
>
> The recreated FASTAs will be 60 characters per line (although
> you can hand-edit the line after you paste it to be whatever
> number of characters you'd like).
>
> Let me know if you have a problem.
> > -Amir Karger > Life Sciences Research Computing, FAS IT > Harvard University > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Wed Dec 9 14:44:41 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 10 Dec 2009 08:44:41 +1300 Subject: [Bioperl-l] fasta format In-Reply-To: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> It's even easier as the script is already written for you :-) bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > Sent: Thursday, 10 December 2009 4:26 a.m. > To: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] fasta format > > Even easier to accomplish in one step. 
> > > > -Amir Karger > > Life Sciences Research Computing, FAS IT > > Harvard University > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From maj at fortinbras.us Wed Dec 9 15:18:08 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 9 Dec 2009 15:18:08 -0500 Subject: [Bioperl-l] fasta format In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas><1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv><1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> Message-ID: <5C992E6556584BDFBF39604FDEA8ECE0@NewLife> $ perl -MPerlIO::via::SeqIO -e 'open($f, "<:via(SeqIO)", shift); open($g, ">:via(SeqIO::fasta)", shift); while (<$f>) { print $g $_; }' in.fas out.fas ----- Original Message ----- From: "Smithies, Russell" To: "'Kevin Brown'" ; Sent: Wednesday, December 09, 2009 2:44 PM Subject: Re: [Bioperl-l] fasta format > It's even easier as the script is already written for you :-) > > bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa > > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Kevin Brown >> Sent: Thursday, 10 December 2009 4:26 a.m. >> To: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] fasta format >> >> Even easier to accomplish in one step. 
Read in the fasta file and output >> it right to another fasta file with SeqIO >> >> my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file); >> my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta'); >> while (my $seq = $in->next_seq){$out->write_seq($seq);} >> >> Kevin Brown >> Center for Innovations in Medicine >> Biodesign Institute >> Arizona State University >> >> > -----Original Message----- >> > From: bioperl-l-bounces at lists.open-bio.org >> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger >> > Sent: Wednesday, December 09, 2009 8:02 AM >> > To: Jonas Schaer; bioperl-l at bioperl.org >> > Subject: Re: [Bioperl-l] fasta format >> > >> > > Is there any way to use these fasta files with different length of >> > > lines with this fasta.pm module or will I have to change the format >> > > of my fasta-files(big databases...) ? >> > > >> > >> > Jonas, >> > >> > It's not Bioperl, but for a quick fix you can use the >> > Scriptome. Use the change_fasta_to_tab script >> > (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) >> > to change your FASTA into a >> > tab-delimited file. Then use the next tool >> > (change_tab_to_fasta) to change your files back. >> > >> > To use a tool: change the input and output file names on the >> > website, then cut and paste the Perl script from the green >> > box into a CMD window. The script works one sequence at a >> > time, so it doesn't need a lot of memory. (As long as you >> > have enough disk space to store the tab-delimited copy). >> > >> > The recreated FASTAs will be 60 characters per line (although >> > you can hand-edit the line after you paste it to be whatever >> > number of characters you'd like). >> > >> > Let me know if you have a problem.
>> > >> > -Amir Karger >> > Life Sciences Research Computing, FAS IT >> > Harvard University >> > >> > >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From kellert at ohsu.edu Wed Dec 9 19:36:13 2009 From: kellert at ohsu.edu (Tom Keller) Date: Wed, 9 Dec 2009 16:36:13 -0800 Subject: [Bioperl-l] how to map ensembl id to NCBI gi Message-ID: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> Greetings, Is there a simple way to map a list of ensembl ids to the NCBI gis? 
thanks, Tom Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From cjfields at illinois.edu Wed Dec 9 20:59:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 9 Dec 2009 19:59:37 -0600 Subject: [Bioperl-l] how to map ensembl id to NCBI gi In-Reply-To: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> References: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C@illinois.edu> Tom, Probably best to do this via BioMart: http://www.ensembl.org/biomart/ I would assume you can also do this via the ensembl perl API as well. Also, have a look at the UniProt ID Mapper: http://www.uniprot.org/?tab=mapping chris On Dec 9, 2009, at 6:36 PM, Tom Keller wrote: > Greetings, > Is there a simple way to map a list of ensembl ids to the NCBI gis? > > thanks, > Tom > > Thomas (Tom) Keller > kellert at ohsu.edu > 503.494.2442 > 6339b R Jones Hall (BSc/CROET) > www.ohsu.edu/xd/research/research-cores/dna-analysis/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lovebaby39 at gmail.com Thu Dec 10 09:22:14 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Thu, 10 Dec 2009 22:22:14 +0800 Subject: [Bioperl-l] about bioperl issue Message-ID: <5F281DC3E4514B3AAA8881169B240227@SHAPC> Dear The following is code. 
--------------------------------------------------------------------------------
my @params_rb = ( 'program'  => 'blastn',
                  'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  => "test_query",
                             -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

while (my $result_rb = $blast_report_rb->next_result) {
    while (my $hit_rb = $result_rb->next_hit()) {
        while (my $hsp_rb = $hit_rb->next_hsp()) {
            print $hit_rb->name, "\nevalue = ", $hsp_rb->evalue, "\t score = ", $hsp_rb->score, "\n";
            #print " ",$hit->name,"\n";
        }
    }
}
--------------------------------------------------------------------------------

I know how to get "name", "evalue" and "score", but I don't know how to get the word which is in red color. (or please see attachment.)
------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------

I will appreciate if you could tell me how to do it.
Thank you.

Reginald Hsueh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R20080801-1.seq.txt URL: From SMarkel at accelrys.com Thu Dec 10 09:47:36 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Thu, 10 Dec 2009 06:47:36 -0800 Subject: [Bioperl-l] about bioperl issue In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977067C6E@EXCH1-COLO.accelrys.net> Reginald, I didn't see anything highlighted in red but the three strings in the pairwise alignment display can be obtained from an HSP using $hsp->query_string() $hsp->hit_string() $hsp->homology_string() Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hsueh Sent: Thursday, 10 December 2009 6:22 AM To: bioperl-l at bioperl.org Subject: [Bioperl-l] about bioperl issue Importance: High Dear The following is code. 
--------------------------------------------------------------------------------
my @params_rb = ( 'program'  => 'blastn',
                  'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  => "test_query",
                             -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

while (my $result_rb = $blast_report_rb->next_result) {
    while (my $hit_rb = $result_rb->next_hit()) {
        while (my $hsp_rb = $hit_rb->next_hsp()) {
            print $hit_rb->name, "\nevalue = ", $hsp_rb->evalue, "\t score = ", $hsp_rb->score, "\n";
            #print " ",$hit->name,"\n";
        }
    }
}
--------------------------------------------------------------------------------

I know how to get "name", "evalue" and "score", but I don't know how to get the word which is in red color. (or please see attachment.)
------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------

I will appreciate if you could tell me how to do it.
Thank you.

Reginald Hsueh

From David.Messina at sbc.su.se Thu Dec 10 10:09:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:09:31 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>

Hi Reginald,

None of the words in your email or the attachment are colored red; unfortunately, any kind of formatting tends to get removed from emails sent to mailing lists.

Could you be more specific about what part of the blast report you are not able to get?
You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it. Dave From David.Messina at sbc.su.se Thu Dec 10 10:36:49 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 10 Dec 2009 16:36:49 +0100 Subject: [Bioperl-l] about bioperl issue In-Reply-To: <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> Message-ID: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> Hi Reginald, Please keep all replies on the list so that everyone can follow the thread. In a separate email, Scott gave the answer you were looking for, I think. Namely: $hsp->query_string() OR $hsp->hit_string() Dave On Dec 10, 2009, at 16:31, Hsueh wrote: > Dear Dave Messina > > I need to get the string that is "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga". > > Thank you > > Reginald Hsueh > > ------------------------------------------------------------------------------------------------------------------------------ > Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206 > |||||| |||||||||||||||||| |||| || |||||| |||||||||||| || > Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173 > ------------------------------------------------------------------------------------------------------------------------------ > > > > > -------------------------------------------------- > From: "Dave Messina" > Sent: Thursday, December 10, 2009 11:09 PM > To: "Hsueh" > Cc: > Subject: Re: [Bioperl-l] about bioperl issue > >> Hi Reginald, >> >> None of the words in your email or the attachment are colored red ? unfortunately any kind of formatting tends to get removed from emails send to mailing lists. >> >> Could you be more specific about what part of the blast report you are not able to get? 
You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it. >> >> >> Dave From lovebaby39 at gmail.com Thu Dec 10 10:53:00 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Thu, 10 Dec 2009 23:53:00 +0800 Subject: [Bioperl-l] about bioperl issue In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> Message-ID: Dear Dave Messina Thank you for your replies. Reginald Hsueh -------------------------------------------------- From: "Dave Messina" Sent: Thursday, December 10, 2009 11:36 PM To: "Hsueh" Cc: Subject: Re: [Bioperl-l] about bioperl issue > Hi Reginald, > > Please keep all replies on the list so that everyone can follow the > thread. > > In a separate email, Scott gave the answer you were looking for, I think. > > Namely: > $hsp->query_string() > OR > $hsp->hit_string() > > > > Dave > > > > > On Dec 10, 2009, at 16:31, Hsueh wrote: > >> Dear Dave Messina >> >> I need to get the string that is >> "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga". 
>> >> Thank you >> >> Reginald Hsueh >> >> ------------------------------------------------------------------------------------------------------------------------------ >> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga >> 206 >> |||||| |||||||||||||||||| |||| || |||||| >> |||||||||||| || >> Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga >> 173 >> ------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> -------------------------------------------------- >> From: "Dave Messina" >> Sent: Thursday, December 10, 2009 11:09 PM >> To: "Hsueh" >> Cc: >> Subject: Re: [Bioperl-l] about bioperl issue >> >>> Hi Reginald, >>> >>> None of the words in your email or the attachment are colored red ? >>> unfortunately any kind of formatting tends to get removed from emails >>> send to mailing lists. >>> >>> Could you be more specific about what part of the blast report you are >>> not able to get? You could even just copy and paste that particular bit >>> of the report into your reply if it's not clear what to call it. >>> >>> >>> Dave >>>>Dear >>>> >>>>The following is code. 
>>>>
>>>>
>>>>--------------------------------------------------------------------------------
>>>>
>>>>my @params_rb = ( 'program' => 'blastn',
>>>> 'database' => 'DB\\RB_GUS\\RB_GUS');
>>>>my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);
>>>>
>>>>my $input_rb = Bio::Seq->new(-id =>"test_query",
>>>> -seq => $testline2);
>>>>my $blast_report_rb = $factory_rb->blastall($input_rb);
>>>>
>>>> while (my $result_rb = $blast_report_rb-> next_result ) {
>>>> while (my $hit_rb = $result_rb->next_hit()){
>>>> while (my $hsp_rb = $hit_rb->next_hsp()){
>>>> print $hit_rb->name,"\nevalue = " , $hsp_rb->evalue , "\t score = "
>>>> , $hsp_rb->score , "\n" ;
>>>> #print " ",$hit->name,"\n";
>>>> }
>>>> }
>>>> }
>>>>
>>>>--------------------------------------------------------------------------------
>>>>
>>>>
>>>>I know how to get "name", "evalue" and "score", but I don't know how
>>>>to get the word which is in red color. (or please see attachment.)
>>>>------------------------------------------------------------------------------------------------------------------
>>>>Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga
>>>>206
>>>> |||||| |||||||||||||||||| |||| || ||||||
>>>> |||||||||||| ||
>>>>Sbjct: 114
>>>>ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
>>>>------------------------------------------------------------------------------------------------------------------
>>>>
>>>>I will appreciate if you could tell me how to do it.
>>>>Thank you.
>>>> >>>>Reginald Hsueh From pg4 at sanger.ac.uk Thu Dec 10 15:50:40 2009 From: pg4 at sanger.ac.uk (Pablo Marin-Garcia) Date: Thu, 10 Dec 2009 20:50:40 +0000 (GMT) Subject: [Bioperl-l] how to map ensembl id to NCBI gi In-Reply-To: References: Message-ID: If you are mapping ensembl genes to NCBI genes (via ensemblaPI or biomart) please read this recent thread at ensembl-dev: http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/msg05417.html Seems that the ensembl gene mapping to NCBI is done through translation so the noncoding genes do not have the corresponding NCBI gene mapped. -Pablo > ------------------------------ > > Message: 4 > Date: Wed, 9 Dec 2009 19:59:37 -0600 > From: Chris Fields > Subject: Re: [Bioperl-l] how to map ensembl id to NCBI gi > To: Tom Keller > Cc: BioPerl-List > Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C at illinois.edu> > Content-Type: text/plain; charset=us-ascii > > Tom, > > Probably best to do this via BioMart: > > http://www.ensembl.org/biomart/ > > I would assume you can also do this via the ensembl perl API as well. > > Also, have a look at the UniProt ID Mapper: > > http://www.uniprot.org/?tab=mapping > > chris > > On Dec 9, 2009, at 6:36 PM, Tom Keller wrote: > >> Greetings, >> Is there a simple way to map a list of ensembl ids to the NCBI gis? 
>> >> thanks, >> Tom >> >> Thomas (Tom) Keller >> kellert at ohsu.edu >> 503.494.2442 >> 6339b R Jones Hall (BSc/CROET) >> www.ohsu.edu/xd/research/research-cores/dna-analysis/ >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > ==================================================================== Pablo Marin-Garcia, PhD \\// (Argiope bruennichi \/\/`(||>O:'\/\/ with stabilimentum) //\\ Sanger Institute | PostDoc / Computer Biologist Wellcome Trust Genome Campus | team : 128/108 (Human Genetics) Hinxton, Cambridge CB10 1HH | room : N333 United Kingdom | email: pablo.marin at sanger.ac.uk ==================================================================== -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From umjsm at leeds.ac.uk Fri Dec 11 11:44:42 2009 From: umjsm at leeds.ac.uk (Joan Segura Mora) Date: Fri, 11 Dec 2009 16:44:42 +0000 Subject: [Bioperl-l] extract and write a pdb chain Message-ID: <1260549882.6484.11.camel@limm-pc1254> Hello, I am trying to do a very easy thing but I can't get it to work. I want to write a single chain of a PDB structure to a file. I have tried a lot of things, but what I think should work is the following script: use Bio::Structure::IO; use strict; my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' => 'pdb'); my $struc = $structio->next_structure; my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); for my $chain ($struc->get_chains) { if($chain->id eq "A"){ $new_entry->chain($chain); last; } } my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => 'pdb');# $out->write_structure($new_entry); It doesn't.
I get the following error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: add_chain: first argument needs to be a Model object () STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335 STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391 STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304 STACK: read_pdb.pl:10 ----------------------------------------------------------- As far as I understand the documentation, the chain method of the Bio::Structure::Entry object requires as input an object of type Chain. Any solution will be very welcome. best regards, Joan From wkretzsch at gmail.com Fri Dec 11 14:22:31 2009 From: wkretzsch at gmail.com (Warren W. Kretzschmar) Date: Fri, 11 Dec 2009 14:22:31 -0500 Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms Message-ID: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com> Hi, I'm new to the bioperl community. I've created a perl module that reads in msOUT files generated by Hudson's ms. As far as I understand, there is no SeqIO module to read and output these files? If so, I propose to create a module that does this. Any suggestions? Thanks, Warren Kretzschmar From maj at fortinbras.us Fri Dec 11 14:59:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 11 Dec 2009 14:59:53 -0500 Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT filesgenerated by Hudson's ms In-Reply-To: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com> References: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com> Message-ID: <07382508ED0B41F4B8289813B734239B@NewLife> Hi Warren, I say go for it.
You'll want to have a look at http://bio.perl.org/wiki/Advanced_BioPerl which explains most of our tips and "policies" for prospective code contributors, as well as http://bio.perl.org/wiki/HOWTO:SeqIO which details SeqIO from the user's perspective. Look carefully at some Bio::SeqIO::* modules for implementation details. If you have code to propose, use http://bugzilla.bioperl.org and enter a new enhancement, where you can upload your module for us to review. MAJ ----- Original Message ----- From: "Warren W. Kretzschmar" To: Sent: Friday, December 11, 2009 2:22 PM Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT filesgenerated by Hudson's ms > Hi, > I'm new to the bioperl community. I've created a perl module that > reads in msOUT files generated by Hudson's ms. As far as I > understand, there is no SeqIO module to read and output these files? > If so, I propose to create a module that does this. Any suggestions? > > Thanks, > Warren Kretzschmar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bosborne11 at verizon.net Fri Dec 11 15:37:45 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 11 Dec 2009 15:37:45 -0500 Subject: [Bioperl-l] extract and write a pdb chain In-Reply-To: <1260549882.6484.11.camel@limm-pc1254> References: <1260549882.6484.11.camel@limm-pc1254> Message-ID: Joan, It looks to me like the first argument to the add_chain() method has to be a Model object, the second is the Chain itself. See Structure/Entry.pm, for example. However, if you're seeing some documentation that says something else then tell us where, it needs to be corrected. In Bio::Structure an Entry consists of one or more Models, each of which has one or more Chains. This allows you to build macromolecular complexes (an Entry), which could have more than one defined protein or protein complex (Models). Brian O.
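For anyone following the thread, the Entry, Model, Chain hierarchy described above can be walked roughly as in the sketch below. It is only a minimal illustration, not a fix for the script under discussion: it assumes the 101m.pdb file from the original post is in the working directory, and it uses the get_models, get_chains and get_residues accessors as I understand them from the Bio::Structure::Entry documentation, so please check the exact call signatures against your installed version. It reads chain A and reports on it; it does not try to write a new PDB file.

```perl
use strict;
use warnings;
use Bio::Structure::IO;

# Assumption: 101m.pdb (the file named in the original post) is available locally.
my $structio = Bio::Structure::IO->new(-file => '101m.pdb', -format => 'pdb');
my $entry    = $structio->next_structure;

# An Entry holds Models, and each Model holds Chains, so a chain is always
# reached through its parent model.
for my $model ($entry->get_models($entry)) {
    for my $chain ($entry->get_chains($model)) {
        next unless $chain->id eq 'A';

        # Residues are looked up through the entry, keyed by the chain object.
        my @residues = $entry->get_residues($chain);
        printf "model %s, chain %s: %d residues\n",
            $model->id, $chain->id, scalar @residues;
    }
}
```

The point of the nesting is that the entry keeps track of which model owns which chain, which is consistent with add_chain() asking for a Model object as its first argument.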
On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote: > Hello, > > I am trying to do a very easy think but I don't get it. I want to > write > in a file a chain of a pdb. I have try a lot of thinks but what I > think > that it should work is the next script: > > use Bio::Structure::IO; > use strict; > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' > => > 'pdb'); > my $struc = $structio->next_structure; > > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > > for my $chain ($struc->get_chains) { > if($chain->id eq "A"){ > $new_entry->chain($chain); > last; > } > } > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > 'pdb');# > $out->write_structure($new_entry); > > it doesn't. I get the next error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: add_chain: first argument needs to be a Model object () > > STACK: Error::throw > STACK: > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: > 368 > STACK: > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:335 > STACK: > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:391 > STACK: > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:304 > STACK: read_pdb.pl:10 > ----------------------------------------------------------- > > As far I understand the documentation, the method chain of the object > Bio::Structure::Entry requires an as input an object of type Chain. > > Any solution will be very welcome. 
> > best regards, > Joan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From awitney at sgul.ac.uk Sun Dec 13 16:48:13 2009 From: awitney at sgul.ac.uk (Adam Witney) Date: Sun, 13 Dec 2009 21:48:13 +0000 Subject: [Bioperl-l] combining tree image with heatmap Message-ID: <4B25611D.6050009@sgul.ac.uk> I am trying to draw a tree on the side of a heatmap image, much like you see after clustering data. I was wondering if anyone has managed to do this using bioperl? I can draw the two separately, but can't quite seem to work out how to put the two together and get the nodes to line up with the correct row of clustering data. Is there any particular module to look at? thanks for any help adam From dhwani1030 at gmail.com Sat Dec 12 15:04:01 2009 From: dhwani1030 at gmail.com (dhwani gandhi) Date: Sat, 12 Dec 2009 15:04:01 -0500 Subject: [Bioperl-l] Bioperl code help Message-ID: Hi, I am very new to Bioperl, though I am somewhat familiar with perl. I write my perl programs in Notepad++ and run them in cmd. Now, I want to run Bioperl programs. I just installed bioperl on my computer. And I have a program using bioperl modules in Notepad++. My question is how to run these programs? Can they be run in cmd as well? or do I use ppm? Please help. Thanks, -Dhwani Gandhi. From eric_donaldson at med.unc.edu Sun Dec 13 18:15:24 2009 From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu) Date: Sun, 13 Dec 2009 18:15:24 -0500 Subject: [Bioperl-l] problem with install Message-ID: Hello, Today I downloaded bioperl 1.61 on my new macbook pro using fink. I used the fink install bioperl.pm-588 as I could not get it to install using the perl version 5.10. But now I get an error when trying to run a bioperl script.
Here is the error: Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8. BEGIN failed--compilation aborted at blastparser.pl line 8. I am a novice at unix and bioperl so I do not know how to troubleshoot this, would you please help me? Thank you, Eric Eric F. Donaldson, Ph.D. Research Assistant Professor, Ralph Baric Lab University of North Carolina Department of Epidemiology From jason at bioperl.org Sun Dec 13 20:24:26 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 13 Dec 2009 17:24:26 -0800 Subject: [Bioperl-l] problem with install In-Reply-To: References: Message-ID: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> Hi Eric - Bio::Tools::BPlite is no longer supported in Bioperl - it was deprecated several releases ago. It was replaced with Bio::SearchIO -jason On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote: > Hello, > > Today I downloaded bioperl 1.61 on my new macbook pro using fink. I > used the > > fink install bioperl.pm-588 as I could not get it to install using > the perl version 5.10. > > But now I get an error when trying to run a bioperl script.
> > Here is the error: > > Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/ > perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / > Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ > darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ > Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / > Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at > blastparser.pl line 8. > BEGIN failed--compilation aborted at blastparser.pl line 8. > > > I am a novice at unix and bioperl so I do not know how to > troubleshoot this, would you please hleo me? > > Thank you, > > Eric > > > Eric F. Donaldson, Ph.D. > Research Assistant Professor, Ralph Baric Lab > University of North Carolina > Department of Epidemiology > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Sun Dec 13 23:09:45 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 13 Dec 2009 20:09:45 -0800 Subject: [Bioperl-l] problem with install In-Reply-To: References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> Message-ID: <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> So you installed perl-5.10 or using system perl? I'm confused if you actually installed bioperl.pm or not via fink? It seems like since your @INC or $PERL5LIB points to /sw/lib/perl5 which is one of the dirs it would have installed in, but I don't think you actually installed bioperl. you can try and do: $ locate Bio/SearchIO.pm We'll see if any of the other osx/fink gurus are on the list that can help or you can install it via CPAN I guess. 
-jason On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote: > > I actually tried a different blastparser that uses BIO::SearchIO and > got the same message: > > Can't locate Bio/SearchIO.pm in @INC (@INC contains: /sw/lib/perl5/ > darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / > Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ > darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ > Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / > Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at > blastparser.new.pl line 8. > BEGIN failed--compilation aborted at blastparser.new.pl line 8. > > I suspect there is a path problem, but am not savvy enough to know > how to fix it. I am really just a hacker.... I have several scripts > that I use regularly and that I know how to modify, but am lost when > they don't work... > > Thanks for any help, > > Eric > > ----- Original Message ----- > From: Jason Stajich > Date: Sunday, December 13, 2009 8:24 pm > Subject: Re: [Bioperl-l] problem with install > To: eric_donaldson at med.unc.edu > Cc: bioperl-l at bioperl.org > >> Hi Eric - >> >> Bio::Tools::BPlite is no longer supported in Bioperl - it >> was >> deprecated several releases ago. >> It was replaced with Bio::SearchIO >> >> -jason >> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote: >> >>> Hello, >>> >>> Today I downloaded bioperl 1.61 on my new macbook pro using >> fink. I >>> used the >>> >>> fink install bioperl.pm-588 as I could not get it to instal >> using >>> the perl version 5.10. >>> >>> But now I get an error when trying to run a bioperl script. 
>>> >>> Here is the error: >>> >>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: >> /sw/lib/ >>> perl5/darwin-thread-multi-2level /sw/lib/perl5 >> /sw/lib/perl5/darwin / >>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/5.10.0 >> /Library/Perl/5.10.0/ >>> darwin-thread-multi-2level /Library/Perl/5.10.0 >> /Network/Library/ >>> Perl/5.10.0/darwin-thread-multi-2level >> /Network/Library/Perl/5.10.0 / >>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) >> at >>> blastparser.pl line 8. >>> BEGIN failed--compilation aborted at blastparser.pl line 8. >>> >>> >>> I am a novice at unix and bioperl so I do not know how >> to >>> troubleshoot this, would you please hleo me? >>> >>> Thank you, >>> >>> Eric >>> >>> >>> Eric F. Donaldson, Ph.D. >>> Research Assistant Professor, Ralph Baric Lab >>> University of North Carolina >>> Department of Epidemiology >>> >>> >>> >> < >> eric_donaldson.vcf>_______________________________________________> >> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> > > Eric F. Donaldson, Ph.D. > Research Assistant Professor, Ralph Baric Lab > University of North Carolina > Department of Epidemiology > > > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Mon Dec 14 00:10:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 13 Dec 2009 21:10:54 -0800 Subject: [Bioperl-l] problem with install In-Reply-To: References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> Message-ID: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org> Eric - please CC the bioperl list when responding so others can help - I can't be the only answerer. 
But since your @INC message doesn't include /sw/lib/perl5/5.8.8/ you would need to make sure that is added to your PERL5LIB.
There are some help docs on the perl sites, I expect, on how to get your PATHs in order.

Or you can just install via CPAN, which will put it in the right path - there are docs on the bioperl website about installing via CPAN.

-jason
On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:

> Hi Jason,
>
> The fink package did not have support for perl 5.10, so I attempted
> to install the perl 5.8.6 package.
>
> When I attempted: locate Bio/SearchIO.pm
> I got: -bash: $: command not found
>
> So even though I can find SearchIO.pm in
> /sw/lib/perl5/5.8.8/Bio/SearchIO.pm I cannot access it. Do I need to
> use the older version of perl?
>
> Would it be better to install with CPAN? If so, can you send me to
> a page that has instructions?
>
> Thank you so much!
>
> Eric
>
> ----- Original Message -----
> From: Jason Stajich
> Date: Sunday, December 13, 2009 11:10 pm
> Subject: Re: [Bioperl-l] problem with install
> To: eric_donaldson at med.unc.edu
> Cc: BioPerl List
>
>> So you installed perl-5.10 or using system perl? I'm confused if you
>> actually installed bioperl.pm or not via fink?
>>
>> It seems like since your @INC or $PERL5LIB points to /sw/lib/perl5
>> which is one of the dirs it would have installed in, but I don't
>> think you actually installed bioperl.
>>
>> you can try and do:
>> $ locate Bio/SearchIO.pm
>>
>> We'll see if any of the other osx/fink gurus are on the list that can
>> help or you can install it via CPAN I guess.
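One way to check the point about @INC made above is to ask Perl directly which of its search directories actually hold the module. This is a minimal sketch in plain Perl: the helper name find_module_file is made up for illustration, and /sw/lib/perl5/5.8.8 is simply the directory mentioned in this thread, so it may not exist on other machines.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return every path under the
# given directories where the file for $module exists.
sub find_module_file {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;    # Bio::SearchIO -> ('Bio', 'SearchIO')
    $parts[-1] .= '.pm';                # -> ('Bio', 'SearchIO.pm')
    return grep { -f $_ }
           map  { File::Spec->catfile($_, @parts) } @dirs;
}

# Directories Perl searches by default, plus the fink directory named in
# this thread (an assumption taken from the messages above).
my @candidates = (@INC, '/sw/lib/perl5/5.8.8');

my @hits = find_module_file('Bio::SearchIO', @candidates);
if (@hits) {
    print "Found: $_\n" for @hits;
}
else {
    print "Bio::SearchIO was not found in any of the directories checked\n";
}
```

If the module turns up only in a directory that is not already in @INC, that directory can be added with `use lib '/sw/lib/perl5/5.8.8';` near the top of the script, or by setting the PERL5LIB environment variable before running it.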
> Eric F. Donaldson, Ph.D.
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From awitney at sgul.ac.uk Mon Dec 14 04:36:19 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 14 Dec 2009 09:36:19 +0000
Subject: [Bioperl-l] Bioperl code help
In-Reply-To:
References:
Message-ID: <4B260713.3070402@sgul.ac.uk>

bioperl programs are just perl programs, so you should run them in exactly the same way as your perl programs, from the command line

HTH

adam

On 12/12/2009 20:04, dhwani gandhi wrote:
> Hi,
> I am very new to Bioperl but I am somewhat familiar with perl.
>
> I write my perl programs in Notepad++ and run them in cmd.
>
> Now, I want to run Bioperl programs. I just installed bioperl on my
> computer. And I have a program using bioperl modules in Notepad++.
>
> My question is how to run these programs? Can they be run in cmd as well? or
> do I use ppm?
>
> Please help.
>
> Thanks,
> -Dhwani Gandhi.

From umjsm at leeds.ac.uk Mon Dec 14 05:39:32 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 10:39:32 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To:
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID: <1260787172.7359.0.camel@limm-pc1254>

Hi Brian,

I am not calling the method add_chain, I am calling the method chain

http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6

and if I don't use as an argument an object of type

Bio::Structure::Chain

I get an error like this (-->depends on the argument<--)

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain, we want a Bio::Structure::Chain or a list of these

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
STACK: read_pdb.pl:11
-----------------------------------------------------------

And if I use a Chain object I get the error that I told you about.

I have tried this code:

    use Bio::Structure::IO;
    use strict;

    my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
    my $struc = $structio->next_structure;
    my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id');
    my $model = Bio::Structure::Model->new( -id => '0');

    for my $chain ($struc->get_chains) {
        if($chain->id eq "A"){
            $new_entry->add_chain($model,$chain);
            last;
        }
    }
    $new_entry->add_model($model);
    my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
    $out->write_structure($new_entry);

But I get an empty pdb

    HEADER DEFAULT CLASSIFICATION 24-JAN-70 stru
    REMARK 1
    TER 1 A 0
    MASTER
    END

I am trying a lot of combinations, but I can't write a single chain into a file. I don't know what I am doing wrong.

Thanks for helping

regards,
Joan

On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote:
> Joan,
>
> It looks to me like the first argument to the add_chain() method has
> to be a Model object, the second is the Chain itself. See
> Structure/Entry.pm, for example. However if you're seeing some
> documentation that says something else then tell us where, it needs
> to be corrected.
>
> In Bio::Structure an Entry consists of one or more Models, each of
> which has one or more Chains. This allows you to build macromolecular
> complexes (an Entry), which could have more than one defined proteins
> or protein complexes (Models).
>
> Brian O.
>
> On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote:
>
> > Hello,
> >
> > I am trying to do a very easy thing but I don't get it. I want to
> > write a chain of a pdb to a file. I have tried a lot of things, but
> > what I think should work is the next script:
> >
> >     use Bio::Structure::IO;
> >     use strict;
> >
> >     my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
> >     my $struc = $structio->next_structure;
> >
> >     my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id');
> >
> >     for my $chain ($struc->get_chains) {
> >         if($chain->id eq "A"){
> >             $new_entry->chain($chain);
> >             last;
> >         }
> >     }
> >
> >     my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');#
> >     $out->write_structure($new_entry);
> >
> > it doesn't. I get the next error:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: add_chain: first argument needs to be a Model object ()
> >
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
> > STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
> > STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
> > STACK: read_pdb.pl:10
> > -----------------------------------------------------------
> >
> > As far as I understand the documentation, the method chain of the
> > object Bio::Structure::Entry requires as input an object of type
> > Chain.
> >
> > Any solution will be very welcome.
> >
> > best regards,
> > Joan

From fs5 at sanger.ac.uk Mon Dec 14 07:18:17 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 14 Dec 2009 12:18:17 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
Message-ID: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi,

Maybe I'm really missing something here but I can't find how to parse a file that is basically just the Feature Table from an EMBL file, looking like this:

FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]

So the file has no header and no actual sequence and it is used simply to annotate a chromosome in a genome assembly. I've always used GFF for that purpose but have been given this file now.
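For a file that holds nothing but FT lines, like the one described above, a small hand-written reader is one possible fallback. The following is a rough sketch in plain Perl, not a BioPerl API: parse_ft_lines is a made-up name, and it assumes the standard EMBL column layout (feature key starting in column 6, location and qualifiers starting in column 22). It does not interpret locations such as join(...), it keeps only the last value when a qualifier is repeated, and it would misread a wrapped value whose continuation line happens to begin with "/".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn raw FT lines into a list
# of hash references with the keys 'key', 'location' and 'qualifiers'.
sub parse_ft_lines {
    my @lines = @_;
    my (@features, $current_qual);
    for my $line (@lines) {
        $line =~ s/\s+$//;                    # drop newline and trailing blanks
        next unless $line =~ /^FT/;
        # Assumed EMBL columns: key in 6-20, everything else from 22 onwards.
        my $key  = length($line) > 5  ? substr($line, 5, 16) : '';
        my $rest = length($line) > 21 ? substr($line, 21)    : '';
        $key =~ s/\s+//g;
        next if $key eq '' && $rest eq '';    # empty FT line

        if ($key ne '') {                     # a new feature starts here
            push @features, { key => $key, location => $rest, qualifiers => {} };
            undef $current_qual;
        }
        elsif (!@features) {
            next;                             # continuation before any feature
        }
        elsif ($rest =~ m{^/([^=\s]+)(?:=(.*))?$}) {    # a new /qualifier
            $current_qual = $1;
            $features[-1]{qualifiers}{$current_qual} = defined $2 ? $2 : 1;
        }
        elsif (defined $current_qual) {       # wrapped qualifier value
            $features[-1]{qualifiers}{$current_qual} .= " $rest";
        }
        else {                                # wrapped location
            $features[-1]{location} .= $rest;
        }
    }
    # Strip the surrounding double quotes from quoted values.
    for my $feature (@features) {
        s/^"(.*)"$/$1/ for values %{ $feature->{qualifiers} };
    }
    return @features;
}

# Demo lines are built with sprintf so that the columns are exact.
my @demo = (
    sprintf('FT   %-16s%s', 'CDS', '213199..214812'),
    sprintf('FT   %-16s%s', '',    '/colour=7'),
    sprintf('FT   %-16s%s', '',    '/product="eukaryotic translation initiation factor 3'),
    sprintf('FT   %-16s%s', '',    'subunit 7, putative"'),
);
for my $feature (parse_ft_lines(@demo)) {
    print "$feature->{key}\t$feature->{location}\n";
    print "  $_ = $feature->{qualifiers}{$_}\n"
        for sort keys %{ $feature->{qualifiers} };
}
```

Each returned hash could then be turned into whatever the downstream annotation step expects, for example one GFF line per feature.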
Bio::SeqIO->new(-format=>"EMBL") complains about the missing header and if I stick in a fake ID line, it warns about the missing sequence and the fact that the features don't fit on the sequence (of length 0).
Of course it's not difficult to write my own parser but I'm sure there must be a BioPerl way of doing that that I have just overlooked.

Thanks for your help.

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From David.Messina at sbc.su.se Mon Dec 14 09:06:54 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Dec 2009 15:06:54 +0100
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>

Hi Frank,

You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12

Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.

It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.

Dave

PS - I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

From eric_donaldson at med.unc.edu Mon Dec 14 09:22:40 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Mon, 14 Dec 2009 09:22:40 -0500
Subject: [Bioperl-l] problem with install
In-Reply-To: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
Message-ID:

Thank you Jason. I appreciate the help.

Eric

Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology

From umjsm at leeds.ac.uk Mon Dec 14 11:58:03 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 16:58:03 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260787172.7359.0.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254> <1260787172.7359.0.camel@limm-pc1254>
Message-ID: <1260809883.7359.15.camel@limm-pc1254>

Hi again,

To extract a pdb chain to a file, I have had to do it by adding atom by atom to a new structure.

    use Bio::Structure::IO;
    use strict;

    my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
    my $struc = $structio->next_structure;
    my $new_struct = Bio::Structure::Entry->new( -id => 'structure_id');

    for my $model ($struc->get_models){
        $new_struct->add_model($model);
        for my $chain ($struc->get_chains) {
            $new_struct->add_chain($model,$chain);
            if($chain->id eq "A"){
                foreach my $res ($struc->get_residues($chain)){
                    $new_struct->add_residue($chain,$res);
                    foreach my $atom ($struc->get_atoms($res)){
                        $new_struct->add_atom($res,$atom);
                    }
                }
            }
            last;
        }
        last;
    }
    my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
    $out->write_structure($new_struct);

I suppose that there should be a more elegant way to do it. If someone knows it and can explain it I will be very grateful.
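One alternative that sidesteps Bio::Structure altogether is to filter the PDB text directly. This is a minimal sketch, assuming the fixed-column PDB layout in which the chain identifier sits in column 22 of ATOM, HETATM, ANISOU and TER records. The name extract_chain_records is a made-up helper, not a BioPerl method, and header or CONECT records are not carried over.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): keep the coordinate records
# that belong to one chain.
sub extract_chain_records {
    my ($chain_id, @lines) = @_;
    return grep {
        /^(?:ATOM|HETATM|ANISOU|TER)/            # coordinate-type records only
            && length($_) >= 22
            && substr($_, 21, 1) eq $chain_id    # column 22 = chain identifier
    } @lines;
}

# Usage sketch, run only when three arguments are given on the command line,
# e.g.:  perl extract_chain.pl 101m.pdb A out.pdb
# (the script name is illustrative; the file names follow this thread)
if (@ARGV == 3) {
    my ($in, $chain, $out) = @ARGV;
    open my $in_fh,  '<', $in  or die "Cannot read $in: $!";
    open my $out_fh, '>', $out or die "Cannot write $out: $!";
    print {$out_fh} extract_chain_records($chain, <$in_fh>), "END\n";
    close $out_fh or die "Cannot close $out: $!";
}
```

Because it works on the text records, this keeps the original atom numbering and formatting, at the cost of not building any Entry, Model or Chain objects.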

kind regards,
Joan

From gowthaman.ramasamy at sbri.org Mon Dec 14 14:16:32 2009
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 14 Dec 2009 11:16:32 -0800
Subject: [Bioperl-l] GO::Parser / GO::Model::Term
In-Reply-To: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com>
Message-ID:

Hi All,

I have a list of GO terms and would like to pull the GO accessions for them. I can easily do the reverse using get_term("GO::00000051"). But can someone tell me how to get the GO accessions from GO terms, e.g. retrieve the GO accession for "citrulline metabolic process"?

Thanks very much,
Gowtham

From lsbrath at gmail.com Mon Dec 14 14:41:39 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Mon, 14 Dec 2009 14:41:39 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
Message-ID: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>

Hello,

I have loaded BioPerl-1.6.0 onto my Mac. When I run my script I get the following error message:

Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5
/sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8 /Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
at project_example.pl line 4.
BEGIN failed--compilation aborted at project_example.pl line 4.

I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message.
Any ideas?

MEB

From scott at scottcain.net Mon Dec 14 14:47:05 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 14 Dec 2009 14:47:05 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <4536f7700912141147ld16d67av1a58bbf5c1fc5e9e@mail.gmail.com>

Hi Mgavi,

I think Jason may have already started helping, but the question is: is SeqIO.pm anywhere in those directories? If not, why not? If so, why can't the perl you are using find it? Do you have more than one instance of perl on your machine (fairly likely if you are using a fink-installed BioPerl)? When you execute your script, which perl are you using?

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From bosborne11 at verizon.net Mon Dec 14 14:45:35 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 14 Dec 2009 14:45:35 -0500
Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac
In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com>
Message-ID: <38104B41-104B-42D7-94FA-30016E110BFD@verizon.net>

Mgavi,

So there's a directory called /sw/lib/perl5/Bio? Or is it called something else?

Brian O.

From jason at bioperl.org Mon Dec 14 16:42:09 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 14 Dec 2009 13:42:09 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To:
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <614B8A2C-3B17-4E3B-AAC5-3210C7435BB5@bioperl.org>

you can read the man page from Sean Eddy or use it exactly as I showed you

sreformat fasta filename > filename.new

you can also use the 1st example, which is a bioperl solution.

-jason
On Dec 13, 2009, at 7:02 AM, Jonas Schaer wrote:

> Hi Jason,
> thank you very much for your answer.
> i am sorry to bother u again but i'm afraid i need some help with
> that because i don't see how to use sreformat?
> i dont get it managed to write a script that works.
> > thank u again :) > jonas > > > ----- Original Message ----- From: "Jason Stajich" > To: "Jonas Schaer" > Cc: > Sent: Tuesday, December 08, 2009 6:44 PM > Subject: Re: [Bioperl-l] fasta format > > >> you can run >> sreformat (HMMER) or bp_sreformat.pl script in scripts/utilties (or >> that is installed when you install the Bioperl scripts) >> $ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o >> yournewfile.fa >> # rename it back >> $ mv yournewfile.fa yourfile.fa >> >> or >> $ sreformat fasta yourfile.fa > yournewfile.fa >> $ mv yournewfile.fa yourfile.fa >> >> >> -jason >> On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote: >> >>> Hi there, >>> I have a little question concerning bioperl. I have >>> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read >>> in some fasta files. first it worked fine, but now i have some >>> fastafiles in slightly different format (not all lines have the same >>> length!). >>> >>> ------------- EXCEPTION ------------- >>> MSG: Each line of the fasta entry must be the same length except the >>> last. >>> Line above #49 ' >>> ..' is 28 != 101 chars. >>> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ >>> Fasta.pm:771 >>> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm: >>> 681 >>> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491 >>> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513 >>> STACK main::readfasta blast_eval.pm:174 >>> STACK toplevel blast_eval.pm:83 >>> ------------------------------------- >>> >>> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ >>> site/lib/Bio/ >>> DB/Fasta.pm line 1054. >>> >>> >>> Is there any way to use these fasta files with diffrent length of >>> lines with this fasta.pm module or will i have to change the format >>> of my fasta-files(big databases...) ? >>> >>> Thanks in advance for any help! 
>>> >>> Regards, Jonas >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org > > > -------------------------------------------------------------------------------- > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.426 / Virus Database: 270.14.98/2552 - Release Date: > 12/08/09 07:34:00 > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Dec 14 20:23:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 14 Dec 2009 19:23:05 -0600 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes Message-ID: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> All, The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. I have seen two variations of NSE that incorporate strandedness: 1) Stockholm Rfam reverses start and end if the strand == -1 chrY/598-1 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end rice-3(+)/16598648-16600199 The former breaks fewer things within BioPerl, but the latter seems more explicit. Any preferences? Do we want a new method that creates this, and deprecate out simple non-stranded NSE? chris From bernd.web at gmail.com Tue Dec 15 03:37:44 2009 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 15 Dec 2009 09:37:44 +0100 Subject: [Bioperl-l] GO::Parser / GO::Model::Term In-Reply-To: References: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com> Message-ID: <716af09c0912150037k513c6efah442a236cb323e14e@mail.gmail.com> Dear Gowthaman, A non-BioPerl solution: the Ontology Lookup service at EBI. It also provides a web service interface. http://www.ebi.ac.uk/ontology-lookup/ citrulline metabolic process has to be selected from the pull-down list in the interactive page. 
This will return the ID (GO:0000052) and additional info:

definition: The chemical reactions and pathways involving citrulline, N5-carbamoyl-L-ornithine, an alpha amino acid not found in proteins.
preferred name: citrulline metabolic process
exact synonym: citrulline metabolism
subset: Prokaryotic GO subset
xref_definition: ISBN:209853 "Oxford Dictionary of Biochemistry and Molecular Biology"

The webservice is described at
http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do

Regards,
Bernd

On Mon, Dec 14, 2009 at 8:16 PM, Gowthaman Ramasamy wrote:
>
> Hi All,
> I have a list of GO terms. And would like to pull GO accessions for them.
> I can easily do the reverse of it using get_term("GO::00000051").
>
> But can someone tell me how to get the GO accessions from GO Terms, for eg: retrieve GO accession for "citrulline metabolic process".
>
>
> Thanks very much,
> Gowtham
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From fs5 at sanger.ac.uk Tue Dec 15 05:38:40 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 15 Dec 2009 10:38:40 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk> <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>
Message-ID: <1260873520.17180.215.camel@deskpro15336.dynamic.sanger.ac.uk>

Thanks Dave,

good to know that I haven't overlooked something bleedingly obvious in Bioperl that already does this :-)
No problem, I have already implemented a simple parser to do it, which works fine for my files.
Thanks
Frank

On Mon, 2009-12-14 at 15:06 +0100, Dave Messina wrote:
> Hi Frank,
>
> You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method:
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12
>
> Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy.
>
> It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way.
>
>
> Dave
>
>
> PS -- I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO:
>
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
>
>

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From rmb32 at cornell.edu Tue Dec 15 10:09:43 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 15 Dec 2009 07:09:43 -0800
Subject: [Bioperl-l] AGI's fpc stuff: Bio::Map::Physical, Bio::MapIO::fpc, etc
Message-ID: <4B27A6B7.6090709@cornell.edu>

Hi all,

Recently I caught an interesting thing related to making GFF files out of FPC maps built recently using Bio::MapIO::fpc. All of the coordinates in the resulting GFF3 and the sizes of the contigs and clones seem to be dilated by 4x from where they should be. This didn't happen with some earlier FPC datasets I ran through these modules.
I haven't gone through any of this very thoroughly, but I notice in Bio::Map::Physical::print_gffstyle() at line 765 there's a line like 'my $basepair = 4096', and the routine goes on to use $basepair as a sort of multiplier for converting the native physical map units into basepairs for GFF-style output. This makes me wonder if the newer FPC datasets coming out require a different $basepairs value, maybe 1024? Are the original authors of these modules still around on this list? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From tristan.lefebure at gmail.com Tue Dec 15 12:18:26 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Tue, 15 Dec 2009 12:18:26 -0500 Subject: [Bioperl-l] ncurses and bioperl? Message-ID: <200912151218.26357.tristan.lefebure@gmail.com> Hello, (Be careful: the following is a very naive question) Something that I find myself missing is a simple way to look at alignments and trees on remote machines where I don't have access to X. Since, (1) one can make wonderful terminal programs like screen and emacs by using ncurses, (2) that alignment and tree objects are already well handled in bioperl, and (3) that there is a CPAN Curses module; doing 1+2+3, may I dream of a curse/bioperl perl program to render alignment and trees? I suppose a plain C program would be much better, but well I am a biologist... Thanks, --Tristan From jason at bioperl.org Tue Dec 15 12:50:52 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 15 Dec 2009 09:50:52 -0800 Subject: [Bioperl-l] ncurses and bioperl? 
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com> References: <200912151218.26357.tristan.lefebure@gmail.com> Message-ID: not to say this isn't a good idea, but currently for curses I would use the treeviewing with retree from PHYLIP and for short read alignments the samtools tview or Gambit (MarthLab) works great or something like ralee for viewing MSA alignments (though targeted for RNA editing) http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/ http://dx.doi.org/10.1093/bioinformatics/bth489 Just that there are prior examples so would be able to learn from them if you still wanted to roll your own here. -jason On Dec 15, 2009, at 9:18 AM, Tristan Lefebure wrote: > Hello, > > (Be careful: the following is a very naive question) > > Something that I find myself missing is a simple way to look > at alignments and trees on remote machines where I don't > have access to X. Since, > (1) one can make wonderful terminal programs like screen > and emacs by using ncurses, > (2) that alignment and tree objects are already well > handled in bioperl, and > (3) that there is a CPAN Curses module; > > doing 1+2+3, may I dream of a curse/bioperl perl program to > render alignment and trees? I suppose a plain C program > would be much better, but well I am a biologist... > > Thanks, > > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From roy.chaudhuri at gmail.com Tue Dec 15 12:47:26 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 15 Dec 2009 17:47:26 +0000 Subject: [Bioperl-l] ncurses and bioperl? 
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com> References: <200912151218.26357.tristan.lefebure@gmail.com> Message-ID: <4B27CBAE.5000303@gmail.com> Hi Tristan, Not a Bioperl solution, but retree from the Phylip package displays trees in a terminal. Roy. On 15/12/2009 17:18, Tristan Lefebure wrote: > Hello, > > (Be careful: the following is a very naive question) > > Something that I find myself missing is a simple way to look > at alignments and trees on remote machines where I don't > have access to X. Since, > (1) one can make wonderful terminal programs like screen > and emacs by using ncurses, > (2) that alignment and tree objects are already well > handled in bioperl, and > (3) that there is a CPAN Curses module; > > doing 1+2+3, may I dream of a curse/bioperl perl program to > render alignment and trees? I suppose a plain C program > would be much better, but well I am a biologist... > > Thanks, > > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nml5566 at gmail.com Tue Dec 15 16:37:30 2009 From: nml5566 at gmail.com (Nathan Liles) Date: Tue, 15 Dec 2009 15:37:30 -0600 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? Message-ID: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Is the Bio::Ontology::OBOEngine module working or being currently maintained? I tried following the documentation in the module: * use Bio::Ontology::OBOEngine; my $parser = Bio::Ontology::OBOEngine->new ( -file => "gene_ontology.obo" ); my $engine = $parser->parse(); *But, it throws an error when I run the file saying 'Can't locate object method "parse" '. Does anyone have any experience getting this module working; or, is there any alternative bioperl module to extract terms and relationships out of sequence ontology files? 
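
For reference, a minimal sketch of pulling terms and relationships out of an OBO file by going through Bio::OntologyIO instead of calling Bio::Ontology::OBOEngine directly. It assumes a BioPerl installation that ships the 'obo' format driver. The helper names dump_ontology and relationship_line are hypothetical and used only for illustration, and the file path is whatever is given on the command line. Treat it as a starting point rather than the recommended way to do this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one relationship as "subject predicate object".
# Pure helper (hypothetical name), independent of BioPerl.
sub relationship_line {
    my ($subject, $predicate, $object) = @_;
    return join(' ', $subject, $predicate, $object);
}

# Print every term and relationship found in an OBO file.
# Assumes BioPerl is installed; it is loaded lazily so that the
# script still compiles on machines without it.
sub dump_ontology {
    my ($file) = @_;
    require Bio::OntologyIO;

    my $parser = Bio::OntologyIO->new(-format => 'obo', -file => $file);
    while (my $ont = $parser->next_ontology) {
        print "Ontology: ", $ont->name, "\n";

        # Terms: identifier and name, tab separated
        for my $term ($ont->get_all_terms) {
            my $id   = defined $term->identifier ? $term->identifier : '';
            my $name = defined $term->name       ? $term->name       : '';
            print join("\t", $id, $name), "\n";
        }

        # Relationships: subject, predicate (e.g. is_a), object
        for my $rel ($ont->get_relationships) {
            print relationship_line(
                $rel->subject_term->identifier,
                $rel->predicate_term->name,
                $rel->object_term->identifier,
            ), "\n";
        }
    }
    return;
}

# Only parse when a file name is supplied on the command line.
dump_ontology($ARGV[0]) if @ARGV;
```

Run as "perl dump_obo.pl gene_ontology.obo" (the script name is arbitrary). Large ontologies can take a while to load because the whole file is read before terms are returned.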
From hlapp at drycafe.net Tue Dec 15 17:05:10 2009 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 15 Dec 2009 17:05:10 -0500 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? In-Reply-To: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Message-ID: That shouldn't happen I suppose, but you're not supposed really to use the engine directly. Rather it will be used as a backing parser by the Bio::OntologyIO parser you choose. Have you tried that route and found it not to work? -hilmar On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote: > Is the Bio::Ontology::OBOEngine module working or being currently > maintained? I tried following the documentation in the module: > > * use Bio::Ontology::OBOEngine; > > my $parser = Bio::Ontology::OBOEngine->new > ( -file => "gene_ontology.obo" ); > > my $engine = $parser->parse(); > > *But, it throws an error when I run the file saying 'Can't locate > object > method "parse" '. Does anyone have any experience getting this module > working; or, is there any alternative bioperl module to extract > terms and > relationships out of sequence ontology files? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Wed Dec 16 04:58:16 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Dec 2009 10:58:16 +0100 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> Message-ID: I'd tend to be inclined more towards option 1 over option 2 because option 2 pollutes the name field. (Although that's not a huge problem if the '(strand)' is always just before the '/'.) It's a question of whether to optimize human-readability over machine-readabilitiy: option 2 favors the former over the latter, and option 1 the reverse. Whichever way you go, I think > a new method that creates this, and deprecate[s] out simple non-stranded NSE would be great. Dave From maj at fortinbras.us Wed Dec 16 07:51:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 16 Dec 2009 07:51:24 -0500 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> Message-ID: <6723123C0ABD447190639AE1F5D1A6A7@NewLife> I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. 
cheers
MAJ

----- Original Message -----
From: "Chris Fields"
To: "BioPerl List"
Sent: Monday, December 14, 2009 8:23 PM
Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes

> All,
>
> The current output for NSE format (Name/Start-End) via
> Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. I have
> seen two variations of NSE that incorporate strandedness:
>
> 1) Stockholm Rfam reverses start and end if the strand == -1
>
> chrY/598-1
>
> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>
> rice-3(+)/16598648-16600199
>
> The former breaks fewer things within BioPerl, but the latter seems more
> explicit. Any preferences? Do we want a new method that creates this, and
> deprecate out simple non-stranded NSE?
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From tuco at pasteur.fr Wed Dec 16 09:14:28 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 15:14:28 +0100
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank)
Message-ID: <4B28EB44.3080006@pasteur.fr>

Hi,

I wrote a small Genbank parser a few months ago, before BioPerl release 1.6.0.
I tried to use my code once again but now the output of my parser is empty.
It looks like Annotation from seqfeatures is not filled anymore.

Here is the code I used previously:

    while(my $seq = $streamer->next_seq()){

        #We only want to retrieve CDS features...
        foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
            print $ofh join("#",
                $feat->annotation()->get_Annotations('locus_tag'),   # Acc num
                $feat->annotation()->get_Annotations('gene')
                    ? $feat->annotation()->get_Annotations('gene')   # Gene name
                    : $feat->annotation()->get_Annotations('locus_tag'),
                $feat->annotation()->get_Annotations('product'),     # Description
            ),"\n";
        }
    }

$feat is a Bio::SeqFeature::Generic object

If I print Dumper($feat->annotation()) here is the output:

    $VAR1 = bless( {
        '_typemap' => bless( {
            '_type' => {
                'comment'   => 'Bio::Annotation::Comment',
                'reference' => 'Bio::Annotation::Reference',
                'dblink'    => 'Bio::Annotation::DBLink'
            }
        }, 'Bio::Annotation::TypeManager' ),
        '_annotation' => {}
    }, 'Bio::Annotation::Collection' );

Have some changes been made to the way the annotation object is populated?

Thanks for any clue and sorry if my question looks stupid

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From cjfields at illinois.edu Wed Dec 16 10:09:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 16 Dec 2009 09:09:56 -0600
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank)
In-Reply-To: <4B28EB44.3080006@pasteur.fr>
References: <4B28EB44.3080006@pasteur.fr>
Message-ID: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>

Emmanuel,

The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation. The problem was that the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default. You can get at the data this way (from the Feature/Annotation HOWTO):

    for my $feat_object ($seq_object->get_SeqFeatures) {
        print "primary tag: ", $feat_object->primary_tag, "\n";
        for my $tag ($feat_object->get_all_tags) {
            print " tag: ", $tag, "\n";
            for my $value ($feat_object->get_tag_values($tag)) {
                print " value: ", $value, "\n";
            }
        }
    }

You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.

chris

On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:

> Hi,
>
> I wrote a small Genbank parser a few months ago, before BioPerl release 1.6.0.
> I tried to use my code once again but now the output of my parser is empty.
> It looks like Annotation from seqfeatures is not filled anymore.
>
> Here is the code I used previously:
>
> while(my $seq = $streamer->next_seq()){
>
> #We only want to retrieve CDS features...
> foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
> print $ofh join("#",
> $feat->annotation()->get_Annotations('locus_tag'), # Acc num
> $feat->annotation()->get_Annotations('gene')
> ? $feat->annotation()->get_Annotations('gene') # Gene name
> : $feat->annotation()->get_Annotations('locus_tag'),
> $feat->annotation()->get_Annotations('product'), # Description
> ),"\n";
> }
> }
>
> $feat is a Bio::SeqFeature::Generic object
>
> If I print Dumper($feat->annotation()) here is the output:
>
> $VAR1 = bless( {
> '_typemap' => bless( {
> '_type' => {
> 'comment' => 'Bio::Annotation::Comment',
> 'reference' => 'Bio::Annotation::Reference',
> 'dblink' => 'Bio::Annotation::DBLink'
> }
> }, 'Bio::Annotation::TypeManager' ),
> '_annotation' => {}
> }, 'Bio::Annotation::Collection' );
>
> Have some changes been made to the way the annotation object is populated?
>
> Thanks for any clue and sorry if my question looks stupid
>
> Regards
>
> Emmanuel
>
> --
> -------------------------
> Emmanuel Quevillon
> Biological Software and Databases Group
> Institut Pasteur
> +33 1 44 38 95 98
> tuco at_ pasteur dot fr
> -------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From tuco at pasteur.fr Wed Dec 16 10:37:45 2009
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 16 Dec 2009 16:37:45 +0100
Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr> <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <4B28FEC9.1080509@pasteur.fr>

On 12/16/2009 04:09 PM, Chris Fields wrote:
> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation. The problem was that the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default. You can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
> print "primary tag: ", $feat_object->primary_tag, "\n";
> for my $tag ($feat_object->get_all_tags) {
> print " tag: ", $tag, "\n";
> for my $value ($feat_object->get_tag_values($tag)) {
> print " value: ", $value, "\n";
> }
> }
> }
>
> You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
>

Hi Chris

Thanks for the info.
I have indeed reverted to using $feat->get_tag_values() and it works as it did previously.
For my small problem I can keep this solution, which is well adapted to it.

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------

From sung at bio.cc Wed Dec 16 12:55:16 2009
From: sung at bio.cc (Sungsam Gong)
Date: Wed, 16 Dec 2009 17:55:16 +0000
Subject: [Bioperl-l] pdb.pm and annotations
Message-ID: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>

Hi,

Wanted to get the pubmed identifier from a PDB file using Bio::Structure, so hacked the code.
Knew that Bio::Structure::IO::pdb.pm gets relevant info from either 'JRNL' or 'REMARK 1'.
However could not see any actual code parsing 'PMID'.

From pdb.pm, what I see:

    sub _read_PDB_jrnl {
        ...
        $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
        $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
        $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
        $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
        $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
        $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
        ...
    }

    sub _read_PDB_remark_1 {
        ...
        $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
        $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
        $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
        $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
        $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
        $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
        ...
    }

From my script, I did:

    ($struc->annotation->get_Annotations('reference'))[0]->authors
    ($struc->annotation->get_Annotations('reference'))[0]->title

or

    my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
    for my $key (keys %{$hash_ref}) {
        print $key,": ",$hash_ref->{$key},"\n";
    }

Any plan to include code that pulls 'PMID' out? Or did I miss something?

Cheers,
Sung

From nml5566 at gmail.com Wed Dec 16 14:42:57 2009
From: nml5566 at gmail.com (Nathan Liles)
Date: Wed, 16 Dec 2009 13:42:57 -0600
Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files?
In-Reply-To:
References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com>
Message-ID: <81a20b1e0912161142m77051529se59b4621a0add13b@mail.gmail.com>

Actually, yes I did find that and it works very well. Now I'm wondering, is it possible to search for similar terms using a string instead of a Bio::Ontology term object? For example, I'd like to search for the synonym: "transcription start site" and have it return all similar terms. But, it throws an error if I pass in a simple query like that.

-Nathan

On Tue, Dec 15, 2009 at 4:05 PM, Hilmar Lapp wrote:

> That shouldn't happen I suppose, but you're not supposed really to use the
> engine directly. Rather it will be used as a backing parser by the
> Bio::OntologyIO parser you choose. Have you tried that route and found it
> not to work?
>
> -hilmar
>
>
> On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote:
>
> Is the Bio::Ontology::OBOEngine module working or being currently
>> maintained? I tried following the documentation in the module:
>>
>> * use Bio::Ontology::OBOEngine;
>>
>> my $parser = Bio::Ontology::OBOEngine->new
>> ( -file => "gene_ontology.obo" );
>>
>> my $engine = $parser->parse();
>>
>> *But, it throws an error when I run the file saying 'Can't locate object
>> method "parse" '. Does anyone have any experience getting this module
>> working; or, is there any alternative bioperl module to extract terms and
>> relationships out of sequence ontology files?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>

From cjfields1 at gmail.com Wed Dec 16 19:53:50 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 16:53:50 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
Message-ID:

Howdy from Google Groups

From cjfields1 at gmail.com Wed Dec 16 20:01:38 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 16 Dec 2009 17:01:38 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
Message-ID: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>

I would like to announce (with the tremendous help of Hilmar Lapp) the creation of a mirror for the BioPerl mail list, if the last post didn't already give it away.

http://groups.google.com/group/bioperl-l

One can join the group and submit posts via the Google Groups web interface or via email. Have fun!

chris

From ocarnorsk138 at gmail.com Wed Dec 16 20:12:21 2009
From: ocarnorsk138 at gmail.com (Ocar Campos)
Date: Wed, 16 Dec 2009 17:12:21 -0800 (PST)
Subject: [Bioperl-l] Test post from Google Groups
In-Reply-To:
References:
Message-ID: <03416808-ec4b-44b3-8269-6743a26b5368@k4g2000yqb.googlegroups.com>

testing back from google group!

On Dec 16, 9:53 pm, Chris Fields wrote:
> Howdy from Google Groups
> _______________________________________________
> Bioperl-l mailing list
> Bioper... at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From p.j.a.cock at googlemail.com Thu Dec 17 05:50:23 2009
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 17 Dec 2009 02:50:23 -0800 (PST)
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
Message-ID:

On Dec 17, 1:01 am, Chris Fields wrote:
> I would like to announce (with the tremendous help of Hilmar Lapp) the
> creation of a mirror for the BioPerl mail list, if the last post
> didn't already give it away.
>
> http://groups.google.com/group/bioperl-l
>
> One can join the group and submit posts via the Google Groups web
> interface or via email. Have fun!
>
> chris

Sounds particularly good in the long run (once there is enough of an archive on Google Groups to make searching there useful).

Does this mean a Google Groups user doesn't have to be subscribed to the mailing list to post (since the mailing list normally only allows subscribers to post)?

Peter

From David.Messina at sbc.su.se Thu Dec 17 07:25:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 17 Dec 2009 13:25:49 +0100
Subject: [Bioperl-l] bioperl-l Google Groups mirror
In-Reply-To:
References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com>
Message-ID: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se>

Very nice, Chris and Hilmar! That'll be great.

> Does this mean a Google Groups user doesn't have to be subscribed
> to the mailing list to post (since the mailing list normally only
> allows subscribers to post)?

I think that's right. From the Google groups page:

> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.
Dave From cjfields at illinois.edu Thu Dec 17 08:21:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 07:21:46 -0600 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> Message-ID: <209F1321-37DD-4B6C-A153-8A5AA0EF3E0A@illinois.edu> On Dec 17, 2009, at 6:25 AM, Dave Messina wrote: > Very nice, Chris and Hilmar! That'll be great. > > > >> Does this mean a Google Groups user doesn't have to be subscribed >> to the mailing list to post (since the mailing list normally only >> allows subscribers to post)? > > > I think that's right. From the Google groups page: > >> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively. > > > > > Dave It is moderated by user to deal with spam. Hilmar's already a manager/co-owner, and either of us can add more as needed. chris From hlapp at drycafe.net Thu Dec 17 09:52:33 2009 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 17 Dec 2009 09:52:33 -0500 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> Message-ID: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> On Dec 17, 2009, at 5:50 AM, Peter wrote: > Does this mean a Google Groups user doesn't have to be subscribed > to the mailing list to post Yes. They can post through the Google Groups web interface. The email address for mirrored groups is the one of the list being mirrored though, bioperl-l at bioperl.org in this case, and so in order to post by email you still have to be subscribed at the bioperl-l list. At least that's what the docs at Google say. 
I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Thu Dec 17 12:05:24 2009 From: jay at jays.net (Jay Hannah) Date: Thu, 17 Dec 2009 11:05:24 -0600 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> Message-ID: <9BDF08A3-67E0-4F5E-8429-11AE586F6504@jays.net> On Dec 17, 2009, at 8:52 AM, Hilmar Lapp wrote: > I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs. In my experience (and ignoring a brief glitch this summer) moderation of new members works great. Almost zero spam gets through. Not as convenient for the admin as MailMan self-service email verification, but perhaps easier for some users and not too much admin work if you don't have too many new legitimate members every month. Here is the configuration set I recommend: http://clab.ist.unomaha.edu/~jhannah/tmp/google_groups.png Your membership roles will end up with quite a few junk accounts, but those bots can't post, so it's not that big a deal. I purge mine manually once a year or so. 
HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From robert.bradbury at gmail.com Thu Dec 17 14:42:54 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 14:42:54 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
In-Reply-To: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
Message-ID:

Just to close out the issue of bioperl forking (in particular accesses to
external databases through get_sequence) which involves individual database
sub-modules and not collecting its children.

As it turns out, the code does do an explicit fork, apparently so that the
child process can read from the database while the parent process
manipulates the data as it becomes available. Now, one could argue that a
threaded model might be better, since threads are now fairly standard OS
tools in current environments.

But I couldn't find any functions which actually wait for the forked process
(presumably because they are created for "future" use). Nor is there
any indication in the pages I've found in most of the documentation (which
is spread across the web) or Wiki that explains that "creating child
processes" is how these functions work and one *needs* to collect those
children after each use or else zombie processes will accumulate, which on
"reasonable" systems with per-user process limits will create problems for
proper program functioning. Nor (it would appear) does the parent process
set up a SIGCHLD "catcher" which could collect the processes once they exit
(which I expect in the case of "get_sequence" would be after closing of the
socket which actually fetched the sequence from GenBank).

It can be resolved easily enough by adding a call after each use of these
functions:
   $kid = waitpid(-1, WNOHANG);
But typically, as a programmer, I should not be responsible for having to
clean up the leftovers of library calls (unless said cleanup requirements
are clearly documented).

But to a "newbie" using the functions, coming from a functional background
(C), not an OO background (which at least I would tend to view as a wart on
the otherwise robust Perl language), there are two problems:
1. The lack of documentation and examples explaining how the functions work
and how they must be handled at a higher level (by executing explicit wait
system calls).
2. The lack of code in the BioPerl functions to deal with the forked
processes which they create.

Functional programmers have a perspective -- if you create it -- you have
to clean it up. It would appear that in the transition to OO programming
(or perhaps simply for expediency) that detail was left out of both
(either/and) the documentation and the code. From this standpoint one could
view garbage collectors as being fundamentally evil -- because they gloss
over the fact that programmers should know what they are doing and when
they are doing it.

So, everywhere in the documentation where there is a get_sequence call (or
anything which accesses an external database which causes a fork to occur)
there should be a modification as I have outlined above -- or else the code
should be corrected so orphaned children are always collected and not
allowed to accumulate.
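
A minimal sketch of the workaround described in the message above, assuming a Unix-like system where fork and waitpid behave in the POSIX way (Perl's fork emulation on Windows differs). Two details are easy to miss: WNOHANG has to be imported from the POSIX module, and a single waitpid call collects at most one child, so the sketch loops until nothing is left to reap. The names reap_children and leaky_call are hypothetical; leaky_call only stands in for a library call that forks without waiting and is not BioPerl code.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Collect every child process that has already exited, without blocking.
# Returns the number of children reaped by this call.
sub reap_children {
    my $reaped = 0;
    # waitpid returns a PID > 0 for each collected child, 0 when children
    # exist but none has exited yet, and -1 when there are no children.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Hypothetical stand-in for a library call that forks and never waits.
sub leaky_call {
    defined(my $pid = fork) or die "Couldn't fork: $!";
    if ($pid == 0) {
        # Child: do nothing and leave immediately, skipping END blocks.
        POSIX::_exit(0);
    }
    return $pid;    # parent returns without waiting for the child
}
```

In a calling script this might be used as `leaky_call(); reap_children();` inside the loop that fetches sequences. Because the call does not block, a child that is still running is simply picked up on a later pass; the same loop could also run immediately before a fork.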

From robert.bradbury at gmail.com Thu Dec 17 15:23:38 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Thu, 17 Dec 2009 15:23:38 -0500
Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions
In-Reply-To:
References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org>
Message-ID:

Oh, yes, in case it was not clear, the fork call which fails is in
DB/WebDBSeqI.pm: line 722
        defined(my $pid = fork)
            or $self->throw("'Couldn't fork: $!");

And of course that is because Linux has reached the process limits for the
user (due to accumulated background processes which are uncollected).

And it could be resolved by simply executing a simple waitpid call for
prior orphaned children before forking [1]. But such a succinct solution
would violate "functional" programming rules -- clean up what you create --
instead they would tend to fall into the OO camp -- "Oh don't worry the
garbage collector will take care of it". Green programming is a little less
cavalier.

Robert

1. IMO, a very very real problem with programming today is that there is no
connection between programmers and the cost of their programs. How many
programmers know the instruction cycle time of their computers, or what an
instruction costs in terms of W consumed, W wasted (heat generation),
fruitless scanning over uncollected zombie processes, etc.? It may be that
only programmers who grew up in the era when CPU cycles were expensive
(300 ns/cycle), and who know what each instruction required in terms of
cycles, consider these perspectives. Now things (cpu use, processor use, etc)
tend to be swept under the rug and it appears that that is the case with the
standard implementation of bioperl. The documentation does not clearly state
that additional sub-processes may be created and need to be collected. You
are providing a utility that only works "this much". And guess what -- I
happen to have run into the "this".
From cjfields at illinois.edu Thu Dec 17 15:25:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 14:25:56 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: Robert, I have previously outlined specifically why you are seeing the fork issue, and a possible solution. IIRC it primarily has to do with you trying to do something more advanced using the (very basic) Bio::Perl procedural interface, something along the lines of pulling a sequence and using RemoteBlast. Retrieving a sequence from a remote database is a forked process on most OS's (I think Win is the sole exception) and occurs internally in Bio::Perl via Bio::DB::GenBank. Setting up your own pipeline, using Bio::DB::GenBank (set to use temp files), followed by Bio::Tools::Run::RemoteBlast or Bio::Perl, are options in the meantime. Trying to catch signals can be notoriously flaky cross-platform and cross perl versions; I recall running into problems with CygWin and OS X. We can modify Bio::Perl to use a temp file instead, which avoids the whole use of forks altogether, and is probably the best long-term solution. My last bit: I don't usually say this, primarily b/c it's misconstrued by some, but 'patches are always welcome'. What doesn't work is just telling us to arbitrarily change code w/o indicating exactly where to do so. The tone you use, which comes off a tad condescending, can be abrasive and may not garner any response (or at least will get you one you don't expect). Please keep that in mind. chris On Dec 17, 2009, at 1:42 PM, Robert Bradbury wrote: > Just to close out the issue of bioperl forking (in particular accesses to > external databases through get_sequence) which involves individual database > sub-modules and not collecting its children. 
> > As it turns out the code does do an explicit fork, it looks like so the > child process can read from the database while the parent process > manipulates the data as it becomes available. Now, one could argue that a > threaded model might be better since now threads are fairly standard OS > tools in current environments. > > But I couldn't find any functions which actually wait for the forked process > (presumably because they are created for "future" use). But nor is there > any indication in the pages I've found in most of the documentation (which > is spread across the web) or Wiki that explain that "creating child > processes" is how these functions work and one *needs* to collect those > children after each use or else zombie processes will accumulate, which on > "reasonable" systems with per-user process limits will create problems for > proper program functioning. Nor (it would appear) does the parent process > setup a SIGCHLD "catcher" which could collect the processes once they exit > (which I expect in the case of "get_sequence" would be after closing of the > socket which actually fetched the sequence from Genbank. > > It can be resolved easily enough by adding a call after each use of these > functions: > $kid = waitpid(-1, WNOHANG); > But typically, as a programmer, I should not be responsible for having to > clean up the leftovers of library calls (unless said cleanup requirements > are clearly documented). > > > But to a "newbie" using the functions, coming from a functional background > (C), not an OO background (which at least I would tend to view as a wart on > the otherwise robust Perl language), there are two problems > 1. The lack of documentation and examples explaining how the functions work > and how they must be handled at a higher level (by executing explicit wait > system calls). > 2. The lack of code in the BioPerl functions to deal with the forked > processes which they create. 
Functional programmers have a perspective -- > if you create it -- you have to clean it up. It would appear that in the > transition to OO programming (or perhaps simply for expediency) that detail > was left out of both (either/and) the documentation and the code. From this > standpoint one could view garbage collectors as being fundamentally evil -- > because they gloss over the fact that programmers should know what they are > doing and when they are doing it. > > So, everywhere in the documentation where there is a get_sequence call (or > anything which accesses an external database which causes a fork to occur) > there should be a modification as I have outlined above -- or else the code > should be corrected so orphaned children are always collected and not > allowed to accumulate. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Dec 17 15:29:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 14:29:10 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: On Dec 17, 2009, at 2:23 PM, Robert Bradbury wrote: > Oh, yes, in case it was not clear, the fork calls which fails is in > DB/WebDBSeqI.pm: line 722 > defined(my $pid = fork) > or $self->throw("'Couldn't fork: $!"); Okay, that's a bit more helpful. > And of course that is because Linux has reached the process limits for the > user (due to accumulated background processes which are uncollected). Right, but again, we need to check this in a cross-platform compatible way. 
> And they could be resolved by simply executing a simple waitpid call for > prior orphaned children before forking [1] But such a succinct solution > would violate "functional" programming rules -- clean up what you create -- > instead they would tend to fall into the OO camp -- "Oh don't worry the > garbage collector will take care of it". Green programming is a little less > cavalier. > > Robert > > 1. IMO, a very very real problem with programming today is that there is no > connection between programmers and the cost of their programs. How many > programmers know the instruction cycle time of their computers, what does an > instruction cost in terms of W consumed, W wasted (heat generation), > fruitless scanning over uncollected zombie processes, etc. It may be that > only that programmers who grew up in the era when CPU cycles were expensive > (300 ns/cycle) who know what each instruction required in terms of cycles > consider these perspectives. Now things (cpu use, processor use, etc) tend > to be swept under the rug and it appears that that is the case with the > standard implementation of bioper. The documentation does not clearly state > that additional sub-processes may be created and need to be collected. You > are providing a utility that only works "this much". And guess what -- I > happen to have run into the "this". Um, yeah. Okay. chris From robfsouza at gmail.com Fri Dec 18 13:07:34 2009 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Fri, 18 Dec 2009 13:07:34 -0500 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: Hi, I've been dealing with an apparent bug in the output of NCBI's BLAST programs (blastall, blastpgp) which sometimes produces output like the one below. I think I've managed to produce a work around for Bioperl blast.pm parser and would like to contribute it to Bioperl. The fix is based on blast.pm from the CVS tree (downloaded some months ago...) and is attached to this message. 
Best,
Robson

PS: what happened to the bioperl-bugs mailing list? It does not seem to be working...

>gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
           hypothetical protein [Nasonia vitripennis]
          Length = 1774

 Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
 Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)

Query: 0   -

Sbjct: 328 P                                                            328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
            P PP +   + P       KTK+      K+P  K         +
Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376

Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
           ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432

Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
           LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491

Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
              DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548

Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
           +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602

Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
           ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661

Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
                  +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blast_patched.pm
Type: application/octet-stream
Size: 91820 bytes
Desc: not available
URL:

From cjfields at illinois.edu Fri Dec 18 13:33:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 18 Dec 2009 12:33:44 -0600
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To:
References:
Message-ID:

Robson,

Any chance you could check this against SVN? We haven't used the CVS tree for a few years (had a number of releases along the way as well).

Not sure about bioperl-bugs, we have bugzilla still running though:

http://bugzilla.open-bio.org/

chris

On Dec 18, 2009, at 12:07 PM, Robson Francisco de Souza wrote:

> Hi,
>
> I've been dealing with an apparent bug in the output of NCBI's BLAST
> programs (blastall, blastpgp) which sometimes produces output like the
> one below.
> I think I've managed to produce a work around for Bioperl blast.pm
> parser and would like to contribute it to Bioperl.
> The fix is based on blast.pm from the CVS tree (downloaded some months
> ago...) and is attached to this message.
> Best,
> Robson
>
> PS: what happened to the bioperl-bugs mailing list? It does not seem
> to be working...
> >> gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved > hypothetical protein [Nasonia vitripennis] > Length = 1774 > > Score = 75.9 bits (185), Expect = 1e-11, Method: Compositional matrix adjust. > Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%) > > Query: 0 - > > Sbjct: 328 P 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG 654 > P PP + + P KTK+ K+P K + > Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA 376 > > Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713 > ++ N + W + +++ + N NN D +E PT ++ > Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432 > > Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773 > LD K S + + L + + +I + + D ++ + + L + PE D+ + ++SF > Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491 > > Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833 > DG +L +K F + +P K R + F ++ +EP I S+ A +++ > Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548 > > Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893 > + KSLQ ++ ++++ NFLN + G KL+ L KL +I++ N+ MN L > Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602 > > Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951 > ++ + ++ +LL + + + ++ + +L E L+ +K I+++++ E > Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661 > > Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995 > +Q+ +F Q A+ EM ++ + E+L+ + + +A+FF E > Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701 > 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Fri Dec 18 18:00:47 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 18 Dec 2009 23:00:47 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza wrote: > Hi, > > I've been dealing with an apparent bug in the output of NCBI's BLAST > programs (blastall, blastpgp) which sometimes produces output like the > one below. > I think I've managed to produce a work around for Bioperl blast.pm > parser and would like to contribute it to Bioperl. > The fix is based on blast.pm from the CVS tree (downloaded some months > ago...) and is attached to this message. > Best, > Robson Do you have a complete example of this kind of funny output? This problem has also been reported with blastpgp for the Biopython parser. I'd love an example for our unit tests (probably worth doing in BioPerl too). Could you upload a test case here?: http://bugzilla.open-bio.org/show_bug.cgi?id=2927 Thanks! Peter @ Biopython From biopython at maubp.freeserve.co.uk Sat Dec 19 06:19:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 19 Dec 2009 11:19:53 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: <320fb6e00912190319s75a0eb75m94dfbd7946a310e5@mail.gmail.com> On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote: > > Hi Peter, > > I just upload my example. I also reported this bug to the NCBI > developers and I hope they can fix it, since it is easy to reproduce. 
> I just forgot to mention the blastpgp version: 2.2.18 > Best, > Robson Thank you, Peter From maj at fortinbras.us Sat Dec 19 14:52:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 19 Dec 2009 14:52:45 -0500 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment Message-ID: Hi All, Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus, is at beta in the bioperl-run trunk. It wraps all the programs of the NCBI's new blast+-2.2.22 suite ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and integrates them, allowing you to create, mask, and query databases from within a single factory object. See the HOWTO http://www.bioperl.org/wiki/HOWTO:BlastPlus for the usual usage and implementation details. Happy coding-- MAJ From David.Messina at sbc.su.se Sat Dec 19 15:34:10 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 19 Dec 2009 21:34:10 +0100 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment In-Reply-To: References: Message-ID: <8F67673F-E71E-46A1-BD7C-6465C4D13398@sbc.su.se> Sweet! Thanks, Mark. Dave From cjfields at illinois.edu Sat Dec 19 17:44:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 19 Dec 2009 16:44:46 -0600 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment In-Reply-To: References: Message-ID: <3DC558C9-DD64-45F9-8A6F-EA4238D22EA5@illinois.edu> Very nice! We'll definitely give it a try here (along with the requisite feedback, of course). chris On Dec 19, 2009, at 1:52 PM, Mark A. Jensen wrote: > Hi All, > > Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus, > is at beta in the bioperl-run trunk. It wraps all the programs of the > NCBI's new blast+-2.2.22 suite > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > and integrates them, allowing you to create, mask, and query > databases from within a single factory object. 
See the HOWTO > http://www.bioperl.org/wiki/HOWTO:BlastPlus > for the usual usage and implementation details. > > Happy coding-- > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Dec 19 23:59:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 19 Dec 2009 22:59:38 -0600 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <6723123C0ABD447190639AE1F5D1A6A7@NewLife> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> <6723123C0ABD447190639AE1F5D1A6A7@NewLife> Message-ID: <97DC7C2B-2433-4B8D-A16C-DF0507A29B22@illinois.edu> I think option 1 is cleaner as well; very easily added, so committed to main trunk as I consider this a bug, as one can potentially lose strand information when round-tripping data (original data with a -1 strand would be converted to +1). I'll work out the test fails on trunk along the way (ensure they're due to erroneous test data and not something else). chris On Dec 16, 2009, at 6:51 AM, Mark A. Jensen wrote: > I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. cheers MAJ > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, December 14, 2009 8:23 PM > Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes > > >> All, >> >> The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. 
I have seen two variations of NSE that incorporate strandedness: >> >> 1) Stockholm Rfam reverses start and end if the strand == -1 >> >> chrY/598-1 >> >> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end >> >> rice-3(+)/16598648-16600199 >> >> The former breaks fewer things within BioPerl, but the latter seems more explicit. Any preferences? Do we want a new method that creates this, and deprecate out simple non-stranded NSE? >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.osimo at gmail.com Sun Dec 20 13:19:37 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Sun, 20 Dec 2009 19:19:37 +0100 Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes Message-ID: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> Hello everyone, I have a very particular problem: I'd like to draw in a single track different SNPs with a glyph that allows me to see graphically their importance. For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the first depicted small, and the last one big, with the ones in between with according sizes. I'd be satisfied also with a color gradient. What I cannot do is to set the option -height , for example, instead than in the add_track section, in the Bio::SeqFeature::Generic->new that I use for each of my objects. If I set it in the add_track section, all the glyphs are then of the same size (or color). If, otherwise, I add a different track for each object, my picture becomes too big. Please, help! 
Thanks Emanuele From ajmackey at gmail.com Sun Dec 20 13:41:14 2009 From: ajmackey at gmail.com (Aaron Mackey) Date: Sun, 20 Dec 2009 13:41:14 -0500 Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes In-Reply-To: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> References: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> Message-ID: <24c96eca0912201041i37c32845k9e261414588b9bf4@mail.gmail.com> You can set the height as a callback sub, rather than a constant -- the callback will get passed the feature about to be drawn, from which you can calculate the "importance", and return the desired height, dynamically. -Aaron On Sun, Dec 20, 2009 at 1:19 PM, Emanuele Osimo wrote: > Hello everyone, > I have a very particular problem: I'd like to draw in a single track > different SNPs with a glyph that allows me to see graphically their > importance. > For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the > first depicted small, and the last one big, with the ones in between with > according sizes. > I'd be satisfied also with a color gradient. > What I cannot do is to set the option -height , for example, instead than > in > the add_track section, in the Bio::SeqFeature::Generic->new that I use for > each of my objects. > If I set it in the add_track section, all the glyphs are then of the same > size (or color). > If, otherwise, I add a different track for each object, my picture becomes > too big. > > Please, help! 
> Thanks > Emanuele > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From robfsouza at gmail.com Sat Dec 19 06:06:16 2009 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Sat, 19 Dec 2009 06:06:16 -0500 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: Hi Peter, I just upload my example. I also reported this bug to the NCBI developers and I hope they can fix it, since it is easy to reproduce. I just forgot to mention the blastpgp version: 2.2.18 Best, Robson On Fri, Dec 18, 2009 at 6:00 PM, Peter wrote: > On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza > wrote: >> Hi, >> >> I've been dealing with an apparent bug in the output of NCBI's BLAST >> programs (blastall, blastpgp) which sometimes produces output like the >> one below. >> I think I've managed to produce a work around for Bioperl blast.pm >> parser and would like to contribute it to Bioperl. >> The fix is based on blast.pm from the CVS tree (downloaded some months >> ago...) and is attached to this message. >> Best, >> Robson > > Do you have a complete example of this kind of funny output? > This problem has also been reported with blastpgp for the > Biopython parser. I'd love an example for our unit tests > (probably worth doing in BioPerl too). Could you upload a > test case here?: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2927 > > Thanks! 
>
> Peter @ Biopython
>

From biopython at maubp.freeserve.co.uk Mon Dec 21 10:27:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 15:27:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To:
References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
Message-ID: <320fb6e00912210727m522d2039if78891ab32fe0983@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote:
>
> Hi Peter,
>
> I just upload my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Hi again Robson,

Having a reproducible example to investigate this issue is
incredibly helpful - thank you!

I've been looking at the output, and while I can make sense
of it "by hand", it would be very tricky to try and parse as
a special case. It really does look like a bug in BLAST to
me. The alignment includes an initial pair, a leading gap
in the query (with a coordinate of zero), plus a residue
from the match sequence (with a sensible coordinate).
The alignment statistics include this (extra) pair in the
alignment length.

You said you were using blastpgp version 2.2.18, so I tried
this with the latest (final?) version of the "legacy" BLAST suite,
blastpgp 2.2.22, which I already had installed. It looks like my
copy of NR is more recent (bigger), but the same odd output
was produced:

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000

I also tried what I think would be the equivalent command line
on the new BLAST+ suite, using psiblast 2.2.22+ like this:

psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast -num_threads 8 -parse_deflines -num_alignments 10000

This was much faster, and seems to output sensible alignments.
I might therefore expect the NCBI to say "yes, this is a bug
in the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance release of the "legacy" BLAST suite and fix this in blastpgp. Have you had any reply from the NCBI? Admittedly it is almost Christmas/New Year so we may not expect an answer until Jan. Peter From maj at fortinbras.us Mon Dec 21 13:52:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 13:52:01 -0500 Subject: [Bioperl-l] test fail Message-ID: <5614E9FF133A47A694EF892D38A1717A@NewLife> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. # got: '1..4' # expected: 'complement(5..8)' t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. # got: 'complement(5..8)' # expected: '1..4' # Looks like you failed 2 tests of 51. MAJ From cjfields at illinois.edu Mon Dec 21 14:20:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 13:20:32 -0600 Subject: [Bioperl-l] test fail In-Reply-To: <5614E9FF133A47A694EF892D38A1717A@NewLife> References: <5614E9FF133A47A694EF892D38A1717A@NewLife> Message-ID: Saw that from the other day (LocatableSeq commit). I'll check it out. chris On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote: > fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) > > t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. > # got: '1..4' > # expected: 'complement(5..8)' > > t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. > # got: 'complement(5..8)' > # expected: '1..4' > # Looks like you failed 2 tests of 51. 
> > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Mon Dec 21 15:02:20 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 21 Dec 2009 15:02:20 -0500 Subject: [Bioperl-l] Bio::Graphics documentation Message-ID: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> Hi All, Today it was pointed out to me that the Bio::Graphics documentation links on the BioPerl wiki are broken, no doubt because Bio::Graphics is no longer part of bioperl-core (is that how it should be referred to?). Anyway, the question is: what is the right way to rectify this problem? Since other things may get broken out in the future, I suppose we should get some sort of standard established. Can a release of Bio::Graphics be placed somewhere on the BioPerl wiki server to be processed? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Mon Dec 21 15:22:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 14:22:39 -0600 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> Message-ID: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> We can come up with some standard wiki template for those modules no longer in svn, maybe with just CPAN links. Shouldn't be too hard to do. chris On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > Hi All, > > Today it was pointed out to me that the Bio::Graphics documentation > links on the BioPerl wiki are broken, no doubt because Bio::Graphics > is no longer part of bioperl-core (is that how it should be referred > to?). 
Anyway, the question is: what is the right way to rectify this > problem? Since other things may get broken out in the future, I > suppose we should get some sort of standard established. Can a > release of Bio::Graphics be placed somewhere on the BioPerl wiki > server to be processed? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Dec 21 16:12:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 15:12:45 -0600 Subject: [Bioperl-l] test fail In-Reply-To: References: <5614E9FF133A47A694EF892D38A1717A@NewLife> Message-ID: T'was a bad test call. I basically changed the test to pull each feature directly by the primary tag, check it against the original sf prior to revcom, then check that the location was revcomp'ed correctly. chris On Dec 21, 2009, at 1:20 PM, Chris Fields wrote: > Saw that from the other day (LocatableSeq commit). I'll check it out. > > chris > > On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote: > >> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) >> >> t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. >> # got: '1..4' >> # expected: 'complement(5..8)' >> >> t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. >> # got: 'complement(5..8)' >> # expected: '1..4' >> # Looks like you failed 2 tests of 51. 
>> >> MAJ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Dec 21 16:27:25 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 16:27:25 -0500 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> Message-ID: <1F54D94CE87E4238BC2C6128002FBC6A@NewLife> I've modified Template:Doclink ; if you now do {{Doclink|Bio::Graphics|cpan}} you'll get a page with only the cpan link. {{Doclink|Bio::SeqIO}} etc. works as usual. MAJ ----- Original Message ----- From: "Chris Fields" To: "Scott Cain" Cc: "BioPerl List" Sent: Monday, December 21, 2009 3:22 PM Subject: Re: [Bioperl-l] Bio::Graphics documentation > We can come up with some standard wiki template for those modules no longer in > svn, maybe with just CPAN links. Shouldn't be too hard to do. > > chris > > On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > >> Hi All, >> >> Today it was pointed out to me that the Bio::Graphics documentation >> links on the BioPerl wiki are broken, no doubt because Bio::Graphics >> is no longer part of bioperl-core (is that how it should be referred >> to?). Anyway, the question is: what is the right way to rectify this >> problem? Since other things may get broken out in the future, I >> suppose we should get some sort of standard established. Can a >> release of Bio::Graphics be placed somewhere on the BioPerl wiki >> server to be processed? >> >> Thanks, >> Scott >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot >> net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Mon Dec 21 16:34:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 16:34:40 -0500 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> Message-ID: <5081DC24D9AE46FF95075559898B2574@NewLife> Also, applied the new Doclink to Bio::Graphics on wiki. ----- Original Message ----- From: "Chris Fields" To: "Scott Cain" Cc: "BioPerl List" Sent: Monday, December 21, 2009 3:22 PM Subject: Re: [Bioperl-l] Bio::Graphics documentation > We can come up with some standard wiki template for those modules no longer in > svn, maybe with just CPAN links. Shouldn't be too hard to do. > > chris > > On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > >> Hi All, >> >> Today it was pointed out to me that the Bio::Graphics documentation >> links on the BioPerl wiki are broken, no doubt because Bio::Graphics >> is no longer part of bioperl-core (is that how it should be referred >> to?). Anyway, the question is: what is the right way to rectify this >> problem? Since other things may get broken out in the future, I >> suppose we should get some sort of standard established. Can a >> release of Bio::Graphics be placed somewhere on the BioPerl wiki >> server to be processed? >> >> Thanks, >> Scott >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot >> net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Mon Dec 21 21:51:32 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 21:51:32 -0500 Subject: [Bioperl-l] pdb.pm and annotations In-Reply-To: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com> References: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com> Message-ID: <6292EDA0F05B48578AF7B7E5864C8707@NewLife> Hi Sung-- We didn't plan it, but we added it anyway: see revision 16559 of bioperl-live/trunk. You can then do $pmid = ($struct->annotation->get_Annotations('reference'))[0]->pubmed; and even $doi = ($struct->annotation->get_Annotations('reference'))[0]->doi; Thanks for the heads-up! cheers, MAJ ----- Original Message ----- From: "Sungsam Gong" To: Sent: Wednesday, December 16, 2009 12:55 PM Subject: [Bioperl-l] pdb.pm and annotations > Hi, > > Wanted to get pubmed identifier from a PDB file using Bio::Structure, > so hacked the code. > Knew that Bio::Structure::IO::pdb.pm get relevant info from either > 'JRNL' or 'REMARK 1'. > However could not see any actual code parsing 'PMID'. > >>From pdb.pm, what I see: > > sub _read_PDB_jrnl { > ... 
> $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH"); > $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL"); > $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT"); > $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF"); > $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL"); > $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN"); > ... > } > > sub _read_PDB_remark_1 { > ... > $auth = $self->_concatenate_lines($auth,$rol) if > ($subr eq "AUTH"); > $titl = $self->_concatenate_lines($titl,$rol) if > ($subr eq "TITL"); > $edit = $self->_concatenate_lines($edit,$rol) if > ($subr eq "EDIT"); > $ref = $self->_concatenate_lines($ref ,$rol) if > ($subr eq "REF"); > $publ = $self->_concatenate_lines($publ,$rol) if > ($subr eq "PUBL"); > $refn = $self->_concatenate_lines($refn,$rol) if > ($subr eq "REFN"); > ... > } > >>From my script, I did: > > ($struc->annotation->get_Annotations('reference'))[0]->authors > ($struc->annotation->get_Annotations('reference'))[0]->title > > or > > my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree > for my $key (keys %{$hash_ref}) { > print $key,": ",$hash_ref->{$key},"\n"; > } > > Any plan to include a code chopping 'PMID' out? > Or did I miss something? > > Cheers, > Sung > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.kortschak at adelaide.edu.au Mon Dec 21 22:24:04 2009 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 22 Dec 2009 13:54:04 +1030 Subject: [Bioperl-l] call for help and comments on module Message-ID: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au> Hi, I've been working on a Bio::Tools::Run module to handle the bowtie rapid alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in bioperl-run tree). 
I have 90% of what I want included in the module and would like some advice from more experienced bioperlers. Feedback on approach is also welcomed (this is my first significant wrapper, and it comes after a long gap from writing modules, so I am rusty). The module has ended up being significantly more complicated than I had hoped.

There are a few issues I'm having, so I apologise for the list:

   1. Informal tests run correctly (outside the t/ tree and Test harness), but formal Test harness tests fail for reasons I cannot understand. (The module is still lacking a lot of tests, but since things were failing in the harness I have placed them as a lower priority and have been working to my micro-script tests - yes, bad form.)
   2. I am having a big problem with IPC::Run for one of the executables (the module can call 5 different executables for 7 commands), bowtie-maptool (module command 'map'). All the other commands tested (this excludes bowtie-maqconvert [convert command]) work fine, but maptool fails with an illegal seek - presumably due to the redirection handling? I have no idea how to resolve this, so help would be greatly appreciated (a small script that demonstrates the use that results in the failure is below).

There will be provision for returning a Bio::Assembly::IO object through samtools in the finished module, but currently the Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.

Thanks for any help.
Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Bowtie;

# These files are in the bioperl-run t/data/ tree
my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';

my $bowtiefac = Bio::Tools::Run::Bowtie->new(
    -command             => 'single',
    -max_seed_mismatches => 2,
    -seed_length         => 28,
    -max_qual_mismatch   => 70,
    -sam_format          => 0
);

my $align = $bowtiefac->run($rdq,$refseq); # this runs fine

my $bowtiemap = Bio::Tools::Run::Bowtie->new(
    -command => 'map'
);

my $map = $bowtiemap->run($align); # throws Illegal seek

print "$map\n";

open (IN,$map);
my $lines =(my @lines)= <IN>; # read all lines; $lines holds the line count
print @lines;
print "\n\n$lines\n";
close IN;

From maj at fortinbras.us Tue Dec 22 00:19:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 22 Dec 2009 00:19:35 -0500
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID:

Hey Dan,
It looks like if the outfile isn't specified on the commandline for
maptool, then the align is written to stdout. So, you could
try this workaround in Bowtie/Config.pm:

our %command_files = (
    'single'   => [qw( ind seq #out )],
    'paired'   => [qw( ind seq seq2 #out )],
    'crossbow' => [qw( ind seq #out )],
    'build'    => [qw( ref out )],
    'inspect'  => [qw( ind >#out )],
    'convert'  => [qw( bwt out bfa )],
-   'map'      => [qw( bwt #out )]
+   'map'      => [qw( bwt >#out )]
    );

which should be transparent to the user. If this works, then
there is probably something funky going on with IPC::Run
+ maptool; if it doesn't, then the funkiness is prob. in my code.

I notice, however, that both bowtie-maptool and bowtie-maqconvert
have been removed from the 0.12.0-beta release
(http://bowtie-bio.sourceforge.net/index.shtml)...
cheers MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, December 21, 2009 10:24 PM Subject: [Bioperl-l] call for help and comments on module > Hi, > > I've been working on a Bio::Tools::Run module to handle the bowtie rapid > alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in > bioperl-run tree). > > I have 90% of what I want included in the module and would like some > advice from more experienced bioperlers. Feedback on approach is also > welcomed (this is my first significant wrapper, and after a long gap > from writing module, so I am rusty). The module has ended up being > significantly more complicated than I had hoped. > > There are a few issues I'm having, so I apologise for the list: > > 1. Informal tests run correctly (outside the t/ tree and Test > harness), but formal Test harness tests fail for reasons I > cannot understand. (The module is still lacking a lot of tests, > but since things were failing in the harness I have placed them > as a lower priority and have been working to my micro-script > tests - yes, bad form. > 2. I am having a big problem with IPC::Run for one of the > executables (the module can call 5 different excutables for 7 > commands), bowtie-maptool (module command 'map'). All the other > commands tested (this excludes bowtie-maqconvert [convert > command]) work fine, but maptool fails with an illegal seek - > presumably due to the redirection handling? I have no idea how > to resolve this, so help would be greatly appreciated (a small > script that demonstrates the use that results in the failure is > below). > > There will be provision for returning a Bio::Assembly::IO object through > samtools in the finished module, but currently the > Bio::Assembly::IO::sam builder doesn't like what bowtie can provide. > > Thanks for any help. 
> Dan > > > #!/usr/bin/perl > > use strict; > use warnings; > > use Bio::Tools::Run::Bowtie; > > # These files are in the bioperl-run t/data/ tree > my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq'; > my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli'; > > my $bowtiefac = Bio::Tools::Run::Bowtie->new( > -command => 'single', > -max_seed_mismatches => 2, > -seed_length => 28, > -max_qual_mismatch => 70, > -sam_format => 0 > ); > > my $align = $bowtiefac->run($rdq,$refseq); # this runs fine > > my $bowtiemap = Bio::Tools::Run::Bowtie->new( > -command => 'map' > ); > > my $map = $bowtiemap->run($align); # throws Illegal seek > > print "$map\n"; > > open (IN,$map); > my $lines =(my @lines)= ; > print @lines; > print "\n\n$lines\n"; > close IN; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.kortschak at adelaide.edu.au Tue Dec 22 00:51:30 2009 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 22 Dec 2009 16:21:30 +1030 Subject: [Bioperl-l] call for help and comments on module In-Reply-To: References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au> Message-ID: <1261461090.4411.13.camel@epistle> Hi Mark, maptool either outputs to stdout or a specified file - I chose to use a specified file and run it that way, but I've tried the redirect a you suggest, with the same failure result. I think it's a strangeness of maptool (which may well be a reason for it being dropped - also note the maptool output doesn't seem reasonable for the test data provided even when run from the command line). It's probably a result of difficult interaction between IPC::Run and maptool. 
Any funkiness in your code is not likely to be a cause as I've deeply analysed what is being passed to IPC::Run, and I've quite extensively modified the IPC run handling method from your code to take into account the differences between a single executable with many commands as the base code managed from a cluster of executables each taking a small subset of different filespecs as bowtie needs. My funkiness will undoubtedly swamp yours. Resolution: Will drop bowtie-maptool from module. (Should test maqconvert - if it fails, this will be dropped also unless someone asks otherwise). When the module copes with 0.11.* properly I'll start thinking about 0.12.* which has colourspace handling to deal with. cheers Dan On Tue, 2009-12-22 at 00:19 -0500, Mark A. Jensen wrote: > Hey Dan, > It looks like if the outfile isn't specified on the commandline for > maptool, then the align is written to stdout. So, you could > try this workaround in in Bowtie/Config.pm: > > our %command_files = ( > 'single' => [qw( ind seq #out )], > 'paired' => [qw( ind seq seq2 #out )], > 'crossbow' => [qw( ind seq #out )], > 'build' => [qw( ref out )], > 'inspect' => [qw( ind >#out )], > 'convert' => [qw( bwt out bfa )], > - 'map' => [qw( bwt #out )] > + 'map' => [qw( bwt >#out )] > ); > > which should be transparent to the user. If this works, then > there is probably something funky going on with IPC::Run > + maptool; if it doesn't, then the funkiness is prob. in my code. > > I notice, however, that both bowtie-maptool and bowtie-maqconvert > have been removed from the 0.12.0-beta release > (http://bowtie-bio.sourceforge.net/index.shtml)... 
> > cheers MAJ From lovebaby39 at gmail.com Wed Dec 23 05:48:55 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Wed, 23 Dec 2009 18:48:55 +0800 Subject: [Bioperl-l] About bioperl issue: get string In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> Message-ID: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC> Dear all I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how to get "P.pastoris DNA for pPIC9K expression vector". while (my $result_u = $blast_report_u-> next_result ) { while (my $hit_u = $result_u->next_hit()){ while (my $hsp_u = $hit_u->next_hsp()){ $hit_u->name; $hsp_u->evalue; $hsp_u->score; } } } I will appreciate if you could tell me how to do it. P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download link?) The flow is BLAST result: ------------------------------------------------------------------------------------------------------------------------------------- BLASTN 2.2.16 [Mar-25-2007] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= (458 letters) Database: UniVec (build 4.0) 2416 sequences; 597,480 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve... 
26 3.1 gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo 26 3.1 gnl|uv|U13843.1:1887-9923 pBPV cloning vector 26 3.1 >gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector Length = 2781 Score = 26.3 bits (13), Expect = 3.1 Identities = 13/13 (100%) Strand = Plus / Plus Query: 352 tactaccgccatt 364 ||||||||||||| Sbjct: 2209 tactaccgccatt 2221 ------------------------------------------------------------------------------------------------------------------------------------- Reginald Hsueh From hrh at fmi.ch Wed Dec 23 10:14:06 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Wed, 23 Dec 2009 16:14:06 +0100 Subject: [Bioperl-l] About bioperl issue: get string In-Reply-To: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC> Message-ID: Hi Assuming you are using "SearchIO", try: $hit_u->description for more details see: http://www.bioperl.org/wiki/HOWTO:SearchIO Regards, Hans On 12/23/09 11:48 AM, "Hsueh" wrote: > Dear all > > I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how > to get "P.pastoris DNA for pPIC9K expression vector". > > while (my $result_u = $blast_report_u-> next_result ) { > while (my $hit_u = $result_u->next_hit()){ > while (my $hsp_u = $hit_u->next_hsp()){ > $hit_u->name; > $hsp_u->evalue; > $hsp_u->score; > } > } > } > > I will appreciate if you could tell me how to do it. > > P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download > link?) > > > > The flow is BLAST result: > ------------------------------------------------------------------------------ > ------------------------------------------------------- > BLASTN 2.2.16 [Mar-25-2007] > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. 
> Query= > (458 letters) > > Database: UniVec (build 4.0) > 2416 sequences; 597,480 total letters > Searching..................................................done > > Score E > Sequences producing significant alignments: > (bits) Value > > gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve... > 26 3.1 > gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo > 26 3.1 > gnl|uv|U13843.1:1887-9923 pBPV cloning vector > 26 3.1 > >> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector > Length = 2781 > > Score = 26.3 bits (13), Expect = 3.1 > Identities = 13/13 (100%) > Strand = Plus / Plus > > Query: 352 tactaccgccatt 364 > ||||||||||||| > Sbjct: 2209 tactaccgccatt 2221 > ------------------------------------------------------------------------------ > ------------------------------------------------------- > > Reginald Hsueh > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pkuonline at gmail.com Wed Dec 23 13:36:49 2009 From: pkuonline at gmail.com (pkuonline) Date: Wed, 23 Dec 2009 12:36:49 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 Message-ID: <200912231236490784820@gmail.com> Hi Everyone, I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. I attached my CODEML outputs here to see whether you guys have some idea. Many thanks ahead! 
Best regards, ------------------------------------------------------------- Yong Zhang Ph.D, Research Scholar Manyuan Long's Lab University of Chicago -------------- next part -------------- A non-text attachment was scrubbed... Name: rst4.1 Type: application/octet-stream Size: 60616 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mlc4.1 Type: application/octet-stream Size: 11635 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mlc4.3b Type: application/octet-stream Size: 11330 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rst4.3b Type: application/octet-stream Size: 60616 bytes Desc: not available URL: From cjfields at illinois.edu Wed Dec 23 16:19:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 23 Dec 2009 15:19:48 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231236490784820@gmail.com> References: <200912231236490784820@gmail.com> Message-ID: Well, not completely unexpected, but very frustrating nonetheless. Changes to PAML output have broken in just about every PAML parser revision. Not sure when this will be addressed unfortunately, my hope is sooner than later. Can you file a bioperl bug report for this? It's the best place to keep track. http://bugzilla.open-bio.org/ chris On Dec 23, 2009, at 12:36 PM, pkuonline wrote: > Hi Everyone, > > I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. 
> > I attached my CODEML outputs here to see whether you guys have some idea. > > Many thanks ahead! > > Best regards, > ------------------------------------------------------------- > Yong Zhang > Ph.D, Research Scholar > Manyuan Long's Lab > University of Chicago_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pkuonline at gmail.com Wed Dec 23 17:45:54 2009 From: pkuonline at gmail.com (pkuonline) Date: Wed, 23 Dec 2009 16:45:54 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 References: <200912231236490784820@gmail.com>, Message-ID: <200912231645536094087@gmail.com> Hi Chris, Thanks for your reply and I just submitted this bug to bugzilla. Have a nice holiday! ------------------------------------------------------------- Yong Zhang Ph.D, Research Scholar Manyuan Long's Lab University of Chicago >------------------------------------------------------------- >From: Chris Fields >Time: 2009-12-23 15:19:50 >To: pkuonline bioperl-l >Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 >Well, not completely unexpected, but very frustrating nonetheless. Changes to PAML output have broken in just about every PAML parser revision. Not sure when this will be addressed unfortunately, my hope is sooner than later. > >Can you file a bioperl bug report for this? It's the best place to keep track. > >http://bugzilla.open-bio.org/ > >chris > >On Dec 23, 2009, at 12:36 PM, pkuonline wrote: > >> Hi Everyone, >> >> I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. 
More strangely, I tested it on the old PAML 4.1 result and also failed. >> >> I attached my CODEML outputs here to see whether you guys have some idea. >> >> Many thanks ahead! >> >> Best regards, >> ------------------------------------------------------------- >> Yong Zhang >> Ph.D, Research Scholar >> Manyuan Long's Lab >> University of Chicago_______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Wed Dec 23 18:23:44 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 24 Dec 2009 00:23:44 +0100 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231645536094087@gmail.com> References: <200912231236490784820@gmail.com>, <200912231645536094087@gmail.com> Message-ID: <08E748F4-1398-4543-AB77-0640441BC323@sbc.su.se> Hi Yong, Could you attach your codeml output to the bug report, too? I'll take a look at this as soon as I can. Dave From maj at fortinbras.us Thu Dec 24 00:47:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 24 Dec 2009 00:47:10 -0500 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231645536094087@gmail.com> References: <200912231236490784820@gmail.com>, <200912231645536094087@gmail.com> Message-ID: <2DF45CDC2BE44A85ADCD865A98CD13D6@NewLife> Yong-- say 'ni hao' to Manyuan for me --- cheers MAJ ----- Original Message ----- From: "pkuonline" To: "Chris Fields" Cc: "bioperl-l" Sent: Wednesday, December 23, 2009 5:45 PM Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 > Hi Chris, > > Thanks for your reply and I just submitted this bug to bugzilla. > > Have a nice holiday! 
> ------------------------------------------------------------- > Yong Zhang > Ph.D, Research Scholar > Manyuan Long's Lab > University of Chicago > >>------------------------------------------------------------- >>From: Chris Fields >>Time: 2009-12-23 15:19:50 >>To: pkuonline bioperl-l >>Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 > >>Well, not completely unexpected, but very frustrating nonetheless. Changes to >>PAML output have broken in just about every PAML parser revision. Not sure >>when this will be addressed unfortunately, my hope is sooner than later. >> >>Can you file a bioperl bug report for this? It's the best place to keep >>track. >> >>http://bugzilla.open-bio.org/ >> >>chris >> >>On Dec 23, 2009, at 12:36 PM, pkuonline wrote: >> >>> Hi Everyone, >>> >>> I used the latest Bioperl build, >>> http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to >>> parse CODEML result. I searched the mail list and found current PAML parser >>> is compatible with PAML 4.3a, >>> http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. >>> However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser >>> does not work. More strangely, I tested it on the old PAML 4.1 result and >>> also failed. >>> >>> I attached my CODEML outputs here to see whether you guys have some idea. >>> >>> Many thanks ahead! 
>>>
>>> Best regards,
>>> -------------------------------------------------------------
>>> Yong Zhang
>>> Ph.D, Research Scholar
>>> Manyuan Long's Lab
>>> University of Chicago
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
> --------------------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bhakti.dwivedi at gmail.com Fri Dec 25 21:46:51 2009
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Fri, 25 Dec 2009 21:46:51 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
Message-ID:

Hi,

Does anyone know how to retrieve the "Source" or the "Species name" given
the accession number using Bioperl. I have these 30,000 accession numbers
for which I need to get the source organisms. Any kind of help will be
appreciated.

Thanks

BD

From maj at fortinbras.us Fri Dec 25 22:52:10 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 25 Dec 2009 22:52:10 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To:
References:
Message-ID: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>

Bhakti,
The following example (using EUtilities) may serve your purpose:

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing
# -dbfrom => 'nucleotide', (probably)
my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

You probably will need to break up your 30000 into chunks
(say, 1000-3000 each), and do the above on each chunk with a

sleep 3;

or so separating the queries.
MAJ

----- Original Message -----
From: "Bhakti Dwivedi"
To:
Sent: Friday, December 25, 2009 9:46 PM
Subject: [Bioperl-l] how to retrieve organism name from accession number?

> Hi,
>
> Does anyone know how to retrieve the "Source" or the "Species name" given
> the accession number using Bioperl. I have these 30,000 accession numbers
> for which I need to get the source organisms. Any kind of help will be
> appreciated.
>
> Thanks
>
> BD
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Sat Dec 26 06:47:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 26 Dec 2009 05:47:29 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID:

On Dec 25, 2009, at 9:52 PM, Mark A. Jensen wrote:

> Bhakti,
> The following example (using EUtilities) may serve your purpose:
>
> use Bio::DB::EUtilities;
>
> ...
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
>
> sleep 3;
>
> or so separating the queries.
> MAJ

The 'sleep 3' is built-in and now (on main trunk) matches NCBI's current
spec of 3 queries/sec.

chris

From arpm9 at charter.net Sun Dec 27 16:42:09 2009
From: arpm9 at charter.net (arpm9)
Date: Sun, 27 Dec 2009 16:42:09 -0500
Subject: [Bioperl-l] Should Bio::Tools::BPlite be deprecated?
In-Reply-To: 4533A8D3.90709@sendu.me.uk
Message-ID: <867A36FEE0244EF2950108C42BD2BE58@paulb0d5af35b9>

hi chris, I was trying to make sense of this backpacking lite and just
simply wanted to view the light...and got nowhere and very
frustrated...please help if you can...or whoever can...thanks Pm

From pengyu.ut at gmail.com Tue Dec 29 11:08:09 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 10:08:09 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
Message-ID: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>

May I ask somebody who is versatile in both bioperl and biopython
to comment on the pros and cons of bioperl and biopython? I'm sending
this email to both bioperl and biopython mailing lists. But I hope
that it will not result in any contention.

I assume that the functionality of bioperl and biopython is the
same, i.e., tasks that can be done in bioperl can be done in biopython
and vice versa, as both libraries have been out there for over 10 years.
Please correct me if my understanding is not true.

Given a task that can be done with either bioperl or biopython,
I, in particular, want to know how long it will take to write the
code for the task in bioperl and biopython, with the same readability
requirement (see below) and the assumption that users have the same
fluency in perl and python.

python is claimed to be good for maintainability. But perl is
criticized for there-are-many-ways-for-a-given-task. Since there are
multiple ways in perl, let us assume that we always use perl in a
readable way.

From jason at bioperl.org Tue Dec 29 11:49:20 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 08:49:20 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>

Are you asking for the purposes of choosing a toolkit for your work or
just curious about the advantages/disadvantages of language choice?

-jason
On Dec 29, 2009, at 8:08 AM, Peng Yu wrote:

> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython
> and vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.
>
> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
>
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From ak at ebi.ac.uk Tue Dec 29 11:57:18 2009
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Tue, 29 Dec 2009 16:57:18 +0000
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <20091229165718.GB30356@quux.windows.ebi.ac.uk>

On Tue, Dec 29, 2009 at 10:08:09AM -0600, Peng Yu wrote:
> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython
> and vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.
>
> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.
>
> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

Assuming, as you do, that the functionality of BioPerl and BioPython is
the same: Which of the two programming languages are you (or your team)
most proficient in? Use that language.

Regards,
Andreas

--
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom

From sdavis2 at mail.nih.gov Tue Dec 29 12:03:40 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 12:03:40 -0500
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
Message-ID: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote:
> May I ask somebody who is versatile in both bioperl and biopython
> to comment on the pros and cons of bioperl and biopython? I'm sending
> this email to both bioperl and biopython mailing lists. But I hope
> that it will not result in any contention.
>
> I assume that the functionality of bioperl and biopython is the
> same, i.e., tasks that can be done in bioperl can be done in biopython
> and vice versa, as both libraries have been out there for over 10 years.
> Please correct me if my understanding is not true.

The two projects have similar goals, but saying that the functionality
is the same would be an extreme oversimplification. You will need to
define what you want to do and then check to see what the two projects
have to offer. This will, in general, require perusing the websites
for both projects as well as the relevant documentation.

> Given a task that can be done with either bioperl or biopython,
> I, in particular, want to know how long it will take to write the
> code for the task in bioperl and biopython, with the same readability
> requirement (see below) and the assumption that users have the same
> fluency in perl and python.

Again, you will want to define the task(s) to be accomplished and then
weigh the pros and cons of each project combined with local expertise.
If you don't know what you want to do, then you can certainly read
some examples on the websites and see which project strikes you as a
"winner" for you.

> python is claimed to be good for maintainability. But perl is
> criticized for there-are-many-ways-for-a-given-task. Since there are
> multiple ways in perl, let us assume that we always use perl in a
> readable way.

These two statements are generalizations that provide little insight
into the strengths or weaknesses of the languages. In other words,
one can write good or bad code in both languages.

Hope that helps.

Sean

From wenzhiwang1983 at yahoo.com.cn Tue Dec 29 13:30:02 2009
From: wenzhiwang1983 at yahoo.com.cn (WangWenzhi)
Date: Wed, 30 Dec 2009 02:30:02 +0800 (CST)
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <658770.25534.qm@web15204.mail.cnb.yahoo.com>

Dear Jason,

Plink is a very useful program in population genetics, especially in
the Genome-Wide SNP scan era. Is there any plan to add the Plink (ped or
tped) format to Bio::PopGen::IO?

Thanks.

Wenzhi Wang
State Key Laboratory of Genetic Resources and Evolution
Kunming Institute of Zoology, Chinese Academy of Sciences
Kunming, Yunnan 650223 P. R. China
Tel: 86 871 5198 993
Fax: 86 871 5195 430
E-mail: wenzhiwang1983 at yahoo.com.cn

From pengyu.ut at gmail.com Tue Dec 29 13:58:59 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 12:58:59 -0600
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org>
Message-ID: <366c6f340912291058t6c601e57re0c35e69fe81e09d@mail.gmail.com>

To choose a toolkit for my work.

On Tue, Dec 29, 2009 at 10:49 AM, Jason Stajich wrote:
> Are you asking for the purposes of choosing a toolkit for your work or just
> curious about the advantages/disadvantages of language choice?
>
> -jason
> On Dec 29, 2009, at 8:08 AM, Peng Yu wrote:
>
>> May I ask somebody who is versatile in both bioperl and biopython
>> to comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality of bioperl and biopython is the
>> same, i.e., tasks that can be done in bioperl can be done in biopython
>> and vice versa, as both libraries have been out there for over 10 years.
>> Please correct me if my understanding is not true.
>>
>> Given a task that can be done with either bioperl or biopython,
>> I, in particular, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>

From pengyu.ut at gmail.com Tue Dec 29 14:15:14 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 13:15:14 -0600
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython?
In-Reply-To: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
Message-ID: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote:
> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote:
>> May I ask somebody who is versatile in both bioperl and biopython
>> to comment on the pros and cons of bioperl and biopython? I'm sending
>> this email to both bioperl and biopython mailing lists. But I hope
>> that it will not result in any contention.
>>
>> I assume that the functionality of bioperl and biopython is the
>> same, i.e., tasks that can be done in bioperl can be done in biopython
>> and vice versa, as both libraries have been out there for over 10 years.
>> Please correct me if my understanding is not true.
>
> The two projects have similar goals, but saying that the functionality
> is the same would be an extreme oversimplification. You will need to
> define what you want to do and then check to see what the two projects
> have to offer. This will, in general, require perusing the websites
> for both projects as well as the relevant documentation.

In your experience, are there tasks that are easier with one than with
the other?

>> Given a task that can be done with either bioperl or biopython,
>> I, in particular, want to know how long it will take to write the
>> code for the task in bioperl and biopython, with the same readability
>> requirement (see below) and the assumption that users have the same
>> fluency in perl and python.
>
> Again, you will want to define the task(s) to be accomplished and then
> weigh the pros and cons of each project combined with local expertise.
> If you don't know what you want to do, then you can certainly read
> some examples on the websites and see which project strikes you as a
> "winner" for you.
>
>> python is claimed to be good for maintainability. But perl is
>> criticized for there-are-many-ways-for-a-given-task. Since there are
>> multiple ways in perl, let us assume that we always use perl in a
>> readable way.
>
> These two statements are generalizations that provide little insight
> into the strengths or weaknesses of the languages. In other words,
> one can write good or bad code in both languages.
>
> Hope that helps.
>
> Sean
>

From alperyilmaz at gmail.com Tue Dec 29 14:36:03 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Tue, 29 Dec 2009 14:36:03 -0500
Subject: [Bioperl-l] Bio::TreeIO, Bio::Tree::Draw::Cladogram and phyloxml issues..
Message-ID:

Hello,

I have a tree in phyloxml format, and am trying to draw a subtree by
using a specific node as the root. I used Bio::Tree::Draw::Cladogram for
drawing and encountered some problems.

When I use the whole tree and draw it, everything is fine; but when I
pick a particular node and construct the subtree from that node's
ancestor by using

my $subtree = Bio::Tree::Tree->new(-root => $new_root, -nodelete => 1);

Bio::Tree::Draw::Cladogram creates a faulty EPS file, which contains
extra lines added in the middle of the file. For instance:

.
.
.
72.0820393261372 126 moveto
(OsIBCD006509) show
30 81.25 moveto
81.25 lineto
lineto
48.5410196630686 120 moveto
30 120 lineto
.
.
.

Should read:

72.0820393261372 126 moveto
(OsIBCD006509) show
48.5410196630686 120 moveto
30 120 lineto

Also, I tried to write the subtree into a new phyloxml file first, then
draw it. The code is as follows:

my $savefile = "save.phyloxml";
my $treeout = Bio::TreeIO->new(-format =>'phyloxml',
                               -file => ">$savefile");
$treeout->write_tree($subtree);
my $tree2 = Bio::TreeIO->new(-format =>'phyloxml',
                             -file => "save.phyloxml");
my $t1 = $tree2->next_tree;
my $image_output = "test.eps";
my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $t1,
                                           -top => 10,
                                           -bottom => 10,);
$obj1->print(-file => $image_output);

The generated phyloxml file, which is named save.phyloxml, has an
additional newline between "" and "" at the end of the file. This
additional newline leads to an error when parsing (opening the file and
drawing the EPS). After I removed the newline manually,
Bio::Tree::Draw::Cladogram gave me the EPS file successfully.

Does anyone know how to fix these problems?
1- faulty eps file generation
2- additional newline character in phyloxml output

Is the problem in the way I create the subtree?

The phyloxml file I used can be downloaded from:
http://grassius.org/download/HSF.phyloxml

Run this code with the phyloxml file to see the newline character problem:
http://pastebin.com/f87ee1ee

Run this code with the phyloxml file to see the faulty eps file problem:
http://pastebin.com/fc4715a1

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954

From pengyu.ut at gmail.com Tue Dec 29 16:32:17 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Tue, 29 Dec 2009 15:32:17 -0600
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
Message-ID: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>

http://bioperl.org/Core/Latest/modules.html

Many links, if not all, are broken on the above page. Could somebody
fix them?

For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
I see the following error.

There is currently no text in this page. You can search for this page
title in other pages, search the related logs, or edit this page.

From jason at bioperl.org Tue Dec 29 16:49:00 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:49:00 -0800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID:

That is an outdated URL; I am not sure where you are linking to it from.
We can probably now disable all old '/Core' URLs.

All documentation links are in the /wiki/

The beginner's howto is here, for example:
http://bioperl.org/wiki/HOWTO:Beginners

> http://www.bioperl.org/wiki/HOWTOs

On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:

> http://bioperl.org/Core/Latest/modules.html
>
> Many links, if not all, are broken on the above page. Could somebody
> fix them?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page. You can search for this page
> title in other pages, search the related logs, or edit this page.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From jason at bioperl.org Tue Dec 29 16:50:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:50:26 -0800
Subject: [Bioperl-l] Comparison between bioperl and biopython?
In-Reply-To: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
References: <658770.25534.qm@web15204.mail.cnb.yahoo.com>
Message-ID:

Yep, it would be great if someone were to write it. This being a
volunteer project, we welcome your contribution. No, I don't specifically
have plans to do it, but maybe you, or another bioperl user/developer
interested in population genetics, can give it a try?

-jason
On Dec 29, 2009, at 10:30 AM, WangWenzhi wrote:

> Dear Jason,
>
> Plink is a very useful program in population genetics,
> especially in the Genome-Wide SNP scan era. Is there any plan to add
> the Plink (ped or tped) format to Bio::PopGen::IO?
>
> Thanks.
>
> Wenzhi Wang
> State Key Laboratory of Genetic Resources and Evolution
> Kunming Institute of Zoology, Chinese Academy of Sciences
> Kunming, Yunnan 650223 P. R. China
> Tel: 86 871 5198 993
> Fax: 86 871 5195 430
> E-mail: wenzhiwang1983 at yahoo.com.cn
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From jason at bioperl.org Tue Dec 29 16:57:49 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 29 Dec 2009 13:57:49 -0800
Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com>
	<264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com>
	<366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com>
Message-ID: <02851B8A-E74E-453E-9725-6FA8F3995F82@bioperl.org>

On Dec 29, 2009, at 11:15 AM, Peng Yu wrote:

> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote:
>> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote:
>>> May I ask somebody who is versatile in both bioperl and biopython
>>> to comment on the pros and cons of bioperl and biopython? I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality of bioperl and biopython is the
>>> same, i.e., tasks that can be done in bioperl can be done in biopython
>>> and vice versa, as both libraries have been out there for over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the functionality
>> is the same would be an extreme oversimplification. You will need to
>> define what you want to do and then check to see what the two projects
>> have to offer. This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> In your experience, are there tasks that are easier with one than with
> the other?

As you have still failed to give much insight into the 'tasks', it is
hard to give you a better answer. If there is a module or set of
routines already written, then yes, one might be easier than the other.
Otherwise it just depends on your strengths in the programming language.

We discussed the strengths of the different toolkits briefly on the
podcast last month. http://twit.tv/floss96

I echo Sean. Use whichever language you are a better programmer in.

BioPerl is more mature in some facets than is BioPython, but BioPython
has some components that are more heavily developed and supported than
BioPerl (structures being one of those, and interfacing that to pyMol
would be a strength). I personally think the Gbrowse, Bio-Graphics, and
Bio::DB::GFF/Bio::DB::SeqFeature::Store interface to Sequence databases
and Features is a critical aspect of mining genomic data and features,
and I use these heavily in my work, making BioPerl easy and powerful for
my tasks. That and sequence and alignment parsing and reformatting. But
there are comparable tools written in python, with and without BioPython,
that you can also use, so mainly it is about building up an expertise in
a toolkit and going forward.

The BioPerl faithful will probably say it is a more useful toolkit to us,
but we are of course a biased sample. Both projects can benefit from
more users and developers contributing code and documentation, so I would
just jump in and give it a try if you are unsure which will be easier
for you.

>
>>> Given a task that can be done with either bioperl or biopython,
>>> I, in particular, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and then
>> weigh the pros and cons of each project combined with local expertise.
>> If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages. In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>>
>> Sean
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From pengyu.ut at gmail.com Tue Dec 29 17:01:05 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:01:05 +1800
Subject: [Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID?
Message-ID: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>

I see the following example. But it is not clear to me how to get the
exon sequences. I also want to get the exon boundaries and associated
CDS boundaries. Although I can get the boundary information from the
UCSC table browser, it would be convenient if I could get it in bioperl
along with the sequence.

Could somebody let me know how to do it?

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html

From sdavis2 at mail.nih.gov Tue Dec 29 17:13:30 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 17:13:30 -0500
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <264855a00912291413r7ce37e2h673dec7c2624db6@mail.gmail.com>

On Tue, Dec 29, 2009 at 4:32 PM, Peng Yu wrote:
> http://bioperl.org/Core/Latest/modules.html
>
> Many links, if not all, are broken on the above page. Could somebody fix them?
>
> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
> I see the following error.
>
> There is currently no text in this page.
You can search for this page
> title in other pages, search the related logs, or edit this page.

It is unfortunate that the links are broken on that page. However, I
believe that page is somewhat outdated, anyway. Here are the HOWTO
pages:

http://www.bioperl.org/wiki/HOWTOs

Sean

From pengyu.ut at gmail.com Tue Dec 29 17:21:16 2009
From: pengyu.ut at gmail.com (Peng Yu)
Date: Wed, 30 Dec 2009 16:21:16 +1800
Subject: [Bioperl-l] Document missing on Core/Latest/modules.html
In-Reply-To:
References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com>
Message-ID: <366c6f340912291421m38bb8348oe6b224f29208f9f4@mail.gmail.com>

On Wed, Dec 30, 2009 at 3:49 PM, Jason Stajich wrote:
> That is an outdated URL I am not sure where you are linking it from. We can
> probably now disable all old '/Core' URLs.

I followed the link from here:

http://www.bioperl.org/wiki/BioPerl_Tutorial

Since those URLs are outdated, could you please fix the links on the
above page?

> All documentation links are in the /wiki/
>
> The beginner's howto is here for example
> http://bioperl.org/wiki/HOWTO:Beginners
>
>> http://www.bioperl.org/wiki/HOWTOs
>
>
> On Dec 29, 2009, at 1:32 PM, Peng Yu wrote:
>
>> http://bioperl.org/Core/Latest/modules.html
>>
>> Many links, if not all, are broken on the above page. Could somebody fix
>> them?
>>
>> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt,
>> I see the following error.
>>
>> There is currently no text in this page. You can search for this page
>> title in other pages, search the related logs, or edit this page.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>

From sdavis2 at mail.nih.gov Tue Dec 29 18:06:17 2009
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 29 Dec 2009 18:06:17 -0500
Subject: [Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID?
In-Reply-To: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
References: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com>
Message-ID: <264855a00912291506s13c32d5dg7b46f0cc34c20f94@mail.gmail.com>

On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu wrote:
> I see the following example. But it is not clear to me how to get the
> exon sequences. I also want to get the exon boundaries and associated
> CDS boundaries. Although I can get the boundary information from the
> UCSC table browser, it would be convenient if I could get it in bioperl
> along with the sequence.
>
> Could somebody let me know how to do it?
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html

Hi, Peng.

There may be some confusion, as the UCSC database aligns RefSeq
sequence to a genome to generate exon start and end coordinates.
However, the RefSeq records retrieved by Bio::DB::RefSeq are not in
genomic context and so do not have start and end locations on the
genome. That is, if you want the starts and ends along the genome,
that information is not available from the RefSeq record itself, I
don't think. If that is what you need (genomic coordinates), you can
download the information directly from UCSC, download flat files from
NCBI mapview, or even from ensembl (using biomart, for instance). If
you are looking for a bioperl-compliant way of doing this, look at the
Ensembl Perl API.
Sean From jkhilmer at gmail.com Tue Dec 29 14:55:18 2009 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Tue, 29 Dec 2009 12:55:18 -0700 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> Message-ID: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> Personally, I think that the differences between Python and Perl (although substantial) are not large enough to make the language itself the deciding factor. Instead, consider the larger community of software. I haven't yet found a situation in which Python cannot be applied: it can be used with R (statistics); lower-level code C or fortran; visualization software such as PyMol, Chimera, Blender, VTK; plotting with matplotlib; and scipy/numpy or sage, which provide innumerable benefits for computation, data-processing, etc. Although I don't claim to have a great deal of experience with Perl, I haven't seen the same integration with that language: I'm assuming it can be used with R and VTK (not sure about C or fortran?). For this reason, unless your work is highly targeted and you have no use programming language integration with other software, I would recommend Python. For perl experts, I would truly appreciate any corrections you could offer to these observations of mine, since I wouldn't mind using perl if it offers benefits either in general or for specific applications. Jonathan On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu wrote: > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote: >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: >>> May I ask somebody who are versitile in both bioperl and biopython >>> comment on the pros and cons of bioperl and biopython? 
I'm sending
>>> this email to both bioperl and biopython mailing lists. But I hope
>>> that it will not result in any contention.
>>>
>>> I assume that the functionality between bioperl or biopython is the
>>> same, i.e., tasks can be done in bioperl can be done biopython and
>>> vice versa, as both libraries have been out there over 10 years.
>>> Please correct me if my understanding is not true.
>>
>> The two projects have similar goals, but saying that the functionality
>> is the same would be an extreme oversimplification. You will need to
>> define what you want to do and then check to see what the two projects
>> have to offer. This will, in general, require perusing the websites
>> for both projects as well as the relevant documentation.
>
> According to your experience, are there some tasks that are easier
> with one than with another?
>
>>> Given that a task that can be done with either bioperl or biopython,
>>> I, in particularly, want to know how long it will take to write the
>>> code for the task in bioperl and biopython, with the same readability
>>> requirement (see below) and the assumption that users have the same
>>> fluency in perl and python.
>>
>> Again, you will want to define the task(s) to be accomplished and then
>> weigh the pros and cons of each project combined with local expertise.
>> If you don't know what you want to do, then you can certainly read
>> some examples on the websites and see which project strikes you as a
>> "winner" for you.
>>
>>> python is claimed to be good for maintainability. But perl is
>>> criticized for there-are-many-ways-for-a-given-task. Since there are
>>> multiple ways in perl, let us assume that we always use perl in a
>>> readable way.
>>
>> These two statements are generalizations that provide little insight
>> into the strengths or weaknesses of the languages. In other words,
>> one can write good or bad code in both languages.
>>
>> Hope that helps.
>> >> Sean >> > > _______________________________________________ > Biopython mailing list ?- ?Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From wgheath at gmail.com Tue Dec 29 15:16:39 2009 From: wgheath at gmail.com (William Heath) Date: Tue, 29 Dec 2009 12:16:39 -0800 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> Message-ID: The biggest reason to go with python is the ease of use. Biologists are not programmers and the learning curve for python is much smaller than that of perl. I like perl but choose python because of this issue. Perl 6 does address some of these issues however but this has not been fully implemented as of yet. -Tim P.S. I love, love, love cpan though which is only for perl right now :( On Tue, Dec 29, 2009 at 11:55 AM, Jonathan Hilmer wrote: > Personally, I think that the differences between Python and Perl > (although substantial) are not large enough to make the language > itself the deciding factor. > > Instead, consider the larger community of software. I haven't yet > found a situation in which Python cannot be applied: it can be used > with R (statistics); lower-level code C or fortran; visualization > software such as PyMol, Chimera, Blender, VTK; plotting with > matplotlib; and scipy/numpy or sage, which provide innumerable > benefits for computation, data-processing, etc. > > Although I don't claim to have a great deal of experience with Perl, I > haven't seen the same integration with that language: I'm assuming it > can be used with R and VTK (not sure about C or fortran?). 
For this > reason, unless your work is highly targeted and you have no use > programming language integration with other software, I would > recommend Python. > > For perl experts, I would truly appreciate any corrections you could > offer to these observations of mine, since I wouldn't mind using perl > if it offers benefits either in general or for specific applications. > > > Jonathan > > On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu wrote: > > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis > wrote: > >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: > >>> May I ask somebody who are versitile in both bioperl and biopython > >>> comment on the pros and cons of bioperl and biopython? I'm sending > >>> this email to both bioperl and biopython mailing lists. But I hope > >>> that it will not result in any contention. > >>> > >>> I assume that the functionality between bioperl or biopython is the > >>> same, i.e., tasks can be done in bioperl can be done biopython and > >>> vice versa, as both libraries have been out there over 10 years. > >>> Please correct me if my understanding is not true. > >> > >> The two projects have similar goals, but saying that the functionality > >> is the same would be an extreme oversimplification. You will need to > >> define what you want to do and then check to see what the two projects > >> have to offer. This will, in general, require perusing the websites > >> for both projects as well as the relevant documentation. > > > > According to your experience, are there some tasks that are easier > > with one than with another? > > > >>> Given that a task that can be done with either bioperl or biopython, > >>> I, in particularly, want to know how long it will take to write the > >>> code for the task in bioperl and biopython, with the same readability > >>> requirement (see below) and the assumption that users have the same > >>> fluency in perl and python. 
> >> > >> Again, you will want to define the task(s) to be accomplished and then > >> weigh the pros and cons of each project combined with local expertise. > >> If you don't know what you want to do, then you can certainly read > >> some examples on the websites and see which project strikes you as a > >> "winner" for you. > >> > >>> python is claimed to be good for maintainability. But perl is > >>> criticized for there-are-many-ways-for-a-given-task. Since there are > >>> multiple ways in perl, let us assume that we always use perl in a > >>> readable way. > >> > >> These two statements are generalizations that provide little insight > >> into the strengths or weaknesses of the languages. In other words, > >> one can write good or bad code in both languages. > >> > >> Hope that helps. > >> > >> Sean > >> > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From pengyu.ut at gmail.com Wed Dec 30 12:26:45 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Thu, 31 Dec 2009 11:26:45 +1800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? Message-ID: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> With Bio::SeqIO, I can only read in the records in a fasta file one by one. This is preferable if there are many records in a file. But I also want to read all the records in. I could use a while loop to read all records in. But could somebody let me know if there is a function in bioperl that can read in all the record at once and return me an object? 
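The while-loop approach mentioned above can be sketched roughly as follows. This is a minimal example under stated assumptions, not a built-in BioPerl function: `read_all_fasta` is a hypothetical helper name, and the hash assumes that the record IDs are unique. Only `Bio::SeqIO->new`, `next_seq` and `display_id` come from BioPerl itself.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (not part of BioPerl): read every record of a FASTA
# file into memory. Returns an array ref (records in file order) and a
# hash ref keyed by display_id. Assumes IDs are unique; a later duplicate
# ID overwrites the earlier entry in the hash.
sub read_all_fasta {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my (@records, %by_id);
    while (my $seq = $in->next_seq) {
        push @records, $seq;
        $by_id{ $seq->display_id } = $seq;
    }
    return (\@records, \%by_id);
}
```

A call such as `my ($records, $by_id) = read_all_fasta('my.fasta');` (the file name is a placeholder) would then give both ordered access and lookup by ID, at the cost of holding every sequence object in memory at once.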
http://www.bioperl.org/wiki/HOWTO:SeqIO From sdavis2 at mail.nih.gov Wed Dec 30 13:04:53 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 30 Dec 2009 13:04:53 -0500 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> Message-ID: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object? In perl, you can use an array to store the records. You could also use a hash if you have reasonable keys for the entries. Sean > http://www.bioperl.org/wiki/HOWTO:SeqIO > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Wed Dec 30 14:58:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 30 Dec 2009 11:58:54 -0800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B@bioperl.org> or use a database object so you can retrieve sequences that have a particular id. See Bio::DB::Fasta On Dec 30, 2009, at 10:04 AM, Sean Davis wrote: > On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >> With Bio::SeqIO, I can only read in the records in a fasta file one >> by >> one. 
This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and
>> return me an object?
>
> In perl, you can use an array to store the records. You could also
> use a hash if you have reasonable keys for the entries.
>
> Sean
>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From maj at fortinbras.us Wed Dec 30 16:20:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 30 Dec 2009 16:20:31 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <2646F627E6D14AADB412A6E6B51E24DA@NewLife>

I think you might want Bio::AlignIO:

  $alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta');
  $aln = $alnio->next_aln;
  @seqs = $aln->each_seq;   # each_seq returns the sequences of the alignment

MAJ

----- Original Message -----
From: "Peng Yu"
To:
Sent: Wednesday, December 30, 2009 12:26 PM
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?

> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
>
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> > http://www.bioperl.org/wiki/HOWTO:SeqIO > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Thu Dec 31 05:55:32 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 31 Dec 2009 11:55:32 +0100 Subject: [Bioperl-l] question about a PAML module In-Reply-To: <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> Message-ID: Hi Rui and Sandra, Could you file this as a bug report at http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl ? Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report: - sample input files (a sequence file and a tree file, probably) - a script which reproduces the problem - the output (error messages) like you show below When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this. There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon. Dave From David.Messina at sbc.su.se Tue Dec 1 05:14:40 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 1 Dec 2009 11:14:40 +0100 Subject: [Bioperl-l] [Bug 2937] Strand in fasta35 output does not seem to be parsed In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148731E47@iahcexch1.iah.bbsrc.ac.uk> <50F0159A-DE58-4405-A2FE-4FA95A3CDDA4@sbc.su.se> <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk> Message-ID: Hi Mick, Did you try running the test case that you had originally attached to the bug report? 
Or is the below from different code and a diffrent fasta output file? In any case, I'll need to look at the fasta35 output file and the parse2.pl you ran in order to reproduce and fix this -- could you please open a new bug report and attach them to it? Thanks, Dave On Nov 30, 2009, at 17:49, michael watson (IAH-C) wrote: > Hi Dave > > Just got round to looking at this. > > In bioperl-1.6.0, the strand didn't get parsed, but the module only warned about something: > > --------------------- WARNING --------------------- > MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta' > --------------------------------------------------- > > However, in the bioperl-live I just downloaded, this had turned into a full-on stack trace: > > ------------- EXCEPTION ------------- > MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta' > STACK Bio::SearchIO::fasta::next_result /usr/local/bioperl-live_301109//Bio/SearchIO/fasta.pm:1347 > STACK toplevel parse2.pl:20 > ------------------------------------- > > I'm not sure if this is even related to the strand issue (I suspect not, but you never know) but something changed between bioperl-1.6.0 and the live trunk I downloaded today to ensure I still can't use the module. > > Is this another bug report? > > Thanks again for all your help > > Mick > > -----Original Message----- > From: Dave Messina [mailto:David.Messina at sbc.su.se] > Sent: 23 November 2009 17:46 > To: michael watson (IAH-C) > Subject: Re: [Bug 2937] Strand in fasta35 output does not seem to be parsed > > Hi Mick, > > Sure thing -- the current build from subversion is packaged up every > night and available here: > http://www.bioperl.org/DIST/nightly_builds/ > > Just grab bioperl-live.tar.gz from there and you'll get the changes. 
> > > Dave > > > > > On Nov 23, 2009, at 6:34 PM, michael watson (IAH-C) wrote: > >> Hi Dave >> >> Thanks for the hard work. >> >> Trying to get the latest updates so I can use this... don't have svn >> on my server, tried to install it and I don't have python either, >> which is needed to install it. >> >> I face about 3 weeks whilst my IT department sort this out, unless I >> can access the changes any other way? >> >> Thanks >> Mick >> >> -----Original Message----- >> From: bugzilla-daemon at portal.open-bio.org [mailto:bugzilla- >> daemon at portal.open-bio.org] >> Sent: 20 November 2009 15:12 >> To: michael watson (IAH-C) >> Subject: [Bug 2937] Strand in fasta35 output does not seem to be >> parsed >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2937 >> >> >> online at davemessina.com changed: >> >> What |Removed |Added >> ---------------------------------------------------------------------------- >> Status|NEW |RESOLVED >> Resolution| |FIXED >> >> >> >> >> ------- Comment #7 from online at davemessina.com 2009-11-20 10:12 EST >> ------- >> Fixed in r16394. >> >> Michael, thanks for the report. Your test cases pass, but please >> reopen the bug >> if needed. >> >> >> -- >> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi? >> tab=email >> ------- You are receiving this mail because: ------- >> You reported the bug, or are watching the reporter. > From e.osimo at gmail.com Tue Dec 1 13:05:48 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Tue, 1 Dec 2009 19:05:48 +0100 Subject: [Bioperl-l] Statistics: how to obtain the p value of a T test Message-ID: <2ac05d0f0912011005n6140869aoc634ad08cdf10ca4@mail.gmail.com> Hello everyone, I'm trying to get the p value of a statistic made with Statistics::TTest I cannot find this function: I can find if the null hypothesis is rejected at a certain confidence level, but I cannot make the script show me the actual p value. Do you know other scripts that can do that? 
Thanks Emanuele From cjfields at illinois.edu Tue Dec 1 14:25:03 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 1 Dec 2009 13:25:03 -0600 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility Policy Change References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov> Message-ID: <964687F9-989B-4F11-B74B-977912A922EB@illinois.edu> I'll be adjusting the requisite parameters as indicated below. I'm reluctant to include a time-based limit on submissions (NCBI wants a max of 100 requests at peak hours), but it may become necessary if they request it. chris Begin forwarded message: > From: > Date: December 1, 2009 12:59:34 PM CST > To: > Subject: [Utilities-announce] NCBI E-Utility Policy Change > Reply-To: utilities-announce at ncbi.nlm.nih.gov > > As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests. > > The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request. > > The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request. > > NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described athttp://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. 
These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities. > > NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov. > > Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service. > > _______________________________________________ > Utilities-announce mailing list > http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From maj at fortinbras.us Tue Dec 1 21:27:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 1 Dec 2009 21:27:06 -0500 Subject: [Bioperl-l] test test test Message-ID: <95142B0024EC48928CB56A69A17A8559@NewLife> MAJ From ocarnorsk138 at gmail.com Tue Dec 1 21:59:48 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Tue, 1 Dec 2009 23:59:48 -0300 Subject: [Bioperl-l] test test test In-Reply-To: <95142B0024EC48928CB56A69A17A8559@NewLife> References: <95142B0024EC48928CB56A69A17A8559@NewLife> Message-ID: test test test test back O'car Campos C. Bioinformatics Engineering Student. University of Talca. Chile. 2009/12/1 Mark A. Jensen > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Tue Dec 1 22:08:23 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 1 Dec 2009 22:08:23 -0500 Subject: [Bioperl-l] test test test In-Reply-To: References: <95142B0024EC48928CB56A69A17A8559@NewLife> Message-ID: I love when people are paying attention! ----- Original Message ----- From: Ocar Campos To: Mark A. Jensen ; Bioperl Mailing List. Sent: Tuesday, December 01, 2009 9:59 PM Subject: Re: [Bioperl-l] test test test test test test test back O'car Campos C. Bioinformatics Engineering Student. University of Talca. Chile. 2009/12/1 Mark A. Jensen MAJ _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From rtbio.2009 at gmail.com Wed Dec 2 07:07:08 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Wed, 2 Dec 2009 13:07:08 +0100 Subject: [Bioperl-l] Remote blast Message-ID: Hello everyone, I have a problem. I am new to Bioperl. I am working on RNAi tool wherein a cgi script was written which connects to NCBI blast using remote blast program,i.e., The input sequence given in the html page is taken as input and Remote blast is performed on this based on the code for Remote blast.But,I have a problem in the Remote blast code. 
My code goes like this:

  @compseqs=blastcode($in{'Inputseq'});

  sub blastcode {
    $input1= $_[0];
    open(NUC,'>',$nuc);
    print NUC $input1;
    close(NUC);
    my $prog = 'blastn';
    my $db = 'refseq_rna';
    my $e_val= '1e-10';
    my $organism= 'Trypanosoma Brucei';
    $gb = new Bio::DB::GenBank;
    my @params = ( '-prog' => $prog,
                   '-data' => $db,
                   '-expect' => $e_val,
                   '-readmethod' => 'SearchIO',
                   '-Organism' => $organism );
    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
    #change a parameter
    $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]';
    my $v = 1;
    #$v is just to turn on and off the messages
    my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' );
    while (my $input = $str->next_seq()) {
      #Blast a sequence against a database:
      #Alternatively, you could pass in a file with many
      #sequences rather than loop through sequence one at a time
      #Remove the loop starting 'while (my $input = $str->next_seq())'
      #and swap the two lines below for an example of that.
      my $r = $factory->submit_blast($input);
      print STDERR "waiting...." if($v>0);
      while ( my @rids = $factory->each_rid ) {
        foreach my $rid ( @rids ) {
          my $rc = $factory->retrieve_blast($rid);
          if( !ref($rc) ) {
            if( $rc < 0 ) {
              $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
          } else {
            my $result = $rc->next_result();
            #save the output
            my $filename = $result->query_name()."\.out";
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            # open(BLASTDEBUGFILE,'>',$blastdebugfile);
            # print BLASTDEBUGFILE "Test1 $result";
            # close(BLASTDEBUGFILE);
            open(OUTFILE,'>',$outfile);
            print OUTFILE "Test2 $result->database_name()";
            close(OUTFILE);
            while ( my $hit = $result->next_hit ) {
              next unless ( $v > 0);
              # open(OUTFILE,'>',$outfile);
              # print OUTFILE "in while hits";
              #close(OUTFILE);
              my $sequ = $gb->get_Seq_by_version($hit->name);
              my $dna = $sequ->seq(); # get the sequence as a string
              push(@seqs,$dna);
            }
          }
        }
      }
    }
    # open(OUTFILE,'>',$outfile);
    #print OUTFILE $seqs[0];
    # close(OUTFILE);
    return(@seqs);
  }

In the above code, my program gets as far as the 'else' part and writes
the output file, i.e., this step:

  my $filename = $result->query_name()."\.out";

But the program never enters the next while loop, where I would get the
hits, i.e., it does not enter this:

  while ( my $hit = $result->next_hit ) {
    next unless ( $v > 0);

Hence I am unable to get any hits for my query. For example, if the
query's accession number is Tb11.02.2210, I only get a file named
Tb11.02.2210.out, and only the file name is displayed in the browser.

Please help me solve this problem, and mail me if anything is unclear.

Regards,
Roopa.

From ashvip at gmail.com Wed Dec 2 00:24:09 2009
From: ashvip at gmail.com (Vipin Singh)
Date: Wed, 2 Dec 2009 10:54:09 +0530
Subject: [Bioperl-l] Problems with installation
Message-ID: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>

Dear Sir/Madam,
I have not been able to install bioperl on my Windows 32 machine despite
repeated attempts. I have tried both ActivePerl and Strawberry Perl but
neither seems to work.
I have followed the instructions given at
-- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Please guide.
Thanks,
Vipin.
Vipin Singh, Senior Research Fellow, Centre for Cellular and Molecular Biology, Hyderabad - 500007 India. contact - 91-040-27192778 From scott at scottcain.net Wed Dec 2 09:18:37 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 2 Dec 2009 09:18:37 -0500 Subject: [Bioperl-l] Problems with installation In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> Message-ID: <4536f7700912020618y31f8fa15i6e01ce9614a87341@mail.gmail.com> Hello Vipin, "do not seem to work" doesn't give us much to go on; can you tell us what happened? Scott On Wed, Dec 2, 2009 at 12:24 AM, Vipin Singh wrote: > Dear Sir/Madam, > I have not been able to install bioperl on my Windows 32 machine despite > repeated attempts. I have tried both Active Perl and Strwaberry perl but > both do not seem to work. > I have followed the instruction given at > -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > Please guide. > Thanks, > Vipin. > Vipin Singh, > Senior Research Fellow, > Centre for Cellular and Molecular Biology, > Hyderabad - 500007 > India. > contact - 91-040-27192778 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From maj at fortinbras.us Wed Dec 2 09:18:31 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 2 Dec 2009 09:18:31 -0500 Subject: [Bioperl-l] Problems with installation In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> Message-ID: <4A3B25FFC79F43E1AF65E56FD1630F44@NewLife> Hi Vipin-- We need some more information; your commands, error messages you received. Thanks, Mark ----- Original Message ----- From: "Vipin Singh" To: Sent: Wednesday, December 02, 2009 12:24 AM Subject: [Bioperl-l] Problems with installation > Dear Sir/Madam, > I have not been able to install bioperl on my Windows 32 machine despite > repeated attempts. I have tried both Active Perl and Strwaberry perl but > both do not seem to work. > I have followed the instruction given at > -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > Please guide. > Thanks, > Vipin. > Vipin Singh, > Senior Research Fellow, > Centre for Cellular and Molecular Biology, > Hyderabad - 500007 > India. > contact - 91-040-27192778 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bcantarel at som.umaryland.edu Wed Dec 2 13:36:27 2009 From: bcantarel at som.umaryland.edu (Brandi Cantarel) Date: Wed, 2 Dec 2009 13:36:27 -0500 Subject: [Bioperl-l] Parsing Genbank Message-ID: Hi all, I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. x $cds->start 1 x $cds->end 64 How can I get the original coordinates? Is there a command for that or will I have to just do the math? Feature or Bug? 
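A hedged arithmetic note on the numbers above: 1 and 64 are exactly what 911..974 become when positions on a 974-nt sequence are counted from the opposite end, that is, on the reverse complement. The sketch below shows that conversion in plain Perl. `flip_coords` is a hypothetical helper, not a BioPerl call, and whether such a flip is what actually happened during parsing is an assumption that the message does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based, inclusive interval between
# forward-strand coordinates and reverse-complement coordinates on a
# sequence of length $len. The mapping is its own inverse.
sub flip_coords {
    my ($start, $end, $len) = @_;
    return ($len - $end + 1, $len - $start + 1);
}

my ($s, $e) = flip_coords(1, 64, 974);
print "$s..$e\n";    # prints 911..974
```

Because the mapping is symmetric, applying it to 911..974 gives 1..64 again, which is one way to "just do the math" if the original coordinates cannot be recovered from the feature object itself.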
~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore From maj at fortinbras.us Wed Dec 2 14:09:11 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 2 Dec 2009 14:09:11 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: References: Message-ID: Hi Brandi- If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. Can you elaborate by posting your code? cheers, MAJ ----- Original Message ----- From: "Brandi Cantarel" To: Sent: Wednesday, December 02, 2009 1:36 PM Subject: [Bioperl-l] Parsing Genbank > Hi all, > I am not sure if this is normal, but when I use SEQIO to parse genbank files, > it changes the coordinates of things on the minus strand. > > > For example, I have a sequence that has a CDS on the minus strand at it is > from 911 to 974. The sequence is 974 nt. > > x $cds->start > 1 > x $cds->end > 64 > > How can I get the original coordinates? Is there a command for that or will I > have to just do the math? > > Feature or Bug? > > > ~~~~~~~~~~~~~~~~~~~~ > Brandi Cantarel, PhD > Bioinformatics Analyst > Institute for Genome Sciences > School of Medicine > University of Maryland, Baltimore > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bcantarel at som.umaryland.edu Wed Dec 2 14:29:56 2009 From: bcantarel at som.umaryland.edu (Brandi Cantarel) Date: Wed, 2 Dec 2009 14:29:56 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: References: Message-ID: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> Here is some of my code, the real code actually enters the data into a database. 
$in = Bio::SeqIO->new(-file => $gbkfile, '-format' => 'genbank'); W1:while (my $seq = $in->next_seq()) { my @feats = $seq->get_all_SeqFeatures(); my $j = 0; F1:foreach $cds (@feats) { next F1 unless ($cds->primary_tag() eq 'CDS'); #do something with the cds start and cds end } } LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 ACCESSION subjpool12_contig3 KEYWORDS . SOURCE human metagenome ORGANISM human metagenome unclassified sequences; organismal metagenomes,metagenomes. FEATURES Location/Qualifiers source 1..974 /mol_type="genomic DNA" /isolation_source="Homo sapiens" /organism="human metagenome" /collection_date="19-Nov-2009" CDS complement(911..974) /locus_tag="subjpool12_contig3|metagene|gene_2" /translation="IRIMTVELINPYIRHVEHST" /score="2.52804" /product="hypothetical protein" /note="score=2.52804" /note="score=2.52804" /note="frame=1" ORIGIN #some sequence?. >From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > Hi Brandi- > If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. > Can you elaborate by posting your code? > cheers, > MAJ > ----- Original Message ----- From: "Brandi Cantarel" > To: > Sent: Wednesday, December 02, 2009 1:36 PM > Subject: [Bioperl-l] Parsing Genbank > > >> Hi all, >> I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. >> >> >> For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. >> >> x $cds->start >> 1 >> x $cds->end >> 64 >> >> How can I get the original coordinates? Is there a command for that or will I have to just do the math? >> >> Feature or Bug? 
>> >> >> ~~~~~~~~~~~~~~~~~~~~ >> Brandi Cantarel, PhD >> Bioinformatics Analyst >> Institute for Genome Sciences >> School of Medicine >> University of Maryland, Baltimore >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From maj at fortinbras.us Wed Dec 2 14:48:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 2 Dec 2009 14:48:44 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> Message-ID: <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> with fake seq data and that header, I don't get a problem: DB<2> x $cds->location 0 Bio::Location::Simple=HASH(0x37b1df4) '_end' => 974 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'subjpool12_contig3' '_start' => 911 '_strand' => '-1' Are you using the latest BioPerl (1.6.1 or the trunk) ? MAJ ----- Original Message ----- From: "Brandi Cantarel" Cc: Sent: Wednesday, December 02, 2009 2:29 PM Subject: Re: [Bioperl-l] Parsing Genbank Here is some of my code, the real code actually enters the data into a database. $in = Bio::SeqIO->new(-file => $gbkfile, '-format' => 'genbank'); W1:while (my $seq = $in->next_seq()) { my @feats = $seq->get_all_SeqFeatures(); my $j = 0; F1:foreach $cds (@feats) { next F1 unless ($cds->primary_tag() eq 'CDS'); ###>> debugger stops here for above output #do something with the cds start and cds end } } LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 ACCESSION subjpool12_contig3 KEYWORDS . SOURCE human metagenome ORGANISM human metagenome unclassified sequences; organismal metagenomes,metagenomes. 
FEATURES Location/Qualifiers source 1..974 /mol_type="genomic DNA" /isolation_source="Homo sapiens" /organism="human metagenome" /collection_date="19-Nov-2009" CDS complement(911..974) /locus_tag="subjpool12_contig3|metagene|gene_2" /translation="IRIMTVELINPYIRHVEHST" /score="2.52804" /product="hypothetical protein" /note="score=2.52804" /note="score=2.52804" /note="frame=1" ORIGIN #some sequence?. >From this example, I would like to get the coordinates 911 and 974, rather than >1 and 64. ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > Hi Brandi- > If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an > ordinary Bio::Seq, that's normal. > Can you elaborate by posting your code? > cheers, > MAJ > ----- Original Message ----- From: "Brandi Cantarel" > > To: > Sent: Wednesday, December 02, 2009 1:36 PM > Subject: [Bioperl-l] Parsing Genbank > > >> Hi all, >> I am not sure if this is normal, but when I use SEQIO to parse genbank files, >> it changes the coordinates of things on the minus strand. >> >> >> For example, I have a sequence that has a CDS on the minus strand at it is >> from 911 to 974. The sequence is 974 nt. >> >> x $cds->start >> 1 >> x $cds->end >> 64 >> >> How can I get the original coordinates? Is there a command for that or will >> I have to just do the math? >> >> Feature or Bug? 
>> >> >> ~~~~~~~~~~~~~~~~~~~~ >> Brandi Cantarel, PhD >> Bioinformatics Analyst >> Institute for Genome Sciences >> School of Medicine >> University of Maryland, Baltimore >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Dec 2 14:39:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 2 Dec 2009 13:39:40 -0600 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> Message-ID: <0E82A338-9D28-4685-A7DA-5019060D96F5@illinois.edu> That one's odd; the coordinates should relate back to the original sequence. Any chance you could pass on the sequence file so we can confirm it? you can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem). chris On Dec 2, 2009, at 1:29 PM, Brandi Cantarel wrote: > Here is some of my code, the real code actually enters the data into a database. > > > $in = Bio::SeqIO->new(-file => $gbkfile, > '-format' => 'genbank'); > > W1:while (my $seq = $in->next_seq()) { > my @feats = $seq->get_all_SeqFeatures(); > my $j = 0; > F1:foreach $cds (@feats) { > next F1 unless ($cds->primary_tag() eq 'CDS'); > #do something with the cds start and cds end > } > } > > > LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 > ACCESSION subjpool12_contig3 > KEYWORDS . > SOURCE human metagenome > ORGANISM human metagenome > unclassified sequences; organismal metagenomes,metagenomes. 
> FEATURES Location/Qualifiers > source 1..974 > /mol_type="genomic DNA" > /isolation_source="Homo sapiens" > /organism="human metagenome" > /collection_date="19-Nov-2009" > CDS complement(911..974) > /locus_tag="subjpool12_contig3|metagene|gene_2" > /translation="IRIMTVELINPYIRHVEHST" > /score="2.52804" > /product="hypothetical protein" > /note="score=2.52804" > /note="score=2.52804" > /note="frame=1" > ORIGIN > #some sequence?. > > > > >> From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. > > > > ~~~~~~~~~~~~~~~~~~~~ > Brandi Cantarel, PhD > Bioinformatics Analyst > Institute for Genome Sciences > School of Medicine > University of Maryland, Baltimore > > On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > >> Hi Brandi- >> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. >> Can you elaborate by posting your code? >> cheers, >> MAJ >> ----- Original Message ----- From: "Brandi Cantarel" >> To: >> Sent: Wednesday, December 02, 2009 1:36 PM >> Subject: [Bioperl-l] Parsing Genbank >> >> >>> Hi all, >>> I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. >>> >>> >>> For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. >>> >>> x $cds->start >>> 1 >>> x $cds->end >>> 64 >>> >>> How can I get the original coordinates? Is there a command for that or will I have to just do the math? >>> >>> Feature or Bug? 
>>> >>> >>> ~~~~~~~~~~~~~~~~~~~~ >>> Brandi Cantarel, PhD >>> Bioinformatics Analyst >>> Institute for Genome Sciences >>> School of Medicine >>> University of Maryland, Baltimore >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Dec 2 15:52:28 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 2 Dec 2009 15:52:28 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> Message-ID: <07332179362A4D53ACAA9A72AD208049@NewLife> Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds as if there is a bug. If you can provide data that can reproduce it, as Chris suggests, we can get onto it. thanks MAJ ----- Original Message ----- From: Brandi Cantarel To: Mark A. Jensen Sent: Wednesday, December 02, 2009 3:38 PM Subject: Re: [Bioperl-l] Parsing Genbank How can I tell what version I am using?When I use the command from the website: perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' I get 1.006, but the bioperl lib was updated in July, so probably 1.6.0 version since that was the last stable release?. Brandi On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote: with fake seq data and that header, I don't get a problem: DB<2> x $cds->location 0 Bio::Location::Simple=HASH(0x37b1df4) '_end' => 974 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'subjpool12_contig3' '_start' => 911 '_strand' => '-1' Are you using the latest BioPerl (1.6.1 or the trunk) ? 
MAJ ----- Original Message ----- From: "Brandi Cantarel" Cc: Sent: Wednesday, December 02, 2009 2:29 PM Subject: Re: [Bioperl-l] Parsing Genbank Here is some of my code, the real code actually enters the data into a database. $in = Bio::SeqIO->new(-file => $gbkfile, '-format' => 'genbank'); W1:while (my $seq = $in->next_seq()) { my @feats = $seq->get_all_SeqFeatures(); my $j = 0; F1:foreach $cds (@feats) { next F1 unless ($cds->primary_tag() eq 'CDS'); ###>> debugger stops here for above output #do something with the cds start and cds end } } LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 ACCESSION subjpool12_contig3 KEYWORDS . SOURCE human metagenome ORGANISM human metagenome unclassified sequences; organismal metagenomes,metagenomes. FEATURES Location/Qualifiers source 1..974 /mol_type="genomic DNA" /isolation_source="Homo sapiens" /organism="human metagenome" /collection_date="19-Nov-2009" CDS complement(911..974) /locus_tag="subjpool12_contig3|metagene|gene_2" /translation="IRIMTVELINPYIRHVEHST" /score="2.52804" /product="hypothetical protein" /note="score=2.52804" /note="score=2.52804" /note="frame=1" ORIGIN #some sequence?. From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: Hi Brandi- If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. Can you elaborate by posting your code? cheers, MAJ ----- Original Message ----- From: "Brandi Cantarel" To: Sent: Wednesday, December 02, 2009 1:36 PM Subject: [Bioperl-l] Parsing Genbank Hi all, I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. 
The sequence is 974 nt. x $cds->start 1 x $cds->end 64 How can I get the original coordinates? Is there a command for that or will I have to just do the math? Feature or Bug? ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Dec 2 16:07:58 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 2 Dec 2009 15:07:58 -0600 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <07332179362A4D53ACAA9A72AD208049@NewLife> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> <07332179362A4D53ACAA9A72AD208049@NewLife> Message-ID: <23AE9399-B370-4DB3-94AA-AC8021AF321E@illinois.edu> One never knows, but I would be very surprised if this somehow snuck by the test suite we have, particularly since Gbrowse extensively uses SeqFeatures (any changes should have popped out along the way). Not much we can do unless we have something to help confirm the problem. Also might help to know the source of the genbank file itself. chris On Dec 2, 2009, at 2:52 PM, Mark A. Jensen wrote: > Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds > as if there is a bug. If you can provide data that can reproduce > it, as Chris suggests, we can get onto it. > thanks MAJ > ----- Original Message ----- > From: Brandi Cantarel > To: Mark A. 
Jensen > Sent: Wednesday, December 02, 2009 3:38 PM > Subject: Re: [Bioperl-l] Parsing Genbank > > > How can I tell what version I am using?When I use the command from the website: > > > perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' > > > I get 1.006, but the bioperl lib was updated in July, so probably 1.6.0 version since that was the last stable release?. > > > Brandi > > > > > On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote: > > > with fake seq data and that header, I don't get a problem: > > DB<2> x $cds->location > 0 Bio::Location::Simple=HASH(0x37b1df4) > '_end' => 974 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'subjpool12_contig3' > '_start' => 911 > '_strand' => '-1' > > Are you using the latest BioPerl (1.6.1 or the trunk) ? > MAJ > ----- Original Message ----- From: "Brandi Cantarel" > Cc: > Sent: Wednesday, December 02, 2009 2:29 PM > Subject: Re: [Bioperl-l] Parsing Genbank > > > Here is some of my code, the real code actually enters the data into a database. > > > $in = Bio::SeqIO->new(-file => $gbkfile, > '-format' => 'genbank'); > > W1:while (my $seq = $in->next_seq()) { > my @feats = $seq->get_all_SeqFeatures(); > my $j = 0; > F1:foreach $cds (@feats) { > next F1 unless ($cds->primary_tag() eq 'CDS'); > ###>> debugger stops here for above output > > #do something with the cds start and cds end > } > } > > > LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 > ACCESSION subjpool12_contig3 > KEYWORDS . > SOURCE human metagenome > ORGANISM human metagenome > unclassified sequences; organismal metagenomes,metagenomes. 
> FEATURES Location/Qualifiers > source 1..974 > /mol_type="genomic DNA" > /isolation_source="Homo sapiens" > /organism="human metagenome" > /collection_date="19-Nov-2009" > CDS complement(911..974) > /locus_tag="subjpool12_contig3|metagene|gene_2" > /translation="IRIMTVELINPYIRHVEHST" > /score="2.52804" > /product="hypothetical protein" > /note="score=2.52804" > /note="score=2.52804" > /note="frame=1" > ORIGIN > #some sequence?. > > > > > > From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. > > > > > ~~~~~~~~~~~~~~~~~~~~ > Brandi Cantarel, PhD > Bioinformatics Analyst > Institute for Genome Sciences > School of Medicine > University of Maryland, Baltimore > > On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > > > Hi Brandi- > > If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. > > Can you elaborate by posting your code? > > cheers, > > MAJ > > ----- Original Message ----- From: "Brandi Cantarel" > > To: > > Sent: Wednesday, December 02, 2009 1:36 PM > > Subject: [Bioperl-l] Parsing Genbank > > > > > > Hi all, > > I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. > > > > > > For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. > > > > x $cds->start > > 1 > > x $cds->end > > 64 > > > > How can I get the original coordinates? Is there a command for that or will I have to just do the math? > > > > Feature or Bug? 
> > > > > > ~~~~~~~~~~~~~~~~~~~~ > > Brandi Cantarel, PhD > > Bioinformatics Analyst > > Institute for Genome Sciences > > School of Medicine > > University of Maryland, Baltimore > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lstein at cshl.edu Thu Dec 3 05:31:31 2009 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 3 Dec 2009 05:31:31 -0500 Subject: [Bioperl-l] modENCODE seeking data managers Message-ID: <6dce9a0b0912030231p740d0ecbj4a7e79a6ab71801d@mail.gmail.com> Hi All, My apologies for spamming the list, but this announcement may be of interest: The modENCODE Data Coordinating Center (Model Organism Encylopedia of DNA Elements; www.modencode.org) is seeking data managers to gather and curate large scale functional genomics data sets in fly and worm. For details, see http://blog.modencode.org/?p=350. Lincoln -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From dan.bolser at gmail.com Thu Dec 3 06:44:40 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 11:44:40 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? Message-ID: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> Hi, can someone test the script here on zero length fasta / qual files? 
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ It seems the output has an extra newline in the sequence part of the output (which throws off scripts that rely on the 'four lines per record' structure of the fastq (although I'm not sure if it's illegal fastq). Here is what I see BEGIN $ head one.fna >FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2 $ head one.qual >FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2 $ createFastq.plx one.fna one.qual @FVF7ZWH02PFOVG +FVF7ZWH02PFOVG END Currently I just put in a clause in the script to skip any zero length sequences, but I think the Qual shouldn't output an extra newline like this. Cheers, Dan. -- JHB: Bioinformatics is Biology and Biology is Bioinformatics. From biopython at maubp.freeserve.co.uk Thu Dec 3 07:12:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 12:12:15 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> Message-ID: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: > Hi, can someone test the script here on zero length fasta / qual files? > > http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ > > It seems the output has an extra newline in the sequence part of the > output (which throws off scripts that rely on the 'four lines per > record' structure of the fastq (although I'm not sure if it's illegal > fastq). Hi Dan, The OBF consensus was FASTQ records with a zero length sequence might be useful, and should be output as exactly four lines (one blank sequence line, one blank quality line). However for parsing, any number of blank lines should be OK. 
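As an illustration of the four-line convention described here, a minimal plain-Perl sketch follows. It is neither the wiki script nor the Bio::SeqIO FASTQ writer, and fastq_record is a hypothetical helper. Using a bare "+" on the third line is a choice made for this sketch; the script output quoted earlier repeats the identifier there.

```perl
use strict;
use warnings;

# Format one FASTQ record as exactly four lines, even when the sequence
# (and therefore the quality string) is empty. Hypothetical helper.
sub fastq_record {
    my ($id, $seq, $qual) = @_;
    die "sequence and quality lengths differ for $id\n"
        unless length($seq) == length($qual);
    return join("\n", "\@$id", $seq, "+", $qual) . "\n";
}

# Zero-length read from the thread: one blank sequence line and one
# blank quality line, four lines in total.
my $record = fastq_record("FVF7ZWH02PFOVG", "", "");
print $record;
```

A reader that counts four lines per record stays in step with this output, because the empty sequence and quality lines are still written.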
http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html I can confirm the perl script currently outputs a FASTQ file with TWO blank lines for the sequence, giving five lines in total for the zero length record. That does suggest a bug. What version of BioPerl are you running? Peter P.S. The script is throwing away any description after the identifier. From dan.bolser at gmail.com Thu Dec 3 08:07:27 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 13:07:27 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> Message-ID: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> 2009/12/3 Peter : > On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: >> Hi, can someone test the script here on zero length fasta / qual files? >> >> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ >> >> It seems the output has an extra newline in the sequence part of the >> output (which throws off scripts that rely on the 'four lines per >> record' structure of the fastq (although I'm not sure if it's illegal >> fastq). > > Hi Dan, > > The OBF consensus was FASTQ records with a zero length > sequence might be useful, and should be output as exactly > four lines (one blank sequence line, one blank quality line). > However for parsing, any number of blank lines should be OK. > http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html > > I can confirm the perl script currently outputs a FASTQ file > with TWO blank lines for the sequence, giving five lines in > total for the zero length record. That does suggest a bug. > What version of BioPerl are you running? 
Hi Peter,

Basically, I'm not running the 'latest' version of BP, which is why I asked this question of the list rather than filing a bug report. What version are you running? ;-)

Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks for the info).

> Peter
>
> P.S. The script is throwing away any description after the
> identifier.

That's probably bad. Feel free to edit the script on the wiki. Sadly, MediaWiki's diff features are less than optimal, so developing scripts on the wiki isn't ideal. Anyone know how to plug git-hub into a script apparently hosted on a wiki?

Or is git-hub basically designed to be 'wiki for code'?

I'm wondering, because with the FlaggedRevs extension you could basically build a whole release in the wiki. Which would be fun if nothing else!

--
JHP: Biology is bioinformatics and bioinformatics is biology.

From heyne at informatik.uni-freiburg.de Thu Dec 3 08:19:51 2009
From: heyne at informatik.uni-freiburg.de (Steffen Heyne)
Date: Thu, 03 Dec 2009 14:19:51 +0100
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To:
References: <4AF962AA.7060908@informatik.uni-freiburg.de>
Message-ID: <4B17BAF7.2050604@informatik.uni-freiburg.de>

Hello,

so I tried to fix the problem with the location. Currently it works for me with the following changes:

LocatableSeq.pm

sub get_nse{
    ...
    my $ret;
    if ($self->strand() >= 0) {
        $ret = $id . $v. $char1 . $st . $char2 . $end ;
    } else {
        $ret = $id . $v. $char1 . $end . $char2 . $st ;
    }
    return $ret;
}

Then I noticed, while using $aln->remove_seq(), that it cannot remove a seq because it uses a wrong NSE to look up sequences. I changed the following:

SimpleAlign.pm

sub remove_seq {
    ...
    $id = $seq->id();
    $start = $seq->start();
    $end = $seq->end();
    ## changed code:
    my $v = $seq->version ? '.'.$seq->version : '';
    if ($seq->strand >=0){
        $name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
    } elsif ($seq->strand == -1){
        $name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
    }
    ...
}

The above code in LocatableSeq.pm worked when I read an alignment in stockholm format and wrote it out in clustalw format. But when I read an alignment in clustalw and wrote it out as stockholm (or something else) it didn't work, as the strand is not correctly set in ClustalW::next_aln. It works with the following changes:

ClustalW.pm

sub next_aln{
    ...
    my ( $sname, $start, $end, $strand );    ## strand added
    $strand = 0;    ## new, standard = 0???
    foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) {
        if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
            ( $sname, $start, $end ) = ( $1, $2, $3 );
            $strand = 1;    ## new
            if ($start > $end) {    ## new
                ($start, $end, $strand) = ($end, $start, -1);    ##new
            }    ## new
        } else {
            ( $sname, $start ) = ( $name, 1 );
            my $str = $alignments{$name};
            $str =~ s/[^A-Za-z]//g;
            $end = length($str);
        }
        my $seq = Bio::LocatableSeq->new(
            -seq => $alignments{$name},
            -id => $sname,
            -start => $start,
            -end => $end,
            -strand=> $strand    ## new
        );
    ...
}

So I don't know if I changed things at their correct position. And I found them only because I used certain functions. I don't know how broad the effect of a changed NSE in LocatableSeq.pm is on other modules and functions. But I'm happy with my changes (so far :-)...).

Will you change this to your proposed way in bioperl trunk?

Thanks!

steffen

Chris Fields wrote:
> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
>
>> Hi,
>>
>> I'm using Bioperl for my research and it is very useful! Thank you!
>>
>> Currently I have a problem with locations tags of sequences. I read in
>> seed alignments of Rfam (in stockholm format, but I think it is
>> similar to other formats).
>>
>> If the location is like:
>>
>> AB194432.1/908-846
>>
>> the start/end values are changed to
>>
>> $seq->start = 846
>> $seq->end = 908
>>
>> and therefore the new location (e.g. $seq->get_nse) is:
>>
>> AB194432.1/846-908
>>
>> The $seq->strand tag is correctly set to -1 in this case, but if the
>> alignment is written out again (clustal, stockholm,...) this strand
>> info is lost and the sequences have this "wrong" location. But this
>> information is important in respect to the sequence accession number.
>>
>> Is there a way to set the location back to the original one or is this
>> behavior desired? Any manual setting with $seq->start($val) failed
>> due to automatic checking.
>>
>> I'm using bioperl 1.6.1
>>
>> Thanks!
>>
>> steffen
>
> This is a definite bug. We recently discussed amending the NSE format
> due to this (the subject came up over the last few months or so); it's
> fallen through the cracks. Fortunately it is very easy to fix (the
> relevant method is in LocatableSeq).
>
> Does anyone have a problem with me adding this in? It will change
> output for only those instances where the strand is -1, so
>
> AB194432.1/908-846
>
> would be start = 846, end = 908, strand = -1
>
> AB194432.1/846-908
>
> would be start = 846, end = 908, strand = 1
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
---
Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany
Tel: (+49) 761 203 7465
Fax: (+49) 761 203 7462
Mail: heyne at informatik.uni-freiburg.de

From cjfields at illinois.edu Thu Dec 3 08:47:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 07:47:32 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> Message-ID: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> Dan, On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: > 2009/12/3 Peter : >> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: >>> Hi, can someone test the script here on zero length fasta / qual files? >>> >>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ >>> >>> It seems the output has an extra newline in the sequence part of the >>> output (which throws off scripts that rely on the 'four lines per >>> record' structure of the fastq (although I'm not sure if it's illegal >>> fastq). >> >> Hi Dan, >> >> The OBF consensus was FASTQ records with a zero length >> sequence might be useful, and should be output as exactly >> four lines (one blank sequence line, one blank quality line). >> However for parsing, any number of blank lines should be OK. >> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html >> >> I can confirm the perl script currently outputs a FASTQ file >> with TWO blank lines for the sequence, giving five lines in >> total for the zero length record. That does suggest a bug. >> What version of BioPerl are you running? > > Hi Peter, > > Basically, I'm not running the 'latest' version of BP, which is why I > asked this question of the list rather than filing a bug report. What > version are you running? ;-) > > Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks > for the info). FASTQ parsing had undergone a major revision prior to 1.6.1 (the latest release in CPAN). Basically, it now parses all three FASTQ variants. However, Peter indicates there may still be a problem, and it's likely he's running 1.6.1. Peter can you confirm that? >> Peter >> >> P.S. 
The script is throwing away any description after the >> identifier. > > That's probably bad. Feel free to edit the script on the wiki. Sadly, > MediaWiki's diff features are less than optimal, so developing scripts > on the wiki isn't ideal. Anyone know how to plug git-hub into a script > apparently hosted on a wiki? > > Or is git-hub basically designed to be 'wiki for code'? It's more an integrated solution for hosting code via git, with a wiki, bug queue, etc. Think SourceForge, but a lot nicer and with no ads ;> BitBucket/Hg is another (very nice) solution along the same lines, developed in Python (Github is Ruby-centric). > I'm wondering, because with the FlaggedRevs extension you could > basically build a whole release in the wiki. Which would be fun if > nothing else! I'm not following you there. Could you elaborate on why that would be beneficial? I could see ( chris From biopython at maubp.freeserve.co.uk Thu Dec 3 09:20:32 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 14:20:32 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> Message-ID: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields wrote: > > FASTQ parsing had undergone a major revision prior to > 1.6.1 (the latest release in CPAN). Basically, it now parses > all three FASTQ variants. However, Peter indicates there > may still be a problem, and it's likely he's running 1.6.1. > Peter can you confirm that?
I had BioPerl from SVN circa 1.6.1 (not sure if this was before or after the release of 1.6.1 now): $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' 1.0069 $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"' 1.0069 If the tuples mean anything to you: $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' 49.46.48.48.54.57 $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION' 49.46.48.48.54.57 I just updated to revision 16435, and retested. I get the same BioPerl version numbers, and the same extra blank line in the sequence FASTQ output as Dan reported. Peter From cjfields at illinois.edu Thu Dec 3 09:39:35 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 08:39:35 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> Message-ID: On Dec 3, 2009, at 8:20 AM, Peter wrote: > On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields wrote: >> >> FASTQ parsing had undergone a major revision prior to >> 1.6.1 (the latest release in CPAN). Basically, it now parses >> all three FASTQ variants. However, Peter indicates there >> may still be a problem, and it's likely he's running 1.6.1. >> Peter can you confirm that? 
> > I had BioPerl from SVN circa 1.6.1 (not sure if this was before > or after the release of 1.6.1 now): > > $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' > 1.0069 > $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"' > 1.0069 > > If the tuples mean anything to you: > > $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' > 49.46.48.48.54.57 > $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION' > 49.46.48.48.54.57 > > I just updated to revision 16435, and retested. I get the same > BioPerl version numbers, and the same extra blank line in the > sequence FASTQ output as Dan reported. > > Peter Okay I will try to look into it today (it should be an easy fix). There are two issues, correct? 1) extra blank line. 2) missing description Dan, could you go ahead and submit this as a bug, just in case (so we don't lose track)? Otherwise it might get lost on the mail list or wiki. chris From biopython at maubp.freeserve.co.uk Thu Dec 3 09:56:39 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 14:56:39 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> Message-ID: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com> On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields wrote: > Okay I will try to look into it today (it should be an easy fix). There are two issues, correct? > > 1) extra blank line. Which seems to be a bug in BioPerl SeqIO itself.
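(For reference, the four-line layout mentioned earlier in this thread can be sketched in plain Perl. This is only an illustration, assuming a hypothetical helper named format_fastq_record; it is not the Bio::SeqIO::fastq code, and the quality string is assumed to be already encoded.)

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: format one FASTQ record as
# exactly four lines, even when the sequence is zero length.
sub format_fastq_record {
    my ($id, $desc, $seq, $qual) = @_;
    my $header = '@' . $id;
    # Keep the description after the identifier when there is one.
    $header .= " $desc" if defined $desc && length $desc;
    # Header, sequence, '+' separator, quality string: one newline each.
    return join("\n", $header, $seq, '+', $qual) . "\n";
}

print format_fastq_record('read1', 'example description', '', '');
```

A zero-length read then comes out as a header line, an empty sequence line, the '+' line and an empty quality line, which is what scripts relying on 'four lines per record' expect.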
> 2) missing description This is just a trivial bug/omission in the wiki example, http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ You just need to replace this: my $bsq_obj = Bio::Seq::Quality->new( -id => $seq_obj->id, -seq => $seq_obj->seq, -qual => $qual_obj->qual, ); With: my $bsq_obj = Bio::Seq::Quality->new( -id => $seq_obj->id, -description => $seq_obj->description, -seq => $seq_obj->seq, -qual => $qual_obj->qual, ); Look - I seem to be learning Perl by osmosis ;) Peter From dan.bolser at gmail.com Thu Dec 3 11:29:11 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 16:29:11 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com> Message-ID: <2c8757af0912030829t54e87a4bmf166370ca10e966a@mail.gmail.com> 2009/12/3 Peter : > On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields wrote: >> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct? ... >> 2) missing description > > This is just a trivial bug/omission in the wiki example, ... > Look - I seem to be learning Perl by osmosis ;) Yay! From dan.bolser at gmail.com Thu Dec 3 11:30:44 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 16:30:44 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> Message-ID: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> 2009/12/3 Chris Fields : > Dan, > > On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: ... >> I'm wondering, because with the FlaggedRevs extension you could >> basically build a whole release in the wiki. Which would be fun if >> nothing else! > > I'm not following you there. Could you elaborate on why that would be beneficial? I could see ( I never said it would be beneficial, only that it would be fun. http://www.mediawiki.org/wiki/Flaggedrevs From florent.angly at gmail.com Thu Dec 3 13:26:57 2009 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 03 Dec 2009 10:26:57 -0800 Subject: [Bioperl-l] problem with alignments and sequence locations In-Reply-To: <4B17BAF7.2050604@informatik.uni-freiburg.de> References: <4AF962AA.7060908@informatik.uni-freiburg.de> <4B17BAF7.2050604@informatik.uni-freiburg.de> Message-ID: <4B1802F1.1040304@gmail.com> Hi all, Like Steffen, I've had a few burning questions too regarding LocatableSeq lately. I've had an occasional issue with LocatableSeq. Most assembly-related modules use LocatableSeq objects. They specify the sequence start but not the sequence end. This works in most cases, but I've recently encountered very occasional error messages related to having not explicitly set the end of the sequence. I've been unable to put together a small test case to reproduce the bug easily. My question is: if the start of the sequence is set, is it mandatory to set the end of the sequence? If so, then maybe the documentation needs to be explicit about it and maybe there needs to be a check that enforces that the end is set.
In fact, it seems like if I provide a sequence and its start position, the LocatableSeq code should be able to automatically calculate its end, no? Florent Steffen Heyne wrote: > Hello, > > so I tried to fix the problem with the location. Currently it works for > me with the following changes: > > LocatableSeq.pm > > sub get_nse{ > > ... > > my $ret; > if ($self->strand() >= 0) { > $ret = $id . $v. $char1 . $st . $char2 . $end ; > } else { > $ret = $id . $v. $char1 . $end . $char2 . $st ; > } > return $ret; > } > > Then I recognized during the usage of $aln->remove_seq() that it cannot > remove a seq as it uses a wrong NSE to lookup sequences. I changed the > following: > > SimpleAlign.pm > > sub remove_seq { > > ... > $id = $seq->id(); > $start = $seq->start(); > $end = $seq->end(); > > ## changed code: > > my $v = $seq->version ? '.'.$seq->version : ''; > if ($seq->strand >=0){ > $name = sprintf("%s%s/%d-%d",$id,$v,$start,$end); > } elsif ($seq->strand == -1){ > $name = sprintf("%s%s/%d-%d",$id,$v,$end,$start); > } > ... > > } > > The above code in LocatableSeq.pm worked in the case if I read an > alignment in stockholm format and write it out in clustalw format. But > if I read an alignment in clustalw and write it out as stockholm (or > something else) it didn't worked, as the strand is not correctly set in > ClustalW::next_aln. It works with the following changes: > > ClustalW.pm > > sub next_aln{ > > ... > > my ( $sname, $start, $end, $strand ); ## strand added > $strand = 0; ## new, standard = 0??? 
> foreach my $name ( sort { $order{$a} <=> $order{$b} } keys > %alignments ) { > if ( $name =~ /(\S+):(\d+)-(\d+)/ ) { > ( $sname, $start, $end ) = ( $1, $2, $3 ); > $strand = 1; ## new > if ($start > $end) { ## new > ($start, $end, $strand) = ($end, $start, -1); ##new > } ## new > > } > else { > ( $sname, $start ) = ( $name, 1 ); > my $str = $alignments{$name}; > $str =~ s/[^A-Za-z]//g; > $end = length($str); > } > > my $seq = Bio::LocatableSeq->new( > -seq => $alignments{$name}, > -id => $sname, > -start => $start, > -end => $end, > -strand=> $strand ## new > ); > > ... > > } > > So I don't know if I changed things at their correct position. And I > found them only because I used certain functions. I dont know how broad > the effect of a changed NSE in LocatableSeq.pm is to other Modules and > functions. But I'm happy with my changes (so far :-)...). > > Do you will change this to your proposed way in bioperl trunk? > > Thanks! > > steffen > > > Chris Fields schrieb: > >> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: >> >> >>> Hi, >>> >>> I'm using Bioperl for my research and it is very useful! Thank you! >>> >>> Currently I have a problem with locations tags of sequences. I read in >>> seed alignments of Rfam (in stockholm format, but I think it is >>> similar to other formats). >>> >>> If the location is like: >>> >>> AB194432.1/908-846 >>> >>> the start/end values are changed to >>> >>> $seq->start = 846 >>> $seq->end = 908 >>> >>> and therefore the new location (e.g.$seq->get_nse) is: >>> >>> AB194432.1/846-908 >>> >>> The $seq->strand tag is correctly set to -1 in this case, but if the >>> alignment is written out again (clustal, stockholm,...) this strand >>> info is lost and the sequences have this "wrong" location. But this >>> information is important in respect to the sequence accession number. >>> >>> Is there a way to set the location back to the original one or is this >>> behavior desired? 
Any manually setting with $seq->start($val) failed >>> due to automatic checking. >>> >>> I'm using bioperl 1.6.1 >>> >>> Thanks! >>> >>> steffen >>> >> This is a definite bug. We recently discussed amending the NSE format >> due to this (the subject came up over the last few months or so); it's >> fallen through the cracks. Fortunately it is very easy to fix (the >> relevant method is in LocatableSeq). >> >> Does anyone have a problem with me adding this in? It will change >> output for only those instances where the strand is -1, so >> >> AB194432.1/908-846 >> >> would be start = 846, end = 908, strand = -1 >> >> AB194432.1/846-908 >> >> would be start = 846, end = 908, strand = 1 >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > From cjfields at illinois.edu Thu Dec 3 23:16:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 22:16:48 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> Message-ID: <37058F8C-419E-4E88-AC4F-543FF9B563E1@illinois.edu> On Dec 3, 2009, at 10:30 AM, Dan Bolser wrote: > 2009/12/3 Chris Fields : >> Dan, >> >> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: > > ... > >>> I'm wondering, because with the FlaggedRevs extension you could >>> basically build a whole release in the wiki. Which would be fun if >>> nothing else! >> >> I'm not following you there. Could you elaborate on why that would be beneficial?
I could see ( > > I never said it would be beneficial, only that it would be fun. > > http://www.mediawiki.org/wiki/Flaggedrevs Ah, okay, that makes some sense. Just to stay on subject, committed a fix (r16439) to bioperl-live that addresses the additional newline issue. chris From rtbio.2009 at gmail.com Fri Dec 4 08:57:21 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 4 Dec 2009 14:57:21 +0100 Subject: [Bioperl-l] Regarding Organism based search in Remote blast Message-ID: Hello all, I am working on Remote blast.Here,I am trying to get 2 parameters into the remote blast code.They are 1.The input sequence that has to be sent to blast 2.Organism (The organism which has to be searched for ex:-Trypanasoma brucei etc.,) When I tried to take the organism parameter as an input from the user,through a web page,the Remote blast was not giving any results i.e., it says that there are no alignments found. But,when I hard coded the organism in the code,it gives me the results i.e., 3hits. I could not understand this problem.Could any body please help me in this regard? 
My code is sub blastcode { $input1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $input1; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$debugfile); print OUTFILE @params; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , '-Organism' => $organism ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. my $r = $factory->submit_blast($input); # my $r = $factory->submit_blast('amino.fa'); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE $result->next_hit(); # close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); while ( my $hit = $result->next_hit ) { next unless ( $v > 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string push(@seqs,$dna); } } } } } #open(OUTFILE,'>',$debugfile); #print OUTFILE $seqs[0]; #close(OUTFILE); return(@seqs); } Regards, Roopa. From cjfields at illinois.edu Fri Dec 4 09:59:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 4 Dec 2009 08:59:17 -0600 Subject: [Bioperl-l] Regarding Organism based search in Remote blast In-Reply-To: References: Message-ID: <77EDAB6B-68B5-460C-AD9F-EB45B9C3AFF7@illinois.edu> Roopa, At one point a couple of parameters differed between NCBI's web interface and our RemoteBlast-based BLAST interface to URLAPI (this should be indicated in your BLAST reports). See here: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14155 Also, are the returned hits specific for the genome? 
You should double-check; in some cases you have to set both HEADER and RETRIEVALHEADER to get the expected results (not sure why): http://article.gmane.org/gmane.comp.lang.perl.bio.general/18737/match=remoteblast+ncbi chris On Dec 4, 2009, at 7:57 AM, Roopa Raghuveer wrote: > Hello all, > > I am working on Remote blast.Here,I am trying to get 2 parameters into the > remote blast code.They are > > 1.The input sequence that has to be sent to blast > > 2.Organism (The organism which has to be searched for ex:-Trypanasoma brucei > etc.,) > > When I tried to take the organism parameter as an input from the > user,through a web page,the Remote blast was not giving any results i.e., it > says that there are no alignments found. > > But,when I hard coded the organism in the code,it gives me the results i.e., > 3hits. > > I could not understand this problem.Could any body please help me in this > regard? > > My code is > > sub blastcode > { > > $input1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $input1; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$debugfile); > print OUTFILE @params; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]'; > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , > '-Organism' => $organism ); > > while (my $input = $str->next_seq()) > > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence
one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > my $r = $factory->submit_blast($input); > > # my $r = $factory->submit_blast('amino.fa'); > > print STDERR "waiting...." if($v>0); > > while ( my @rids = $factory->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE $result->next_hit(); > # close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > } > > Regards, > Roopa. 
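One thing that may be worth checking in the code quoted above (a guess from reading it, not something tested against NCBI): the ENTREZ_QUERY value is written in single quotes, so Perl does not interpolate $organism and the literal text '$organism[ORGN]' would be sent as the query. That would be consistent with the hard-coded organism working. A minimal sketch of building the filter by concatenation instead, where entrez_organism_query is a hypothetical helper name and not a BioPerl function:

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): append the Entrez
# organism field tag to whatever name the user supplied.
sub entrez_organism_query {
    my ($organism) = @_;
    return $organism . '[ORGN]';
}

my $organism = 'Trypanosoma brucei';

# Single quotes suppress interpolation, so this is the literal text.
my $literal = '$organism[ORGN]';

# Concatenation keeps the variable and the field tag clearly apart.
my $query = entrez_organism_query($organism);

print "single-quoted: $literal\n";
print "concatenated:  $query\n";

# In the quoted script the value would then be assigned like this:
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = $query;
```

Concatenation also sidesteps the way Perl reads $organism[ORGN] inside double quotes, where it is taken as an element of an array @organism rather than a scalar followed by literal text.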
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robert.bradbury at gmail.com Fri Dec 4 13:27:38 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 4 Dec 2009 13:27:38 -0500 Subject: [Bioperl-l] Gene critical region analysis -- visual display Message-ID: Background: I have been involved in aging research off and on for ~16 years. My initial focus was on the eventual decline of the "program" (because DNA has no ECC and only limited redundancy); therefore my initial work (in the early 1990s) was focused on DNA repair genes (of which there are about 150 in the human genome) [1,2]. Most recently I have focused in on the DNA double strand break repair processes (NHEJ) as a fundamental cause of aging because it may fundamentally corrupt the genomes of individual cells. (And as most programmers would agree -- break the code and you break the program). Michael Lieber at UCLA has estimated that by the time a human is ~70 on the order of several hundred genes in one's cells have been corrupted (which may have an indeterminate effect on the cells' functioning). Problem: Just looking at the GenBank output for the human Artemis (DCLRE1C) gene there are on the order of 18 SNPs and 8 possible phosphorylation sites (not to mention other potential modification sites) -- this combined with the fact that Methionine and Tryptophan and to a lesser extent Cysteine are more susceptible to single base mutations (due to the alteration of the codon->amino acid coding even involving single base mutations/repairs). There are various programs to analyze such proteins for the critical sites -- SIFT and the various programs pointed to by their sites. Now it seems to me that one could attack this problem by integrating SNPs, mutations, etc.
at the critical sites (where "critical" may or may not be at normal SNPs -- which presumably are primarily at non-critical sites -- and those proteins where if you change the coding sequence to non-synonymous amino acids you potentially break the protein (the real interpretation of which will not be understood until population studies are done). So, in the process of looking at the DCLRE1C protein I asked myself, "Why is there not a BioPerl function which simply enables a visual interpretation of the critical sites of the protein?" I.e. some color-coded representation of the protein (which presumably has some augmented functionality to determine things like probability or statistical information). I.e. hand the function a .fasta file and it will give you a visual (colored) analysis of the critical nature of specific a.a. -- i.e. something which could be used by genomic or SNP analysis (such as, I presume, that being done by 23andme -- as well as other organizations) to begin to separate out the variations in the human genome (e.g. SNPs) from the mutations which may affect individuals. I have the C programming and to a lesser extent Perl experience to contribute to this -- I lack the BioPerl wisdom to make it generally available. If anyone has some suggestions as to what functions/modules might be of use (in providing a "single-look" view of gene a.a. whose mutations may be more or less detrimental) I would appreciate hearing from them. Robert Bradbury 1. "DNA Repair and Mutagenesis", E.C. Friedberg et al, 2nd Ed., ASM Press (2006) 2. "Aging of the Genome", J. Vijg, Oxford University Press (2007) From maj at fortinbras.us Sun Dec 6 17:54:00 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Sun, 6 Dec 2009 17:54:00 -0500 Subject: [Bioperl-l] bioperl-mode new feature: base class browsing Message-ID: <59494F4102D84535B3A5D05B595ACBF7@NewLife> Hi All, You can now browse pod of the base/parent classes of bioperl modules with one keystroke using the latest update of bioperl-mode. See http://bioperl.org/wiki/Emacs_bioperl-mode Press "B" or "P" while in pod view to get a completion list of the parent classes for the module whose pod you're viewing. cheers, MAJ From mmokrejs at ribosome.natur.cuni.cz Mon Dec 7 15:33:48 2009 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Mon, 07 Dec 2009 21:33:48 +0100 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: <4B1D66AC.4080804@ribosome.natur.cuni.cz> Hi, I just stumbled across this older posting ... maybe you want to exploit SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has remote API available. Martin Robert Bradbury wrote: > I would like to know whether or not anyone has attempted to create a > "generalized" reciprocal blast component for BioPerl? > > One sees papers all the time where they discuss running reciprocal blasts to > compare a new species to an old "standard" species or a set of species or > running an all-to-all set of comparisons to match up all of the "known" > proteins from species and determine which are outliers (and therefore > "novel"). There are also accumulating merged sets in NCBI HomoloGene (which > seems to be a some strict subset (perhaps a dozen) "well sequenced" genomes) > and Ensembl (which seems to be working with a much larger set of 40-50 > genomes some of which may be somewhat incomplete and are certainly poorly > "explored". > > I have, I believe, seen code "fragments" from various authors, perhaps some > on the BioPerl list, which perform some major subset of a typical > "reciprocal blast". 
> > Now what I am looking for is a relatively generalizable some-to-some > reciprocal blast utility. I want to be able to specify the genes (or gene > family), e.g. some of the ~150 known DNA repair genes. It would be helpful > to also specify how "tolerant" the blast "true reciprocal" criteria are. > There are some genes where there is a very strict 1-to-1 relationship across > many genomes. But for genes which involve relatively standard domains, e.g. > "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for > example it's more like 5-to-5 and it would be really nice to be able to > specify the strictness or quality level [1] for "matching" genes (and even > which genes are to be excluded because they are known to be false > homologues). > > Then to top this off I want to be able to combine known public e.g. > (HomoloGene / UniGene / Ensembl) databases with perhaps local private > databases or database subsets (e.g. emerging or specialized genomes). > > The goal here of course is to determine the precise phylogenetic relationships > between all of the DNA repair genes and how there may be gain / loss / > evolution of function that can be related to species characteristics (size, > longevity, etc.). > > Is there a generalized reciprocal blast component in BioPerl? Or is it a > "build-it-yourself" situation (that I have to believe has been built > probably a few dozen times by various researchers / organizations / > companies)? > > Thanks, > Robert Bradbury > > 1. This would be handled in BioPerl with a customizable user function which > could be tailored to handle specific cases -- for example a function which > when handed a set of 100 potential "matches" could go through those 100 > matches, identify common domains, and then "re-rate" matches based on > considerations such as the type and number of common domains, domains being > in the same order, etc. I.e.
criteria which may be difficult to completely > generalize across entire genomes but are fairly obvious if you are looking > at a graphical replication of a gene set in HomoloGene. From robert.bradbury at gmail.com Mon Dec 7 15:41:54 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 7 Dec 2009 15:41:54 -0500 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions Message-ID: This comment could also have a subject line: "Why does Bioperl/get_sequence> fork at all!" Why are not all operations sequential? And if this is a "default" mode that I'm unaware of -- How do I ever write a reliable BioPerl script if I have little or no control over what the program uses when it runs? I may have days so I can bear the burden of relatively slow results (and so can use sequential processing rather than parallel). I've got a perl script that uses remote blast to blast a sequence against a subset of the NCBI sequences. It "mostly" works, in that it returns a seemingly complete .bls result file but when attempting to look at the sequences (so it can more accurately summarize the information from the results than a standard blast report allows) it terminates prematurely with errors.
The error is: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Couldn't fork: Resource temporarily unavailable STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::DB::WebDBSeqI::_open_pipe /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722 STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463 STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186 STACK: Bio::Perl::get_sequence /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520 STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182 STACK: /home/bradbury/Genomes/bin/RB.pl:155 ----------------------------------------------------------- The precise line (in my code) which appears to be generating the error is: $seq = get_sequence('GenBank', $accsn); Now this can be a problem if NCBI/Genbank fails due to load conditions -- but this specific failure (which is repeatable) is most likely due to hitting the user process limit -- the small blast results work fine -- it's only if the Blast has returned several hundred hits that it runs into this problem. Now what it sounds like to me is an attempt to do multiple asynchronous NCBI queries (to get a sequence) with complete disregard of the environment (process limits, NCBI limits, etc.). But I do not know enough about how this works to point a finger at some specific function. As a result get_sequence process results are accumulated, summarized, etc. without ever having issued the respective wait()-variant calls to collect former children [This IMO would clearly be a bug.] It could be addressed by allowing the BioPerl library to run in 3 modes.
(1) completely synchronous -- if you fork you wait until its done -- and you collect "it" and any fork fails then one either collects the process or switches to the non-conservative mode. Robert From cjfields at illinois.edu Mon Dec 7 16:08:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 7 Dec 2009 15:08:40 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: Message-ID: Robert, If you use the relative components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not. All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline you will need to use those tools directly. See the POD for those specific modules for more information. chris On Dec 7, 2009, at 2:41 PM, Robert Bradbury wrote: > This comment could also have a subject line: "Why does Bioperl/get_sequence> > fork at all! Why are not all operations sequential? And if this is a > "default" mode that I'm unaware of -- How to I ever write a reliable BioPerl > script if I have little or no capability of what the program uses when it > runs? I may have days so I can bear the burden of relatively slow results > (and so can use sequential processing rather than parallel). > > I've got a perl script that uses remote blast to blast a sequence against a > subset of the NCBI sequences. It "mostly" works, in that it returns a > seemingly complete .bls result file but when attempting to look at the > sequences (so it can more accurately summarize the information from the > results than a standard blast report allows) it terminates prematurely with > errors. 
> > The error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Couldn't fork: Resource temporarily unavailable > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::WebDBSeqI::_open_pipe > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186 > STACK: Bio::Perl::get_sequence > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520 > STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182 > STACK: /home/bradbury/Genomes/bin/RB.pl:155 > ----------------------------------------------------------- > > The precise line (in my code) whcih appears to be generating the error is: > $seq = get_sequence('GenBank', $accsn); > > Now this can be a problem if NCBI/Genbank fails due to load conditions -- > but this specific failure (which is repeatable is due to most likely hitting > the user process limit restrictions) -- but the small blast results work > fine -- its only if the Blast has returned several hundred hits that it runs > into this problem. > > Now what it sounds like to me is an attempt to do multiple asynchronous NCBI > queries (to get a sequence) with complete disregard of the environment > (process limits, NCBI limits, etc.). But I do not know enough about how > this works to point a finger at some specific function. As a result > get_sequence process results are accumulated, summarized, etc. without ever > having issued to respect "wait-variant()) calls to collect former children > [This IMO would clearly be a bug.] > > It could be adjusted to by allowing the BioPerl library to run in 3 modes. 
> (1) completely synchronous -- if you fork you wait until its done -- and > you collect "it" and any fork fails then one either collects the process or > switches to the non-conservative mode. > > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Dec 7 16:24:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 7 Dec 2009 13:24:54 -0800 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: Message-ID: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Robert - You seem to be mixing the blast remote and the sequence query retrieval problems. These messages are related to the remote retrieval of sequences. It is hard to tell from your message specifically which modules you are using or how you are querying NCBI - there are several ways to do this either with the NCBI tools or the Bio::DB::GenBank. If you are using Bio::DB::Query::GenBank that allows for async access and has built in controls to adhere to the wait variant that NCBI requests but I don't think Bio::DB::GenBank get_Seq_by_acc method does any sort of thing (at least when it was originally written). I always advocate if you want highly available and reliable access to sequences you should download the nr or whichever DB and use the local indexing tools for the retrieval. Once you start doing hundreds of queries I don't see any good reason to be doing the query against NCBI directly given unreliabilities of the web and services. Local databases are faster and more reliable for most people so I urge you take advantage of the tools which provide local database access with the same APIs. I would like to comment that the tone of your posts to the list are not particularly helpful. I wonder if you are actually asking for help or just interested in complaining about when things don't work as you expect? 
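For the retrieval problem itself, the direct-module route mentioned in this thread can be sketched roughly as below. This is only a sketch under stated assumptions: that the "Couldn't fork" error comes from the forked-pipe ('pipeline') retrieval type in Bio::DB::WebDBSeqI, and that asking Bio::DB::GenBank for 'tempfile' retrieval avoids it. The helper name fetch_sequentially and the 3-second pause are made up for illustration and are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): fetch records one at a time,
# pausing between requests. $fetch is any code reference that takes an
# accession and returns a record; a failed fetch is stored as undef so one
# bad accession does not abort the whole run.
sub fetch_sequentially {
    my ($accessions, $fetch, $delay) = @_;
    my %result;
    for my $acc (@$accessions) {
        $result{$acc} = eval { $fetch->($acc) };
        warn "Could not fetch $acc: $@" if $@;
        sleep $delay if $delay;
    }
    return \%result;
}

# Only contact NCBI when accessions are given on the command line.
if (@ARGV) {
    require Bio::DB::GenBank;

    # Assumption: 'tempfile' retrieval does not fork, unlike the
    # 'pipeline' retrieval type.
    my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');

    my $seqs = fetch_sequentially(\@ARGV,
                                  sub { $gb->get_Seq_by_acc($_[0]) },
                                  3);    # illustrative pause in seconds
    for my $acc (@ARGV) {
        my $seq = $seqs->{$acc} or next;
        print join("\t", $acc, $seq->desc), "\n";
    }
}
```

Run with a handful of accessions as arguments to try it. For hundreds of lookups a local indexed database is still likely to be the more reliable option.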
This is a collaborative and volunteer-only project, with the principles of working together to make useful toolkit. We encourage you to build programs and applications from this base that suit your needs, but not all things will be directly implemented in the toolkit if they aren't generic enough (at least that is my feeling, the other Core devs help with these decisions). If there is a useful, generic, and reusable part we would like that to be part of the API. Otherwise we suggest the new application that fits a developer's vision. We encourage you to write (and publish) that application separately, but certainly encourage bug (and fixes) submissions and also code contributions for new features where they can be seen as generally useful. -jason On Dec 7, 2009, at 12:41 PM, Robert Bradbury wrote: > This comment could also have a subject line: "Why does Bioperl/ > get_sequence> > fork at all! Why are not all operations sequential? And if this is a > "default" mode that I'm unaware of -- How to I ever write a reliable > BioPerl > script if I have little or no capability of what the program uses > when it > runs? I may have days so I can bear the burden of relatively slow > results > (and so can use sequential processing rather than parallel). > > I've got a perl script that uses remote blast to blast a sequence > against a > subset of the NCBI sequences. It "mostly" works, in that it returns a > seemingly complete .bls result file but when attempting to look at the > sequences (so it can more accurately summarize the information from > the > results than a standard blast report allows) it terminates > prematurely with > errors. 
> > The error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Couldn't fork: Resource temporarily unavailable > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::WebDBSeqI::_open_pipe > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186 > STACK: Bio::Perl::get_sequence > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520 > STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182 > STACK: /home/bradbury/Genomes/bin/RB.pl:155 > ----------------------------------------------------------- > > The precise line (in my code) whcih appears to be generating the > error is: > $seq = get_sequence('GenBank', $accsn); > > Now this can be a problem if NCBI/Genbank fails due to load > conditions -- > but this specific failure (which is repeatable is due to most likely > hitting > the user process limit restrictions) -- but the small blast results > work > fine -- its only if the Blast has returned several hundred hits that > it runs > into this problem. > > Now what it sounds like to me is an attempt to do multiple > asynchronous NCBI > queries (to get a sequence) with complete disregard of the environment > (process limits, NCBI limits, etc.). But I do not know enough about > how > this works to point a finger at some specific function. As a result > get_sequence process results are accumulated, summarized, etc. > without ever > having issued to respect "wait-variant()) calls to collect former > children > [This IMO would clearly be a bug.] > > It could be adjusted to by allowing the BioPerl library to run in 3 > modes. 
> (1) completely synchronous -- if you fork you wait until its done --
> and
> you collect "it" and any fork fails then one either collects the
> process or
> switches to the non-conservative mode.
>
> Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From Jonas_Schaer at gmx.de Tue Dec 8 10:21:58 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Tue, 8 Dec 2009 16:21:58 +0100
Subject: [Bioperl-l] fasta format
Message-ID: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>

Hi there,
I have a little question concerning bioperl. I have BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read in some fasta files. first it worked fine, but now i have some fastafiles in slightly different format (not all lines have the same length!).

------------- EXCEPTION -------------
MSG: Each line of the fasta entry must be the same length except the last.
Line above #49 '
..' is 28 != 101 chars.
STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771
STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
STACK main::readfasta blast_eval.pm:174
STACK toplevel blast_eval.pm:83
-------------------------------------

indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/DB/Fasta.pm line 1054.

Is there any way to use these fasta files with different length of lines with this fasta.pm module or will i have to change the format of my fasta-files(big databases...) ?

Thanks in advance for any help!

Regards, Jonas

From awitney at sgul.ac.uk Tue Dec 8 12:01:58 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 8 Dec 2009 17:01:58 +0000
Subject: [Bioperl-l] package to associate genes with branches on trees?
Message-ID:

Hi,

I have been generating some trees with Phylip (pars) and then processing them with Bioperl. These trees are generated by comparing multiple strains of a bacterial organism by presence/absence (0/1) calls for each gene. I was wondering if there was any package in Bioperl to try to determine if any specific genes were associated with specific branches of the trees? Or if anyone knew of another tool that can do this?

thanks for any help

adam

From jason at bioperl.org Tue Dec 8 12:44:43 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 8 Dec 2009 09:44:43 -0800
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID:

You can run sreformat (HMMER) or the bp_sreformat.pl script in scripts/utilities (it is also installed when you install the Bioperl scripts):

$ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
# rename it back
$ mv yournewfile.fa yourfile.fa

or

$ sreformat fasta yourfile.fa > yournewfile.fa
$ mv yournewfile.fa yourfile.fa

-jason

On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote:

> Hi there,
> I have a little question concerning bioperl. I have
> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read
> in some fasta files. first it worked fine, but now i have some
> fastafiles in slightly different format (not all lines have the same
> length!).
>
> ------------- EXCEPTION -------------
> MSG: Each line of the fasta entry must be the same length except the
> last.
> Line above #49 '
> ..' is 28 != 101 chars.
> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/
> Fasta.pm:771
> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681
> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491
> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513
> STACK main::readfasta blast_eval.pm:174
> STACK toplevel blast_eval.pm:83
> -------------------------------------
>
> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/
> site/lib/Bio/
> DB/Fasta.pm line 1054.
>
>
> Is there any way to use these fasta files with different length of
> lines with this fasta.pm module or will i have to change the format
> of my fasta-files(big databases...) ?
>
> Thanks in advance for any help!
>
> Regards, Jonas
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From cjfields at illinois.edu Tue Dec 8 23:30:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 8 Dec 2009 22:30:26 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl Meeting at the GMOD Conference
Message-ID: <1BC089CD-75C3-437E-86A5-22220D724DF6@illinois.edu>

All,

For those interested, we will be holding a general BioPerl meeting, tentatively scheduled for January 13, 2010, just prior to the GMOD Community Meeting from Jan 14-15 in San Diego. This will be just following the Plant and Animal Genome (PAG) conference Jan 9-13. The exact day and time is somewhat flexible depending on attendees' schedules.

For those interested, sign up here:

http://www.bioperl.org/wiki/GMOD_2010_Meeting

For those interested in attending the GMOD meeting or PAG:

http://gmod.org/wiki/January_2010_GMOD_Meeting

I can envision the following items popping up:

* Refactoring of Alignment and GFF3/FeatureIO
* Addressing BioPerl's monolithic nature
* Moose and Perl 6
* Documentation

Any others?

chris

From akarger at CGR.Harvard.edu Wed Dec 9 10:01:45 2009
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed, 9 Dec 2009 10:01:45 -0500
Subject: [Bioperl-l] fasta format
In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas>
Message-ID: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>

> Is there any way to use these fasta files with different length of
> lines with this fasta.pm module or will i have to change the format
> of my fasta-files(big databases...) ?
>

Jonas,

It's not Bioperl, but for a quick fix you can use the Scriptome. Use the change_fasta_to_tab script (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) to change your FASTA into a tab-delimited file. Then use the next tool (change_tab_to_fasta) to change your files back.

To use a tool: change the input and output file names on the website, then cut and paste the Perl script from the green box into a CMD window. The script works one sequence at a time, so it doesn't need a lot of memory. (As long as you have enough disk space to store the tab-delimited copy).

The recreated FASTAs will be 60 characters per line (although you can hand-edit the line after you paste it to be whatever number of characters you'd like).

Let me know if you have a problem.

-Amir Karger
Life Sciences Research Computing, FAS IT
Harvard University

From Kevin.M.Brown at asu.edu Wed Dec 9 10:26:22 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 9 Dec 2009 08:26:22 -0700
Subject: [Bioperl-l] fasta format
In-Reply-To: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv>
Message-ID: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu>

Even easier to accomplish in one step. Read in the fasta file and output it right to another fasta file with SeqIO

use Bio::SeqIO;

# Read each record and write it straight back out as FASTA.
my $in  = Bio::SeqIO->new(-format => 'fasta', -file => $file);
my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>file.fasta');
while (my $seq = $in->next_seq) { $out->write_seq($seq); }

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
> Sent: Wednesday, December 09, 2009 8:02 AM
> To: Jonas Schaer; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] fasta format
>
> > Is there any way to use these fasta files with different length of
> > lines with this fasta.pm module or will i have to change the format
> > of my fasta-files(big databases...) ?
> >
>
> Jonas,
>
> It's not Bioperl, but for a quick fix you can use the
> Scriptome. Use the change_fasta_to_tab script
> (http://sysbio.harvard.edu/csb/resources/computational/scripto
> me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
> format__change_fasta_to_tab_) to change your FASTA into a
> tab-delimited file. Then use the next tool
> (change_tab_to_fasta) to change your files back.
>
> To use a tool: change the input and output file names on the
> website, then cut and paste the Perl script from the green
> box into a CMD window. The script works one sequence at a
> time, so it doesn't need a lot of memory. (As long as you
> have enough disk space to store the tab-delimited copy).
>
> The recreated FASTAs will be 60 characters per line (although
> you can hand-edit the line after you paste it to be whatever
> number of characters you'd like).
>
> Let me know if you have a problem.
> > -Amir Karger > Life Sciences Research Computing, FAS IT > Harvard University > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Wed Dec 9 14:44:41 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 10 Dec 2009 08:44:41 +1300 Subject: [Bioperl-l] fasta format In-Reply-To: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> It's even easier as the script is already written for you :-) bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > Sent: Thursday, 10 December 2009 4:26 a.m. > To: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] fasta format > > Even easier to accomplish in one step. 
Read in the fasta file and output
> it right to another fasta file with SeqIO
>
> my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
> my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
> while (my $seq = $in->next_seq){$out->write_seq($seq);}
>
> Kevin Brown
> Center for Innovations in Medicine
> Biodesign Institute
> Arizona State University
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
> > Sent: Wednesday, December 09, 2009 8:02 AM
> > To: Jonas Schaer; bioperl-l at bioperl.org
> > Subject: Re: [Bioperl-l] fasta format
> >
> > > Is there any way to use these fasta files with different length of
> > > lines with this fasta.pm module or will i have to change the format
> > > of my fasta-files(big databases...) ?
> > >
> >
> > Jonas,
> >
> > It's not Bioperl, but for a quick fix you can use the
> > Scriptome. Use the change_fasta_to_tab script
> > (http://sysbio.harvard.edu/csb/resources/computational/scripto
> > me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
> > format__change_fasta_to_tab_) to change your FASTA into a
> > tab-delimited file. Then use the next tool
> > (change_tab_to_fasta) to change your files back.
> >
> > To use a tool: change the input and output file names on the
> > website, then cut and paste the Perl script from the green
> > box into a CMD window. The script works one sequence at a
> > time, so it doesn't need a lot of memory. (As long as you
> > have enough disk space to store the tab-delimited copy).
> >
> > The recreated FASTAs will be 60 characters per line (although
> > you can hand-edit the line after you paste it to be whatever
> > number of characters you'd like).
> >
> > Let me know if you have a problem.
> > > > -Amir Karger > > Life Sciences Research Computing, FAS IT > > Harvard University > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From maj at fortinbras.us Wed Dec 9 15:18:08 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 9 Dec 2009 15:18:08 -0500 Subject: [Bioperl-l] fasta format In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas><1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv><1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> Message-ID: <5C992E6556584BDFBF39604FDEA8ECE0@NewLife> $ perl -MPerlIO::via::SeqIO -e 'open($f, "<:via(SeqIO)", shift); open($g, ">:via(SeqIO::fasta)", shift); while (<$f>) { print $g $_; }' in.fas out.fas ----- Original Message ----- From: "Smithies, Russell" To: "'Kevin Brown'" ; Sent: Wednesday, December 09, 2009 2:44 PM Subject: Re: [Bioperl-l] fasta format > It's even easier as the script is already written for you :-) > > bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa > > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Kevin Brown >> Sent: Thursday, 10 December 2009 4:26 a.m. >> To: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] fasta format >> >> Even easier to accomplish in one step. 
Read in the fasta file and output
>> it right to another fasta file with SeqIO
>>
>> my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file);
>> my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta');
>> while (my $seq = $in->next_seq){$out->write_seq($seq);}
>>
>> Kevin Brown
>> Center for Innovations in Medicine
>> Biodesign Institute
>> Arizona State University
>>
>> > -----Original Message-----
>> > From: bioperl-l-bounces at lists.open-bio.org
>> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
>> > Sent: Wednesday, December 09, 2009 8:02 AM
>> > To: Jonas Schaer; bioperl-l at bioperl.org
>> > Subject: Re: [Bioperl-l] fasta format
>> >
>> > > Is there any way to use these fasta files with different length of
>> > > lines with this fasta.pm module or will i have to change the format
>> > > of my fasta-files(big databases...) ?
>> > >
>> >
>> > Jonas,
>> >
>> > It's not Bioperl, but for a quick fix you can use the
>> > Scriptome. Use the change_fasta_to_tab script
>> > (http://sysbio.harvard.edu/csb/resources/computational/scripto
>> > me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_
>> > format__change_fasta_to_tab_) to change your FASTA into a
>> > tab-delimited file. Then use the next tool
>> > (change_tab_to_fasta) to change your files back.
>> >
>> > To use a tool: change the input and output file names on the
>> > website, then cut and paste the Perl script from the green
>> > box into a CMD window. The script works one sequence at a
>> > time, so it doesn't need a lot of memory. (As long as you
>> > have enough disk space to store the tab-delimited copy).
>> >
>> > The recreated FASTAs will be 60 characters per line (although
>> > you can hand-edit the line after you paste it to be whatever
>> > number of characters you'd like).
>> >
>> > Let me know if you have a problem.
>> > >> > -Amir Karger >> > Life Sciences Research Computing, FAS IT >> > Harvard University >> > >> > >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From kellert at ohsu.edu Wed Dec 9 19:36:13 2009 From: kellert at ohsu.edu (Tom Keller) Date: Wed, 9 Dec 2009 16:36:13 -0800 Subject: [Bioperl-l] how to map ensembl id to NCBI gi Message-ID: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> Greetings, Is there a simple way to map a list of ensembl ids to the NCBI gis? 
thanks, Tom Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From cjfields at illinois.edu Wed Dec 9 20:59:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 9 Dec 2009 19:59:37 -0600 Subject: [Bioperl-l] how to map ensembl id to NCBI gi In-Reply-To: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> References: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C@illinois.edu> Tom, Probably best to do this via BioMart: http://www.ensembl.org/biomart/ I would assume you can also do this via the ensembl perl API as well. Also, have a look at the UniProt ID Mapper: http://www.uniprot.org/?tab=mapping chris On Dec 9, 2009, at 6:36 PM, Tom Keller wrote: > Greetings, > Is there a simple way to map a list of ensembl ids to the NCBI gis? > > thanks, > Tom > > Thomas (Tom) Keller > kellert at ohsu.edu > 503.494.2442 > 6339b R Jones Hall (BSc/CROET) > www.ohsu.edu/xd/research/research-cores/dna-analysis/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lovebaby39 at gmail.com Thu Dec 10 09:22:14 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Thu, 10 Dec 2009 22:22:14 +0800 Subject: [Bioperl-l] about bioperl issue Message-ID: <5F281DC3E4514B3AAA8881169B240227@SHAPC> Dear The following is code. 
--------------------------------------------------------------------------------
my @params_rb = ( 'program' => 'blastn', 'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);
my $input_rb = Bio::Seq->new(-id =>"test_query", -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);
while (my $result_rb = $blast_report_rb->next_result ) {
    while (my $hit_rb = $result_rb->next_hit()){
        while (my $hsp_rb = $hit_rb->next_hsp()){
            print $hit_rb->name,"\nevalue = " , $hsp_rb->evalue , "\t score = " , $hsp_rb->score , "\n" ;
            #print " ",$hit->name,"\n";
        }
    }
}
--------------------------------------------------------------------------------

I know how to get "name", "evalue" and "score", but I don't know how to get the word which is in red color. (or please see attachment.)

------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------

I would appreciate it if you could tell me how to do it.

Thank you.

Reginald Hsueh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R20080801-1.seq.txt URL: From SMarkel at accelrys.com Thu Dec 10 09:47:36 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Thu, 10 Dec 2009 06:47:36 -0800 Subject: [Bioperl-l] about bioperl issue In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> Message-ID: <5ACBA19439E77B43A06F4CAB897EC977067C6E@EXCH1-COLO.accelrys.net> Reginald, I didn't see anything highlighted in red but the three strings in the pairwise alignment display can be obtained from an HSP using $hsp->query_string() $hsp->hit_string() $hsp->homology_string() Scott Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hsueh Sent: Thursday, 10 December 2009 6:22 AM To: bioperl-l at bioperl.org Subject: [Bioperl-l] about bioperl issue Importance: High Dear The following is code. 
--------------------------------------------------------------------------------
my @params_rb = ( 'program' => 'blastn', 'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);
my $input_rb = Bio::Seq->new(-id =>"test_query", -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);
while (my $result_rb = $blast_report_rb->next_result ) {
    while (my $hit_rb = $result_rb->next_hit()){
        while (my $hsp_rb = $hit_rb->next_hsp()){
            print $hit_rb->name,"\nevalue = " , $hsp_rb->evalue , "\t score = " , $hsp_rb->score , "\n" ;
            #print " ",$hit->name,"\n";
        }
    }
}
--------------------------------------------------------------------------------

I know how to get "name", "evalue" and "score", but I don't know how to get the word which is in red color. (or please see attachment.)

------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------

I would appreciate it if you could tell me how to do it.

Thank you.

Reginald Hsueh

From David.Messina at sbc.su.se Thu Dec 10 10:09:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:09:31 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>

Hi Reginald,

None of the words in your email or the attachment are colored red -- unfortunately any kind of formatting tends to get removed from emails sent to mailing lists.

Could you be more specific about what part of the blast report you are not able to get? You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it.

Dave

From David.Messina at sbc.su.se Thu Dec 10 10:36:49 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:36:49 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC>
Message-ID: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>

Hi Reginald,

Please keep all replies on the list so that everyone can follow the thread.

In a separate email, Scott gave the answer you were looking for, I think.

Namely:
$hsp->query_string()
OR
$hsp->hit_string()

Dave

On Dec 10, 2009, at 16:31, Hsueh wrote:

> Dear Dave Messina
>
> I need to get the string that is "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga".
>
> Thank you
>
> Reginald Hsueh
>
> ------------------------------------------------------------------------------------------------------------------------------
> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
>            |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
> Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
> ------------------------------------------------------------------------------------------------------------------------------
>
>
>
>
> --------------------------------------------------
> From: "Dave Messina"
> Sent: Thursday, December 10, 2009 11:09 PM
> To: "Hsueh"
> Cc:
> Subject: Re: [Bioperl-l] about bioperl issue
>
>> Hi Reginald,
>>
>> None of the words in your email or the attachment are colored red -- unfortunately any kind of formatting tends to get removed from emails sent to mailing lists.
>>
>> Could you be more specific about what part of the blast report you are not able to get?
You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it. >> >> >> Dave From lovebaby39 at gmail.com Thu Dec 10 10:53:00 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Thu, 10 Dec 2009 23:53:00 +0800 Subject: [Bioperl-l] about bioperl issue In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> Message-ID: Dear Dave Messina Thank you for your replies. Reginald Hsueh -------------------------------------------------- From: "Dave Messina" Sent: Thursday, December 10, 2009 11:36 PM To: "Hsueh" Cc: Subject: Re: [Bioperl-l] about bioperl issue > Hi Reginald, > > Please keep all replies on the list so that everyone can follow the > thread. > > In a separate email, Scott gave the answer you were looking for, I think. > > Namely: > $hsp->query_string() > OR > $hsp->hit_string() > > > > Dave > > > > > On Dec 10, 2009, at 16:31, Hsueh wrote: > >> Dear Dave Messina >> >> I need to get the string that is >> "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga". 
>> >> Thank you >> >> Reginald Hsueh >> >> ------------------------------------------------------------------------------------------------------------------------------ >> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga >> 206 >> |||||| |||||||||||||||||| |||| || |||||| >> |||||||||||| || >> Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga >> 173 >> ------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> -------------------------------------------------- >> From: "Dave Messina" >> Sent: Thursday, December 10, 2009 11:09 PM >> To: "Hsueh" >> Cc: >> Subject: Re: [Bioperl-l] about bioperl issue >> >>> Hi Reginald, >>> >>> None of the words in your email or the attachment are colored red ? >>> unfortunately any kind of formatting tends to get removed from emails >>> send to mailing lists. >>> >>> Could you be more specific about what part of the blast report you are >>> not able to get? You could even just copy and paste that particular bit >>> of the report into your reply if it's not clear what to call it. >>> >>> >>> Dave >>>>Dear >>>> >>>>The following is code. 
>>>> >>>> >>>>-------------------------------------------------------------------------------- >>>> >>>>my at params_rb = ( 'program' => 'blastn', >>>> 'database' => 'DB\\RB_GUS\\RB_GUS'); >>>>my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb); >>>> >>>>my $input_rb = Bio::Seq->new(-id =>"test_query", >>>> -seq => $testline2); >>>>my $blast_report_rb = $factory_rb->blastall($input_rb); >>>> >>>> while (my $result_rb = $blast_report_rb-> next_result ) { >>>> while (my $hit_rb = $result_rb->next_hit()){ >>>> while (my $hsp_rb = $hit_rb->next_hsp()){ >>>> print $hit_rb->name,"\nevalue = " , $hsp_rb->evalue , "\t score = " >>>> , $hsp_rb->score , "\n" ; >>>> #print " ",$hit->name,"\n"; >>>> } >>>> } >>>> } >>>> >>>>-------------------------------------------------------------------------------- >>>> >>>> >>>>I know how to get "name", "evalue" and "score", but I don't know how >>>>to get the word which is in red color. (or please see attachment.) >>>>------------------------------------------------------------------------------------------------------------------ >>>>Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga >>>>206 >>>> |||||| |||||||||||||||||| |||| || |||||| >>>> |||||||||||| || >>>>Sbjct: 114 >>>>ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173 >>>>------------------------------------------------------------------------------------------------------------------ >>>> >>>>I will appreciate if you could tell me how to do it. >>>>Thank you. 
>>>> >>>>Reginald Hsueh From pg4 at sanger.ac.uk Thu Dec 10 15:50:40 2009 From: pg4 at sanger.ac.uk (Pablo Marin-Garcia) Date: Thu, 10 Dec 2009 20:50:40 +0000 (GMT) Subject: [Bioperl-l] how to map ensembl id to NCBI gi In-Reply-To: References: Message-ID: If you are mapping ensembl genes to NCBI genes (via ensemblaPI or biomart) please read this recent thread at ensembl-dev: http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/msg05417.html Seems that the ensembl gene mapping to NCBI is done through translation so the noncoding genes do not have the corresponding NCBI gene mapped. -Pablo > ------------------------------ > > Message: 4 > Date: Wed, 9 Dec 2009 19:59:37 -0600 > From: Chris Fields > Subject: Re: [Bioperl-l] how to map ensembl id to NCBI gi > To: Tom Keller > Cc: BioPerl-List > Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C at illinois.edu> > Content-Type: text/plain; charset=us-ascii > > Tom, > > Probably best to do this via BioMart: > > http://www.ensembl.org/biomart/ > > I would assume you can also do this via the ensembl perl API as well. > > Also, have a look at the UniProt ID Mapper: > > http://www.uniprot.org/?tab=mapping > > chris > > On Dec 9, 2009, at 6:36 PM, Tom Keller wrote: > >> Greetings, >> Is there a simple way to map a list of ensembl ids to the NCBI gis? 
>>
>> thanks,
>> Tom
>>
>> Thomas (Tom) Keller
>> kellert at ohsu.edu
>> 503.494.2442
>> 6339b R Jones Hall (BSc/CROET)
>> www.ohsu.edu/xd/research/research-cores/dna-analysis/
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>

====================================================================
Pablo Marin-Garcia, PhD          \\//       (Argiope bruennichi
                           \/\/`(||>O:'\/\/  with stabilimentum)
                                 //\\
Sanger Institute             | PostDoc / Computer Biologist
Wellcome Trust Genome Campus | team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH  | room : N333
United Kingdom               | email: pablo.marin at sanger.ac.uk
====================================================================

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From umjsm at leeds.ac.uk Fri Dec 11 11:44:42 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Fri, 11 Dec 2009 16:44:42 +0000
Subject: [Bioperl-l] extract and write a pdb chain
Message-ID: <1260549882.6484.11.camel@limm-pc1254>

Hello,

I am trying to do a very easy thing but I can't get it to work. I want to write a single chain of a PDB entry to a file. I have tried a lot of things, but what I think should work is the following script:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
my $struc = $structio->next_structure;

my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id');

for my $chain ($struc->get_chains) {
    if ($chain->id eq "A") {
        $new_entry->chain($chain);
        last;
    }
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
$out->write_structure($new_entry);

It doesn't.
I get the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: add_chain: first argument needs to be a Model object ()
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
STACK: read_pdb.pl:10
-----------------------------------------------------------

As far as I understand the documentation, the chain method of Bio::Structure::Entry requires an object of type Chain as input.

Any solution will be very welcome.

best regards,
Joan

From wkretzsch at gmail.com Fri Dec 11 14:22:31 2009
From: wkretzsch at gmail.com (Warren W. Kretzschmar)
Date: Fri, 11 Dec 2009 14:22:31 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms
Message-ID: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>

Hi,
I'm new to the bioperl community. I've created a perl module that reads in msOUT files generated by Hudson's ms. As far as I understand, there is no SeqIO module to read and write these files. If that is right, I propose to create a module that does this. Any suggestions?

Thanks,
Warren Kretzschmar

From maj at fortinbras.us Fri Dec 11 14:59:53 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 11 Dec 2009 14:59:53 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms
In-Reply-To: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
References: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
Message-ID: <07382508ED0B41F4B8289813B734239B@NewLife>

Hi Warren,

I say go for it.
You'll want to have a look at
http://bio.perl.org/wiki/Advanced_BioPerl
which explains most of our tips and "policies" for prospective code contributors, as well as
http://bio.perl.org/wiki/HOWTO:SeqIO
which details SeqIO from the user's perspective. Look carefully at some Bio::SeqIO::* modules for implementation details.

If you have code to propose, use http://bugzilla.bioperl.org and enter a new enhancement, where you can upload your module for us to review.

MAJ

----- Original Message -----
From: "Warren W. Kretzschmar"
To:
Sent: Friday, December 11, 2009 2:22 PM
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms

> Hi,
> I'm new to the bioperl community. I've created a perl module that
> reads in msOUT files generated by Hudson's ms. As far as I
> understand, there is no SeqIO module to read and output these files?
> If so, I propose to create a module that does this. Any suggestions?
>
> Thanks,
> Warren Kretzschmar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bosborne11 at verizon.net Fri Dec 11 15:37:45 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 11 Dec 2009 15:37:45 -0500
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260549882.6484.11.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID:

Joan,

It looks to me like the first argument to the add_chain() method has to be a Model object, and the second is the Chain itself. See Structure/Entry.pm, for example. However, if you're seeing some documentation that says something else, then tell us where; it needs to be corrected.

In Bio::Structure an Entry consists of one or more Models, each of which has one or more Chains. This allows you to build macromolecular complexes (an Entry), which could have more than one defined protein or protein complex (Models).

Brian O.
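
A minimal sketch of the Entry / Model / Chain hierarchy described above, assuming BioPerl's Bio::Structure::IO is installed and a local PDB file is at hand (101m.pdb is the file named in the original question). It only reads a structure and prints what hangs off each level, using the get_models, get_chains, get_residues and get_atoms accessors documented for Bio::Structure::Entry. The helper name chain_summary is made up for illustration and is not part of BioPerl, and the sketch does not by itself solve the single-chain write.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): walk Entry -> Model -> Chain
# and return one summary line per chain.
sub chain_summary {
    my ($file) = @_;

    # Loaded lazily so the script still compiles where BioPerl is absent.
    require Bio::Structure::IO;

    my $in    = Bio::Structure::IO->new(-file => $file, -format => 'pdb');
    my $entry = $in->next_structure
        or die "No structure found in $file\n";

    my @lines;
    for my $model ($entry->get_models($entry)) {
        for my $chain ($entry->get_chains($model)) {
            # Residues belong to a chain, atoms belong to a residue; the
            # Entry object is asked for the children at every level.
            my @residues = $entry->get_residues($chain);
            my $n_atoms  = 0;
            for my $residue (@residues) {
                my @atoms = $entry->get_atoms($residue);
                $n_atoms += @atoms;
            }
            push @lines, sprintf "model %s chain %s: %d residues, %d atoms",
                $model->id, $chain->id, scalar @residues, $n_atoms;
        }
    }
    return @lines;
}

# Usage (file name is an example): perl chain_summary.pl 101m.pdb
if (@ARGV) {
    print "$_\n" for chain_summary($ARGV[0]);
}
```

Note that every lookup goes through the Entry ($entry->get_chains($model), $entry->get_residues($chain), and so on) rather than through the Model or Chain object itself, which matches the point above that add_chain() wants a Model as its first argument.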
On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote: > Hello, > > I am trying to do a very easy think but I don't get it. I want to > write > in a file a chain of a pdb. I have try a lot of thinks but what I > think > that it should work is the next script: > > use Bio::Structure::IO; > use strict; > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' > => > 'pdb'); > my $struc = $structio->next_structure; > > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > > for my $chain ($struc->get_chains) { > if($chain->id eq "A"){ > $new_entry->chain($chain); > last; > } > } > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > 'pdb');# > $out->write_structure($new_entry); > > it doesn't. I get the next error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: add_chain: first argument needs to be a Model object () > > STACK: Error::throw > STACK: > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: > 368 > STACK: > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:335 > STACK: > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:391 > STACK: > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:304 > STACK: read_pdb.pl:10 > ----------------------------------------------------------- > > As far I understand the documentation, the method chain of the object > Bio::Structure::Entry requires an as input an object of type Chain. > > Any solution will be very welcome. 
>
> best regards,
> Joan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From awitney at sgul.ac.uk Sun Dec 13 16:48:13 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sun, 13 Dec 2009 21:48:13 +0000
Subject: [Bioperl-l] combining tree image with heatmap
Message-ID: <4B25611D.6050009@sgul.ac.uk>

I am trying to draw a tree on the side of a heatmap image, much like you see after clustering data. I was wondering if anyone has managed to do this using bioperl?

I can draw the two separately, but can't quite seem to work out how to put the two together and get the nodes to line up with the correct row of clustering data.

Is there any particular module to look at?

thanks for any help

adam

From dhwani1030 at gmail.com Sat Dec 12 15:04:01 2009
From: dhwani1030 at gmail.com (dhwani gandhi)
Date: Sat, 12 Dec 2009 15:04:01 -0500
Subject: [Bioperl-l] Bioperl code help
Message-ID:

Hi,
I am very new to BioPerl, but I am somewhat familiar with Perl.

I write my Perl programs in Notepad++ and run them in cmd.

Now I want to run BioPerl programs. I just installed BioPerl on my computer, and I have a program using BioPerl modules in Notepad++.

My question is how to run these programs. Can they be run in cmd as well, or do I use ppm?

Please help.

Thanks,
-Dhwani Gandhi.

From eric_donaldson at med.unc.edu Sun Dec 13 18:15:24 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Sun, 13 Dec 2009 18:15:24 -0500
Subject: [Bioperl-l] problem with install
Message-ID:

Hello,

Today I downloaded bioperl 1.6.1 on my new MacBook Pro using fink. I used

fink install bioperl.pm-588

as I could not get it to install using perl version 5.10.

But now I get an error when trying to run a bioperl script.
Here is the error: Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8. BEGIN failed--compilation aborted at blastparser.pl line 8. I am a novice at unix and bioperl so I do not know how to troubleshoot this, would you please hleo me? Thank you, Eric Eric F. Donaldson, Ph.D. Research Assistant Professor, Ralph Baric Lab University of North Carolina Department of Epidemiology -------------- next part -------------- begin:vcard n:Donaldson;Eric fn:Eric F. Donaldson, PhD tel;work:919.966.3881 org:University of North Carolina, School of Medicine;Epidemiology adr:;;2107 McGavran-Greenberg Hall CB# 7435 ;Chapel Hill;NC;27599;USA email;internet:eric_donaldson at med.unc.edu email;home;internet:viralnerd at gmail.com title:Research Assistant Professor version:2.1 end:vcard From jason at bioperl.org Sun Dec 13 20:24:26 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 13 Dec 2009 17:24:26 -0800 Subject: [Bioperl-l] problem with install In-Reply-To: References: Message-ID: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> Hi Eric - Bio::Tools::BPlite is no longer supported in Bioperl - it was deprecated several releases ago. It was replaced with Bio::SearchIO -jason On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote: > Hello, > > Today I downloaded bioperl 1.61 on my new macbook pro using fink. I > used the > > fink install bioperl.pm-588 as I could not get it to instal using > the perl version 5.10. > > But now I get an error when trying to run a bioperl script. 
> > Here is the error: > > Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/ > perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / > Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ > darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ > Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / > Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at > blastparser.pl line 8. > BEGIN failed--compilation aborted at blastparser.pl line 8. > > > I am a novice at unix and bioperl so I do not know how to > troubleshoot this, would you please hleo me? > > Thank you, > > Eric > > > Eric F. Donaldson, Ph.D. > Research Assistant Professor, Ralph Baric Lab > University of North Carolina > Department of Epidemiology > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Sun Dec 13 23:09:45 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 13 Dec 2009 20:09:45 -0800 Subject: [Bioperl-l] problem with install In-Reply-To: References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> Message-ID: <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> So you installed perl-5.10 or using system perl? I'm confused if you actually installed bioperl.pm or not via fink? It seems like since your @INC or $PERL5LIB points to /sw/lib/perl5 which is one of the dirs it would have installed in, but I don't think you actually installed bioperl. you can try and do: $ locate Bio/SearchIO.pm We'll see if any of the other osx/fink gurus are on the list that can help or you can install it via CPAN I guess. 
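
The same check can also be made from Perl itself, which searches exactly the directories the interpreter will use when it hits a "use" line. What follows is a minimal core-Perl sketch, not something from the thread: the helper name find_in_inc is made up for illustration, and Bio/SearchIO.pm is simply the module file under discussion.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every copy of a module file found under the
# given search directories (defaults to the interpreter's own @INC).
sub find_in_inc {
    my ($relative_path, @dirs) = @_;
    @dirs = @INC unless @dirs;
    return grep { -f $_ }
           map  { File::Spec->catfile($_, split m{/}, $relative_path) }
           grep { !ref $_ }    # skip code-ref hooks that may sit in @INC
           @dirs;
}

my $module = shift(@ARGV) || 'Bio/SearchIO.pm';
my @hits   = find_in_inc($module);

if (@hits) {
    print "Found $module at:\n", map { "  $_\n" } @hits;
}
else {
    print "$module is not in any \@INC directory; add the directory that ",
          "contains it to PERL5LIB or pass it with -I\n";
}
```

Running it once as is, and once with an extra directory supplied through PERL5LIB or perl -I, shows whether that directory is what makes the module visible.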
-jason On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote: > > I actually tried a different blastparser that uses BIO::SearchIO and > got the same message: > > Can't locate Bio/SearchIO.pm in @INC (@INC contains: /sw/lib/perl5/ > darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin / > Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/ > darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/ > Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 / > Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- > thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at > blastparser.new.pl line 8. > BEGIN failed--compilation aborted at blastparser.new.pl line 8. > > I suspect there is a path problem, but am not savvy enough to know > how to fix it. I am really just a hacker.... I have several scripts > that I use regularly and that I know how to modify, but am lost when > they don't work... > > Thanks for any help, > > Eric > > ----- Original Message ----- > From: Jason Stajich > Date: Sunday, December 13, 2009 8:24 pm > Subject: Re: [Bioperl-l] problem with install > To: eric_donaldson at med.unc.edu > Cc: bioperl-l at bioperl.org > >> Hi Eric - >> >> Bio::Tools::BPlite is no longer supported in Bioperl - it >> was >> deprecated several releases ago. >> It was replaced with Bio::SearchIO >> >> -jason >> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote: >> >>> Hello, >>> >>> Today I downloaded bioperl 1.61 on my new macbook pro using >> fink. I >>> used the >>> >>> fink install bioperl.pm-588 as I could not get it to instal >> using >>> the perl version 5.10. >>> >>> But now I get an error when trying to run a bioperl script. 
>>> >>> Here is the error: >>> >>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: >> /sw/lib/ >>> perl5/darwin-thread-multi-2level /sw/lib/perl5 >> /sw/lib/perl5/darwin / >>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/5.10.0 >> /Library/Perl/5.10.0/ >>> darwin-thread-multi-2level /Library/Perl/5.10.0 >> /Network/Library/ >>> Perl/5.10.0/darwin-thread-multi-2level >> /Network/Library/Perl/5.10.0 / >>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) >> at >>> blastparser.pl line 8. >>> BEGIN failed--compilation aborted at blastparser.pl line 8. >>> >>> >>> I am a novice at unix and bioperl so I do not know how >> to >>> troubleshoot this, would you please hleo me? >>> >>> Thank you, >>> >>> Eric >>> >>> >>> Eric F. Donaldson, Ph.D. >>> Research Assistant Professor, Ralph Baric Lab >>> University of North Carolina >>> Department of Epidemiology >>> >>> >>> >> < >> eric_donaldson.vcf>_______________________________________________> >> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> > > Eric F. Donaldson, Ph.D. > Research Assistant Professor, Ralph Baric Lab > University of North Carolina > Department of Epidemiology > > > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Mon Dec 14 00:10:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 13 Dec 2009 21:10:54 -0800 Subject: [Bioperl-l] problem with install In-Reply-To: References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> Message-ID: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org> Eric - please CC the bioperl list when responding so others can help - I can't be the only answerer. 
But since your @INC message doesn't include /sw/lib/perl5/5.8.8/ you would need to make sure that is added to your PERL5LIB. There are some help docs on the perl sites I expect on how to get your PATHs in order. Or you can just install via CPAN which will put it in the right path - there are docs on the bioperl website about installing via CPAN. -jason On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote: > Hi Jason, > > The fink package did not have support for perl 5.10, so I attempted > to install the perl 5.8.6 package. > > When I attempted: locate Bio/SearchIO.pm > I got: -bash: $: command not found > > So even though I can find SearchIO.pm in sw/lib/perl5/5.8.8/Bio/ > SearchIO.pm I cannot access it. Do I need to use the older version > of perl? > > Would it be better to install with CPAN? If so, can you send me to > a page that has instructions? > > Thank you so much! > > ERic > > > ----- Original Message ----- > From: Jason Stajich > Date: Sunday, December 13, 2009 11:10 pm > Subject: Re: [Bioperl-l] problem with install > To: eric_donaldson at med.unc.edu > Cc: BioPerl List > >> So you installed perl-5.10 or using system perl? I'm >> confused if you >> actually installed bioperl.pm or not via fink? >> >> It seems like since your @INC or $PERL5LIB points to >> /sw/lib/perl5 >> which is one of the dirs it would have installed in, but I don't >> think >> you actually installed bioperl. >> >> you can try and do: >> $ locate Bio/SearchIO.pm >> >> We'll see if any of the other osx/fink gurus are on the list >> that can >> help or you can install it via CPAN I guess. 
>> >> -jason >> On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote: >> >>> >>> I actually tried a different blastparser that uses >> BIO::SearchIO and >>> got the same message: >>> >>> Can't locate Bio/SearchIO.pm in @INC (@INC contains: >> /sw/lib/perl5/ >>> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin >> / >>> Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/5.10.0 >> /Library/Perl/5.10.0/ >>> darwin-thread-multi-2level /Library/Perl/5.10.0 >> /Network/Library/ >>> Perl/5.10.0/darwin-thread-multi-2level >> /Network/Library/Perl/5.10.0 / >>> Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin- >> >>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) >> at >>> blastparser.new.pl line 8. >>> BEGIN failed--compilation aborted at blastparser.new.pl line 8. >>> >>> I suspect there is a path problem, but am not savvy enough to >> know >>> how to fix it. I am really just a hacker.... I have >> several scripts >>> that I use regularly and that I know how to modify, but am >> lost when >>> they don't work... >>> >>> Thanks for any help, >>> >>> Eric >>> >>> ----- Original Message ----- >>> From: Jason Stajich >>> Date: Sunday, December 13, 2009 8:24 pm >>> Subject: Re: [Bioperl-l] problem with install >>> To: eric_donaldson at med.unc.edu >>> Cc: bioperl-l at bioperl.org >>> >>>> Hi Eric - >>>> >>>> Bio::Tools::BPlite is no longer supported in Bioperl - it >>>> was >>>> deprecated several releases ago. >>>> It was replaced with Bio::SearchIO >>>> >>>> -jason >>>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote: >>>> >>>>> Hello, >>>>> >>>>> Today I downloaded bioperl 1.61 on my new macbook pro using >>>> fink. I >>>>> used the >>>>> >>>>> fink install bioperl.pm-588 as I could not get it to instal >>>> using >>>>> the perl version 5.10. >>>>> >>>>> But now I get an error when trying to run a bioperl script. 
>>>>> >>>>> Here is the error: >>>>> >>>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: >>>> /sw/lib/ >>>>> perl5/darwin-thread-multi-2level /sw/lib/perl5 >>>> /sw/lib/perl5/darwin / >>>>> Library/Perl/Updates/5.10.0 >> /System/Library/Perl/5.10.0/darwin- >>>> >>>>> thread-multi-2level /System/Library/Perl/5.10.0 >>>> /Library/Perl/5.10.0/ >>>>> darwin-thread-multi-2level /Library/Perl/5.10.0 >>>> /Network/Library/ >>>>> Perl/5.10.0/darwin-thread-multi-2level >>>> /Network/Library/Perl/5.10.0 / >>>>> Network/Library/Perl >> /System/Library/Perl/Extras/5.10.0/darwin- >>>> >>>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) >>>> at >>>>> blastparser.pl line 8. >>>>> BEGIN failed--compilation aborted at blastparser.pl line 8. >>>>> >>>>> >>>>> I am a novice at unix and bioperl so I do not know how >>>> to >>>>> troubleshoot this, would you please hleo me? >>>>> >>>>> Thank you, >>>>> >>>>> Eric >>>>> >>>>> >>>>> Eric F. Donaldson, Ph.D. >>>>> Research Assistant Professor, Ralph Baric Lab >>>>> University of North Carolina >>>>> Department of Epidemiology >>>>> >>>>> >>>>> >>>> < >>>> >> eric_donaldson.vcf>_______________________________________________> >>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> Jason Stajich >>>> jason.stajich at gmail.com >>>> jason at bioperl.org >>>> >>>> >>> >>> Eric F. Donaldson, Ph.D. >>> Research Assistant Professor, Ralph Baric Lab >>> University of North Carolina >>> Department of Epidemiology >>> >>> >>> >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org >> >> > > Eric F. Donaldson, Ph.D. 
> Research Assistant Professor, Ralph Baric Lab
> University of North Carolina
> Department of Epidemiology
>
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From awitney at sgul.ac.uk Mon Dec 14 04:36:19 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 14 Dec 2009 09:36:19 +0000
Subject: [Bioperl-l] Bioperl code help
In-Reply-To:
References:
Message-ID: <4B260713.3070402@sgul.ac.uk>

BioPerl programs are just Perl programs, so you should run them in exactly the same way as your other Perl programs: from the command line.

HTH

adam

On 12/12/2009 20:04, dhwani gandhi wrote:
> Hi,
> I am very new to Bioperl but I am somewhat familiar to perl though.
>
> I write my perl programs in Notepad++ and run them in cmd.
>
> Now, I want to run Bioperl programs. I just installed bioperl on my
> computer. And I have a program using bioperl modules in Notepad++.
>
> My question is how to run these programs? Can they be ran in cmd as well? or
> do I use ppm?
>
> Please help.
>
> Thanks,
> -Dhwani Gandhi.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From umjsm at leeds.ac.uk Mon Dec 14 05:39:32 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 10:39:32 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To:
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID: <1260787172.7359.0.camel@limm-pc1254>

Hi Brian,

I am not calling the method add_chain, I am calling the method chain

http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6

and if the argument is not an object of type Bio::Structure::Chain I get an error like this (the part between the arrows depends on the argument):

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain, we want a Bio::Structure::Chain or a list of these
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
STACK: read_pdb.pl:11
-----------------------------------------------------------

And if I use a Chain object I get the error that I reported in my first message.

I have tried this code:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb", '-format' => 'pdb');
my $struc = $structio->next_structure;

my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id');
my $model = Bio::Structure::Model->new( -id => '0');

for my $chain ($struc->get_chains) {
    if ($chain->id eq "A") {
        $new_entry->add_chain($model, $chain);
        last;
    }
}
$new_entry->add_model($model);

my $out = Bio::Structure::IO->new(-file => ">out.pdb", '-format' => 'pdb');
$out->write_structure($new_entry);

But I get an empty pdb:

HEADER DEFAULT CLASSIFICATION 24-JAN-70 stru
REMARK 1
TER 1 A 0
MASTER
END

I have tried a lot of combinations, but I can't write a single chain into a file.
I don't know what I am doing wrong. Thanks for helping regards, Joan On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote: > Joan, > > It looks to me like the first argument to the add_chain() method has > to be a Model object, the second is the Chain itself. See Structure/ > Entry.pm, for example. However if you're seeing some documentation > that says something else then tell us where, it needs to be corrected. > > In Bio::Structure an Entry consists of one or Models, each of which > has one or more Chains. This allows you to build macromolecular > complexes (an Entry), which could have more than one defined proteins > or protein complexes (Models). > > Brian O. > > On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote: > > > Hello, > > > > I am trying to do a very easy think but I don't get it. I want to > > write > > in a file a chain of a pdb. I have try a lot of thinks but what I > > think > > that it should work is the next script: > > > > use Bio::Structure::IO; > > use strict; > > > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' > > => > > 'pdb'); > > my $struc = $structio->next_structure; > > > > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > > > > for my $chain ($struc->get_chains) { > > if($chain->id eq "A"){ > > $new_entry->chain($chain); > > last; > > } > > } > > > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > > 'pdb');# > > $out->write_structure($new_entry); > > > > it doesn't. 
I get the next error: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: add_chain: first argument needs to be a Model object () > > > > STACK: Error::throw > > STACK: > > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: > > 368 > > STACK: > > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ > > Structure/Entry.pm:335 > > STACK: > > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ > > Structure/Entry.pm:391 > > STACK: > > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ > > Structure/Entry.pm:304 > > STACK: read_pdb.pl:10 > > ----------------------------------------------------------- > > > > As far I understand the documentation, the method chain of the object > > Bio::Structure::Entry requires an as input an object of type Chain. > > > > Any solution will be very welcome. > > > > best regards, > > Joan > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Mon Dec 14 07:18:17 2009 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 14 Dec 2009 12:18:17 +0000 Subject: [Bioperl-l] parse EMBL Feature Table only Message-ID: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk> Hi, Maybe I'm really missing something here but I can't find how to parse a file that is basically just the Feature Table from an EMBL file, looking like this: FT CDS join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842) FT /colour=7 FT /product="RNA-binding protein, putative" FT CDS 213199..214812 FT /colour=7 FT /product="eukaryotic translation initiation factor 3 FT subunit 7, putative" ...[more of the same] So the file has no header and no actual sequence and it is used simply to annotate a chromosome in a genome assembly. I've always used GFF for that purpose but have been given this file now. 
Bio::SeqIO->new(-format=>"EMBL") complains about the missing header and if I stick in a fake ID line, it warns about the missing sequence and the fact that the features don't fit on the sequence (of length 0). Of course it's not difficult to write my own parser but I'm sure there must be a BioPerl way of doing that that I have just overlooked. Thanks for your help. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Messina at sbc.su.se Mon Dec 14 09:06:54 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 14 Dec 2009 15:06:54 +0100 Subject: [Bioperl-l] parse EMBL Feature Table only In-Reply-To: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk> References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se> Hi Frank, You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12 Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy. It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way. Dave PS:
I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation From eric_donaldson at med.unc.edu Mon Dec 14 09:22:40 2009 From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu) Date: Mon, 14 Dec 2009 09:22:40 -0500 Subject: [Bioperl-l] problem with install In-Reply-To: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org> References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org> Message-ID: Thank you Jason. I appreciate the help. Eric ----- Original Message ----- From: Jason Stajich Date: Monday, December 14, 2009 12:10 am Subject: Re: [Bioperl-l] problem with install To: eric_donaldson at med.unc.edu Cc: BioPerl List > Eric - > please CC the bioperl list when responding so others can help - > I > can't be the only answerer. > > But since your @INC message doesn't include /sw/lib/perl5/5.8.8/ > you > would need to make sure that is added to your PERL5LIB. > There are some help docs on the perl sites I expect on how to > get your > PATHs in order. > > Or you can just install via CPAN which will put it in the right > path - > there are docs on the bioperl website about installing via CPAN. > > -jason > On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote: > > > Hi Jason, > > > > The fink package did not have support for perl 5.10, so I > attempted > > to install the perl 5.8.6 package. > > > > When I attempted: locate Bio/SearchIO.pm > > I got: -bash: $: command not found > > > > So even though I can find SearchIO.pm in > sw/lib/perl5/5.8.8/Bio/ > > SearchIO.pm I cannot access it. Do I need to use > the older version > > of perl? > > > > Would it be better to install with CPAN? If so, can you > send me to > > a page that has instructions? > > > > Thank you so much!
> > > > ERic > > > > > > ----- Original Message ----- > > From: Jason Stajich > > Date: Sunday, December 13, 2009 11:10 pm > > Subject: Re: [Bioperl-l] problem with install > > To: eric_donaldson at med.unc.edu > > Cc: BioPerl List > > > >> So you installed perl-5.10 or using system perl?? I'm > >> confused if you > >> actually installed bioperl.pm or not via fink? > >> > >> It seems like since your @INC or $PERL5LIB points to > >> /sw/lib/perl5 > >> which is one of the dirs it would have installed in, but I don't > >> think > >> you actually installed bioperl. > >> > >> you can try and do: > >> $ locate Bio/SearchIO.pm > >> > >> We'll see if any of the other osx/fink gurus are on the list > >> that can > >> help or you can install it via CPAN I guess. > >> > >> -jason > >> On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote: > >> > >>> > >>> I actually tried a different blastparser that uses > >> BIO::SearchIO and > >>> got the same message: > >>> > >>> Can't locate Bio/SearchIO.pm in @INC (@INC contains: > >> /sw/lib/perl5/ > >>> darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin > >> / > >>> Library/Perl/Updates/5.10.0 > /System/Library/Perl/5.10.0/darwin- > >> > >>> thread-multi-2level /System/Library/Perl/5.10.0 > >> /Library/Perl/5.10.0/ > >>> darwin-thread-multi-2level /Library/Perl/5.10.0 > >> /Network/Library/ > >>> Perl/5.10.0/darwin-thread-multi-2level > >> /Network/Library/Perl/5.10.0 / > >>> Network/Library/Perl > /System/Library/Perl/Extras/5.10.0/darwin- > >> > >>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) > >> at > >>> blastparser.new.pl line 8. > >>> BEGIN failed--compilation aborted at blastparser.new.pl line 8. > >>> > >>> I suspect there is a path problem, but am not savvy enough to > >> know > >>> how to fix it.? I am really just a hacker.... I have > >> several scripts > >>> that I use regularly and that I know how to modify, but am > >> lost when > >>> they don't work... 
> >>> > >>> Thanks for any help, > >>> > >>> Eric > >>> > >>> ----- Original Message ----- > >>> From: Jason Stajich > >>> Date: Sunday, December 13, 2009 8:24 pm > >>> Subject: Re: [Bioperl-l] problem with install > >>> To: eric_donaldson at med.unc.edu > >>> Cc: bioperl-l at bioperl.org > >>> > >>>> Hi Eric - > >>>> > >>>> Bio::Tools::BPlite is no longer supported in Bioperl - it > >>>> was > >>>> deprecated several releases ago. > >>>> It was replaced with Bio::SearchIO > >>>> > >>>> -jason > >>>> On Dec 13, 2009, at 3:15 PM, eric_donaldson at med.unc.edu wrote: > >>>> > >>>>> Hello, > >>>>> > >>>>> Today I downloaded bioperl 1.61 on my new macbook pro using > >>>> fink.? I > >>>>> used the > >>>>> > >>>>> fink install bioperl.pm-588 as I could not get it to instal > >>>> using > >>>>> the perl version 5.10. > >>>>> > >>>>> But now I get an error when trying to run a bioperl script. > >>>>> > >>>>> Here is the error: > >>>>> > >>>>> Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: > >>>> /sw/lib/ > >>>>> perl5/darwin-thread-multi-2level /sw/lib/perl5 > >>>> /sw/lib/perl5/darwin / > >>>>> Library/Perl/Updates/5.10.0 > >> /System/Library/Perl/5.10.0/darwin- > >>>> > >>>>> thread-multi-2level /System/Library/Perl/5.10.0 > >>>> /Library/Perl/5.10.0/ > >>>>> darwin-thread-multi-2level /Library/Perl/5.10.0 > >>>> /Network/Library/ > >>>>> Perl/5.10.0/darwin-thread-multi-2level > >>>> /Network/Library/Perl/5.10.0 / > >>>>> Network/Library/Perl > >> /System/Library/Perl/Extras/5.10.0/darwin- > >>>> > >>>>> thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) > >>>> at > >>>>> blastparser.pl line 8. > >>>>> BEGIN failed--compilation aborted at blastparser.pl line 8. > >>>>> > >>>>> > >>>>> I am a novice at unix and bioperl so I do not know how > >>>> to > >>>>> troubleshoot this, would you please hleo me? > >>>>> > >>>>> Thank you, > >>>>> > >>>>> Eric > >>>>> > >>>>> > >>>>> Eric F. Donaldson, Ph.D. 
> >>>>> Research Assistant Professor, Ralph Baric Lab > >>>>> University of North Carolina > >>>>> Department of Epidemiology > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> -- > >>>> Jason Stajich > >>>> jason.stajich at gmail.com > >>>> jason at bioperl.org > >>>> > >>>> > >>> > >>> Eric F. Donaldson, Ph.D. > >>> Research Assistant Professor, Ralph Baric Lab > >>> University of North Carolina > >>> Department of Epidemiology > >>> > >>> > >>> > >> > >> -- > >> Jason Stajich > >> jason.stajich at gmail.com > >> jason at bioperl.org > >> > >> > > > > Eric F. Donaldson, Ph.D. > > Research Assistant Professor, Ralph Baric Lab > > University of North Carolina > > Department of Epidemiology > > > > > > > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > Eric F. Donaldson, Ph.D. Research Assistant Professor, Ralph Baric Lab University of North Carolina Department of Epidemiology From umjsm at leeds.ac.uk Mon Dec 14 11:58:03 2009 From: umjsm at leeds.ac.uk (Joan Segura Mora) Date: Mon, 14 Dec 2009 16:58:03 +0000 Subject: [Bioperl-l] extract and write a pdb chain In-Reply-To: <1260787172.7359.0.camel@limm-pc1254> References: <1260549882.6484.11.camel@limm-pc1254> <1260787172.7359.0.camel@limm-pc1254> Message-ID: <1260809883.7359.15.camel@limm-pc1254> Hi again, To extract a pdb chain in a file, I have had to do it adding atom by atom to a new structure. use Bio::Structure::IO; use strict; my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' => 'pdb'); my $struc = $structio->next_structure; my $new_struct = Bio::Structure::Entry->new( -id => 'structure_id'); for my $model ($struc->get_models){ $new_struct->add_model($model); for my $chain ($struc->get_chains) { $new_struct->add_chain($model,$chain); if($chain->id eq "A"){ foreach my $res ($struc->get_residues($chain)){ $new_struct->add_residue($chain,$res); foreach my $atom ($struc->get_atoms($res)){ $new_struct->add_atom($res,$atom); } } } last; } last; } my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => 'pdb'); $out->write_structure($new_struct); I suppose that there should be a more elegant way to do it. If someone knows it and can explain it I will be very grateful.
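If the goal is only to write one chain to its own file, a plain-text filter may be enough. The sketch below is not the Bio::Structure approach discussed in this thread: it relies on the PDB column layout, where the chain identifier of a coordinate record sits in column 22. The helper name extract_chain and the sample records are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): keep the coordinate records
# of one chain. In ATOM, HETATM, ANISOU and TER records the chain
# identifier is the single character in column 22.
sub extract_chain {
    my ($chain_id, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        next unless $line =~ /^(?:ATOM  |HETATM|ANISOU|TER   )/;
        next unless length($line) >= 22;
        push @kept, $line if substr($line, 21, 1) eq $chain_id;
    }
    return @kept;
}

# Made-up sample records; sprintf is used only to guarantee that each
# field lands in the right column (chain identifier in column 22).
my @pdb = map { sprintf("%-6s%5d %-4s%1s%3s %1s%4d", @$_) } (
    [ "ATOM", 1, " N",  "", "MET", "A", 0 ],
    [ "ATOM", 2, " CA", "", "MET", "A", 0 ],
    [ "ATOM", 3, " N",  "", "VAL", "B", 1 ],
);

# Print only the records that belong to chain A.
print "$_\n" for extract_chain("A", @pdb);
```

For a real file, the lines would be read from 101m.pdb and the kept records printed to out.pdb. Header records and the final END line are not copied by this filter, so they would have to be added separately if a downstream tool needs them.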
kind regards, Joan On Mon, 2009-12-14 at 10:39 +0000, Joan Segura Mora wrote: > Hi Brian, > > I am not calling the method add_chain, I am calling the method chain > > http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6 > > and if I don't use as an argument an object of type > > Bio::Structure::Chain > > I get an error like this (-->depends of the argument<--) > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain, > we want a Bio::Structure::Chain or a list of these > > STACK: Error::throw > STACK: > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368 > STACK: > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314 > STACK: read_pdb.pl:11 > ----------------------------------------------------------- > > > And if I use a Chain object I get the error that I told you. > > I have try this code: > > use Bio::Structure::IO; > use strict; > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' => > 'pdb'); > my $struc = $structio->next_structure; > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > my $model = Bio::Structure::Model->new( -id => '0'); > > for my $chain ($struc->get_chains) { > if($chain->id eq "A"){ > $new_entry->add_chain($model,$chain); > > last; > } > } > $new_entry->add_model($model); > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > 'pdb'); > $out->write_structure($new_entry); > > > But I get an empty pdb > > HEADER DEFAULT CLASSIFICATION 24-JAN-70 > stru > REMARK > 1 > TER 1 A > 0 > MASTER > END > > I am trying a lot of combinations, but I can't write a single chain into > a file. I don't know what I am doing wrong. 
> > Thanks for helping > > regards, > Joan > > > On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote: > > Joan, > > > > It looks to me like the first argument to the add_chain() method has > > to be a Model object, the second is the Chain itself. See Structure/ > > Entry.pm, for example. However if you're seeing some documentation > > that says something else then tell us where, it needs to be corrected. > > > > In Bio::Structure an Entry consists of one or Models, each of which > > has one or more Chains. This allows you to build macromolecular > > complexes (an Entry), which could have more than one defined proteins > > or protein complexes (Models). > > > > Brian O. > > > > On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote: > > > > > Hello, > > > > > > I am trying to do a very easy think but I don't get it. I want to > > > write > > > in a file a chain of a pdb. I have try a lot of thinks but what I > > > think > > > that it should work is the next script: > > > > > > use Bio::Structure::IO; > > > use strict; > > > > > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' > > > => > > > 'pdb'); > > > my $struc = $structio->next_structure; > > > > > > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > > > > > > for my $chain ($struc->get_chains) { > > > if($chain->id eq "A"){ > > > $new_entry->chain($chain); > > > last; > > > } > > > } > > > > > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > > > 'pdb');# > > > $out->write_structure($new_entry); > > > > > > it doesn't. 
I get the next error: > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: add_chain: first argument needs to be a Model object () > > > > > > STACK: Error::throw > > > STACK: > > > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: > > > 368 > > > STACK: > > > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:335 > > > STACK: > > > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:391 > > > STACK: > > > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:304 > > > STACK: read_pdb.pl:10 > > > ----------------------------------------------------------- > > > > > > As far I understand the documentation, the method chain of the object > > > Bio::Structure::Entry requires an as input an object of type Chain. > > > > > > Any solution will be very welcome. > > > > > > best regards, > > > Joan > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gowthaman.ramasamy at sbri.org Mon Dec 14 14:16:32 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Mon, 14 Dec 2009 11:16:32 -0800 Subject: [Bioperl-l] GO::Parser / GO::Model::Term In-Reply-To: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com> Message-ID: Hi All, I have a list of GO terms. And would like to pull GO accessions for them. I can easily do the revere of it using get_term("GO::00000051"). But can someone tell me how to get the GO accessions from GO Terms , for eg: retrive GO accession for "citrulline metabolic process". 
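One non-BioPerl way to go from a term name to its accession is to index the [Term] stanzas of the ontology's OBO file by their name: tag. This is a minimal sketch, assuming the ontology is available as OBO-format text. The function name and the inline sample are illustrative only, and a real run would read the file from disk.

```perl
use strict;
use warnings;

# Hypothetical helper (plain Perl, not a GO::Parser or BioPerl API):
# map each term name (lower-cased) to its accession by scanning the
# id: and name: tags of [Term] stanzas in OBO-format text.
sub index_terms_by_name {
    my ($obo_text) = @_;
    my (%acc_for, $in_term, $id);
    for my $line (split /\n/, $obo_text) {
        if ($line =~ /^\[(\w+)\]/) {             # a new stanza starts
            $in_term = ($1 eq 'Term');
            undef $id;
        }
        elsif ($in_term && $line =~ /^id:\s*(\S+)/) {
            $id = $1;
        }
        elsif ($in_term && defined $id && $line =~ /^name:\s*(.+?)\s*$/) {
            $acc_for{ lc $1 } = $id;
        }
    }
    return \%acc_for;
}

# Tiny inline sample; the [Typedef] stanza is there to show that
# non-term stanzas are skipped.
my $sample = <<'OBO';
format-version: 1.2

[Term]
id: GO:0000052
name: citrulline metabolic process
namespace: biological_process

[Typedef]
id: part_of
name: part of
OBO

my $acc_for = index_terms_by_name($sample);
print $acc_for->{ lc 'citrulline metabolic process' }, "\n";
```

Lower-casing the names makes the lookup case-insensitive. Synonyms are not indexed here, so a term has to be given by its primary name.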
Thanks very much, Gowtham From lsbrath at gmail.com Mon Dec 14 14:41:39 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Mon, 14 Dec 2009 14:41:39 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac Message-ID: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Hello, I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get the following error message: Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) at project_example.pl line 4. BEGIN failed--compilation aborted at project_example.pl line 4. I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message. Any ideas? MEB From scott at scottcain.net Mon Dec 14 14:47:05 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 14 Dec 2009 14:47:05 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Message-ID: <4536f7700912141147ld16d67av1a58bbf5c1fc5e9e@mail.gmail.com> Hi Mgavi, I think Jason may have already started helping, but the question is: is SeqIO.pm anywhere in those directories? If not, why not? If so, why can't the perl you are using find it? Do you have more than one instance of perl on your machine (fairly likely if you are using a fink-installed BioPerl)? When you execute your script, which perl are you using? Scott On Mon, Dec 14, 2009 at 2:41 PM, Mgavi Brathwaite wrote: > Hello, > > I have loaded BioPerl -1.6.0 onto my Mac. 
When I run my script I get the > following error message: > > Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 > /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level > /Library/Perl/5.8.8 /Library/Perl > /Network/Library/Perl/5.8.8/darwin-thread-multi-2level > /Network/Library/Perl/5.8.8 /Network/Library/Perl > /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) > at project_example.pl line 4. > BEGIN failed--compilation aborted at project_example.pl line 4. > > I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message. > Any ideas? > > MEB > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From bosborne11 at verizon.net Mon Dec 14 14:45:35 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 14 Dec 2009 14:45:35 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Message-ID: <38104B41-104B-42D7-94FA-30016E110BFD@verizon.net> Mgavi, So there's a directory called /sw/lib/perl5/Bio? Or is it called something else? Brian O. On Dec 14, 2009, at 2:41 PM, Mgavi Brathwaite wrote: > Hello, > > I have loaded BioPerl -1.6.0 onto my Mac. 
When I run my script I get > the > following error message: > > Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 > /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread- > multi-2level > /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread- > multi-2level > /Library/Perl/5.8.8 /Library/Perl > /Network/Library/Perl/5.8.8/darwin-thread-multi-2level > /Network/Library/Perl/5.8.8 /Network/Library/Perl > /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/ > 5.8.1 .) > at project_example.pl line 4. > BEGIN failed--compilation aborted at project_example.pl line 4. > > I moved the BioPerl dir to /sw/lib/perl5 and I still get the error > message. > Any ideas? > > MEB > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Dec 14 16:42:09 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 14 Dec 2009 13:42:09 -0800 Subject: [Bioperl-l] fasta format In-Reply-To: References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> Message-ID: <614B8A2C-3B17-4E3B-AAC5-3210C7435BB5@bioperl.org> you can read the man page from sean Eddy or use it exactly as I showed you sreformat fasta filename > filename.new you can also use the 1st example which is a bioperl solution. -jason On Dec 13, 2009, at 7:02 AM, Jonas Schaer wrote: > Hi Jason, > thank you very much for your answer. > i am sorry to bother u again but i'm afraid i need some help with > that because i don't see how to use sreformat? > i dont get it managed to write a script that works. 
> > thank u again :) > jonas > > > ----- Original Message ----- From: "Jason Stajich" > To: "Jonas Schaer" > Cc: > Sent: Tuesday, December 08, 2009 6:44 PM > Subject: Re: [Bioperl-l] fasta format > > >> you can run >> sreformat (HMMER) or bp_sreformat.pl script in scripts/utilties (or >> that is installed when you install the Bioperl scripts) >> $ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o >> yournewfile.fa >> # rename it back >> $ mv yournewfile.fa yourfile.fa >> >> or >> $ sreformat fasta yourfile.fa > yournewfile.fa >> $ mv yournewfile.fa yourfile.fa >> >> >> -jason >> On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote: >> >>> Hi there, >>> I have a little question concerning bioperl. I have >>> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read >>> in some fasta files. first it worked fine, but now i have some >>> fastafiles in slightly different format (not all lines have the same >>> length!). >>> >>> ------------- EXCEPTION ------------- >>> MSG: Each line of the fasta entry must be the same length except the >>> last. >>> Line above #49 ' >>> ..' is 28 != 101 chars. >>> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ >>> Fasta.pm:771 >>> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm: >>> 681 >>> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491 >>> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513 >>> STACK main::readfasta blast_eval.pm:174 >>> STACK toplevel blast_eval.pm:83 >>> ------------------------------------- >>> >>> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ >>> site/lib/Bio/ >>> DB/Fasta.pm line 1054. >>> >>> >>> Is there any way to use these fasta files with diffrent length of >>> lines with this fasta.pm module or will i have to change the format >>> of my fasta-files(big databases...) ? >>> >>> Thanks in advance for any help! 
>>> >>> Regards, Jonas >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org > > > -------------------------------------------------------------------------------- > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.426 / Virus Database: 270.14.98/2552 - Release Date: > 12/08/09 07:34:00 > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Mon Dec 14 20:23:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 14 Dec 2009 19:23:05 -0600 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes Message-ID: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> All, The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. I have seen two variations of NSE that incorporate strandedness: 1) Stockholm Rfam reverses start and end if the strand == -1 chrY/598-1 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end rice-3(+)/16598648-16600199 The former breaks fewer things within BioPerl, but the latter seems more explicit. Any preferences? Do we want a new method that creates this, and deprecate out simple non-stranded NSE? chris From bernd.web at gmail.com Tue Dec 15 03:37:44 2009 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 15 Dec 2009 09:37:44 +0100 Subject: [Bioperl-l] GO::Parser / GO::Model::Term In-Reply-To: References: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com> Message-ID: <716af09c0912150037k513c6efah442a236cb323e14e@mail.gmail.com> Dear Gowthaman, A non-BioPerl solution: the Ontology Lookup service at EBI. It also provides a web service interface. http://www.ebi.ac.uk/ontology-lookup/ citrulline metabolic process has to be selected from the pull-down list in the interactive page. 
This will return the ID (GO:0000052) and addional info: definition The chemical reactions and pathways involving citrulline, N5-carbamoyl-L-ornithine, an alpha amino acid not found in proteins. preferred name citrulline metabolic process exact synonym citrulline metabolism subset Prokaryotic GO subset xref_definition ISBN:209853"Oxford Dictionary of Biochemistry and Molecular Biology" The webservice is described at http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do Regards, Bernd On Mon, Dec 14, 2009 at 8:16 PM, Gowthaman Ramasamy wrote: > > Hi All, > I have a list of GO terms. And would like to pull GO accessions for them. > I can easily do the revere of it using get_term("GO::00000051"). > > But can someone tell me how to get the GO accessions from GO Terms , for eg: retrive GO accession for "citrulline metabolic process". > > > Thanks very much, > Gowtham > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Tue Dec 15 05:38:40 2009 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 15 Dec 2009 10:38:40 +0000 Subject: [Bioperl-l] parse EMBL Feature Table only In-Reply-To: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se> References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk> <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se> Message-ID: <1260873520.17180.215.camel@deskpro15336.dynamic.sanger.ac.uk> Thanks Dave, good to know that I haven't overlooked something bleedingly obvious in Bioperl that already does this :-) No problem, I have already implemented a simple parser to do it, which works fine for my files. 
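A simple parser of that kind could look roughly like the sketch below. It is not the parser mentioned in this message and not a BioPerl API. It assumes the standard EMBL column layout (feature key starting in column 6, location and qualifiers starting in column 22), and the function name is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-alone parser for bare EMBL feature-table (FT) lines.
# Assumes the standard layout: key from column 6, location and
# qualifiers from column 22. Qualifiers without '=' are ignored.
sub parse_ft_lines {
    my (@lines) = @_;
    my (@features, $last_qual);
    for my $line (@lines) {
        if ($line =~ /^FT {3}(\S+)\s+(\S.*?)\s*$/) {       # new feature
            push @features, { key => $1, location => $2, qualifiers => {} };
            undef $last_qual;
        }
        elsif (!@features) {
            next;                                          # nothing to attach to yet
        }
        elsif ($line =~ m{^FT {19}/(\w+)=(.*?)\s*$}) {     # new qualifier
            $last_qual = $1;
            $features[-1]{qualifiers}{$last_qual} = $2;
        }
        elsif ($line =~ /^FT {19}(\S.*?)\s*$/) {           # continuation line
            if (defined $last_qual) {
                $features[-1]{qualifiers}{$last_qual} .= " $1";
            }
            else {
                $features[-1]{location} .= $1;             # wrapped location
            }
        }
    }
    # Strip the surrounding double quotes from qualifier values.
    for my $f (@features) {
        s/^"(.*)"$/$1/ for values %{ $f->{qualifiers} };
    }
    return @features;
}

# Sample lines rebuilt from the example in this thread; sprintf and the
# 19-space pad keep every field in its column.
my $pad = "FT" . " " x 19;
my @ft = (
    sprintf("FT   %-15s %s", "CDS", "213199..214812"),
    $pad . '/colour=7',
    $pad . '/product="eukaryotic translation initiation factor 3',
    $pad . 'subunit 7, putative"',
);

for my $f (parse_ft_lines(@ft)) {
    print "$f->{key}\t$f->{location}\t$f->{qualifiers}{product}\n";
}
```

Keying on columns rather than on whitespace-separated tokens is what lets a wrapped qualifier value such as the second /product line be told apart from the start of a new feature.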
Thanks Frank On Mon, 2009-12-14 at 15:06 +0100, Dave Messina wrote: > Hi Frank, > > You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method: > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12 > > Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy. > > It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way. > > > Dave > > > PS ? I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From rmb32 at cornell.edu Tue Dec 15 10:09:43 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 15 Dec 2009 07:09:43 -0800 Subject: [Bioperl-l] AGI's fpc stuff: Bio::Map::Physical, Bio::MapIO::fpc, etc Message-ID: <4B27A6B7.6090709@cornell.edu> Hi all, Recently I caught an interesting thing related to making GFF files out of FPC maps built recently using Bio::MapIO;:fpc. All of the coordinates in the resulting GFF3 and the sizes of the contigs and clones seem to be dilated by 4x from where they should be. This didn't happen with some earlier FPC datasets I ran through these modules. 
I haven't gone through any of this very thoroughly, but I notice in Bio::Map::Physical::print_gffstyle() at line 765 there's a line like 'my $basepair = 4096', and the routine goes on to use $basepair as a sort of multiplier for converting the native physical map units into basepairs for GFF-style output. This makes me wonder if the newer FPC datasets coming out require a different $basepairs value, maybe 1024? Are the original authors of these modules still around on this list? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From tristan.lefebure at gmail.com Tue Dec 15 12:18:26 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Tue, 15 Dec 2009 12:18:26 -0500 Subject: [Bioperl-l] ncurses and bioperl? Message-ID: <200912151218.26357.tristan.lefebure@gmail.com> Hello, (Be careful: the following is a very naive question) Something that I find myself missing is a simple way to look at alignments and trees on remote machines where I don't have access to X. Since, (1) one can make wonderful terminal programs like screen and emacs by using ncurses, (2) that alignment and tree objects are already well handled in bioperl, and (3) that there is a CPAN Curses module; doing 1+2+3, may I dream of a curse/bioperl perl program to render alignment and trees? I suppose a plain C program would be much better, but well I am a biologist... Thanks, --Tristan From jason at bioperl.org Tue Dec 15 12:50:52 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 15 Dec 2009 09:50:52 -0800 Subject: [Bioperl-l] ncurses and bioperl? 
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com> References: <200912151218.26357.tristan.lefebure@gmail.com> Message-ID: not to say this isn't a good idea, but currently for curses I would use the treeviewing with retree from PHYLIP and for short read alignments the samtools tview or Gambit (MarthLab) works great or something like ralee for viewing MSA alignments (though targeted for RNA editing) http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/ http://dx.doi.org/10.1093/bioinformatics/bth489 Just that there are prior examples so would be able to learn from them if you still wanted to roll your own here. -jason On Dec 15, 2009, at 9:18 AM, Tristan Lefebure wrote: > Hello, > > (Be careful: the following is a very naive question) > > Something that I find myself missing is a simple way to look > at alignments and trees on remote machines where I don't > have access to X. Since, > (1) one can make wonderful terminal programs like screen > and emacs by using ncurses, > (2) that alignment and tree objects are already well > handled in bioperl, and > (3) that there is a CPAN Curses module; > > doing 1+2+3, may I dream of a curse/bioperl perl program to > render alignment and trees? I suppose a plain C program > would be much better, but well I am a biologist... > > Thanks, > > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From roy.chaudhuri at gmail.com Tue Dec 15 12:47:26 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 15 Dec 2009 17:47:26 +0000 Subject: [Bioperl-l] ncurses and bioperl? 
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com> References: <200912151218.26357.tristan.lefebure@gmail.com> Message-ID: <4B27CBAE.5000303@gmail.com> Hi Tristan, Not a Bioperl solution, but retree from the Phylip package displays trees in a terminal. Roy. On 15/12/2009 17:18, Tristan Lefebure wrote: > Hello, > > (Be careful: the following is a very naive question) > > Something that I find myself missing is a simple way to look > at alignments and trees on remote machines where I don't > have access to X. Since, > (1) one can make wonderful terminal programs like screen > and emacs by using ncurses, > (2) that alignment and tree objects are already well > handled in bioperl, and > (3) that there is a CPAN Curses module; > > doing 1+2+3, may I dream of a curse/bioperl perl program to > render alignment and trees? I suppose a plain C program > would be much better, but well I am a biologist... > > Thanks, > > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nml5566 at gmail.com Tue Dec 15 16:37:30 2009 From: nml5566 at gmail.com (Nathan Liles) Date: Tue, 15 Dec 2009 15:37:30 -0600 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? Message-ID: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Is the Bio::Ontology::OBOEngine module working or being currently maintained? I tried following the documentation in the module: * use Bio::Ontology::OBOEngine; my $parser = Bio::Ontology::OBOEngine->new ( -file => "gene_ontology.obo" ); my $engine = $parser->parse(); *But, it throws an error when I run the file saying 'Can't locate object method "parse" '. Does anyone have any experience getting this module working; or, is there any alternative bioperl module to extract terms and relationships out of sequence ontology files? 
From hlapp at drycafe.net Tue Dec 15 17:05:10 2009 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 15 Dec 2009 17:05:10 -0500 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? In-Reply-To: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Message-ID: That shouldn't happen I suppose, but you're not supposed really to use the engine directly. Rather it will be used as a backing parser by the Bio::OntologyIO parser you choose. Have you tried that route and found it not to work? -hilmar On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote: > Is the Bio::Ontology::OBOEngine module working or being currently > maintained? I tried following the documentation in the module: > > * use Bio::Ontology::OBOEngine; > > my $parser = Bio::Ontology::OBOEngine->new > ( -file => "gene_ontology.obo" ); > > my $engine = $parser->parse(); > > *But, it throws an error when I run the file saying 'Can't locate > object > method "parse" '. Does anyone have any experience getting this module > working; or, is there any alternative bioperl module to extract > terms and > relationships out of sequence ontology files? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Wed Dec 16 04:58:16 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Dec 2009 10:58:16 +0100 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> Message-ID: I'd tend to be inclined more towards option 1 over option 2 because option 2 pollutes the name field. (Although that's not a huge problem if the '(strand)' is always just before the '/'.) It's a question of whether to optimize human-readability over machine-readabilitiy: option 2 favors the former over the latter, and option 1 the reverse. Whichever way you go, I think > a new method that creates this, and deprecate[s] out simple non-stranded NSE would be great. Dave From maj at fortinbras.us Wed Dec 16 07:51:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 16 Dec 2009 07:51:24 -0500 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> Message-ID: <6723123C0ABD447190639AE1F5D1A6A7@NewLife> I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. 
cheers MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Monday, December 14, 2009 8:23 PM Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes > All, > > The current output for NSE format (Name/Start-End) via > Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. I have > seen two variations of NSE that incorporate strandedness: > > 1) Stockholm Rfam reverses start and end if the strand == -1 > > chrY/598-1 > > 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end > > rice-3(+)/16598648-16600199 > > The former breaks fewer things within BioPerl, but the latter seems more > explicit. Any preferences? Do we want a new method that creates this, and > deprecate out simple non-stranded NSE? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From tuco at pasteur.fr Wed Dec 16 09:14:28 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 16 Dec 2009 15:14:28 +0100 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) Message-ID: <4B28EB44.3080006@pasteur.fr> Hi, I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0. I tried to use my code once again but now the output of my parser is empty. It looks like Annotation from seqfeatures is not filled anymore. Here is the code I used previously: while(my $seq = $streamer->next_seq()){ #We only want to retrieve CDS features... foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){ print $ofh join("#", $feat->annotation()->get_Annotations('locus_tag'), # Acc num $feat->annotation()->get_Annotations('gene') ? 
$feat->annotation()->get_Annotations('gene') # Gene name : $feat->annotation()->get_Annotations('locus_tag'), $feat->annotation()->get_Annotations('product'), # Description ),"\n"; } } $feat is a Bio::SeqFeature::Generic object If I print Dumper($feat->annotation()) here is the output : $VAR1 = bless( { '_typemap' => bless( { '_type' => { 'comment' => 'Bio::Annotation::Comment', 'reference' => 'Bio::Annotation::Reference', 'dblink' => 'Bio::Annotation::DBLink' } }, 'Bio::Annotation::TypeManager' ), '_annotation' => {} }, 'Bio::Annotation::Collection' ); Have some changes been made into the way annotation object is populated? Thanks for any clue and sorry if my question look stupid Regards Emmanuel -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From cjfields at illinois.edu Wed Dec 16 10:09:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 16 Dec 2009 09:09:56 -0600 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) In-Reply-To: <4B28EB44.3080006@pasteur.fr> References: <4B28EB44.3080006@pasteur.fr> Message-ID: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> Emmanuel, The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation. The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default. 
You can get at the data this way (from the Feature/Annotation HOWTO): for my $feat_object ($seq_object->get_SeqFeatures) { print "primary tag: ", $feat_object->primary_tag, "\n"; for my $tag ($feat_object->get_all_tags) { print " tag: ", $tag, "\n"; for my $value ($feat_object->get_tag_values($tag)) { print " value: ", $value, "\n"; } } } You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional. chris On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote: > Hi, > > I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0. > I tried to use my code once again but now the output of my parser is empty. > It looks like Annotation from seqfeatures is not filled anymore. > > Here is the code I used previously: > > while(my $seq = $streamer->next_seq()){ > > #We only want to retrieve CDS features... > foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){ > print $ofh join("#", > $feat->annotation()->get_Annotations('locus_tag'), # Acc num > $feat->annotation()->get_Annotations('gene') > ? $feat->annotation()->get_Annotations('gene') # Gene name > : $feat->annotation()->get_Annotations('locus_tag'), > $feat->annotation()->get_Annotations('product'), # Description > ),"\n"; > } > } > > $feat is a Bio::SeqFeature::Generic object > > If I print Dumper($feat->annotation()) here is the output : > > $VAR1 = bless( { > '_typemap' => bless( { > '_type' => { > 'comment' => 'Bio::Annotation::Comment', > 'reference' => 'Bio::Annotation::Reference', > 'dblink' => 'Bio::Annotation::DBLink' > } > }, 'Bio::Annotation::TypeManager' ), > '_annotation' => {} > }, 'Bio::Annotation::Collection' ); > > Have some changes been made into the way annotation object is populated? 
> > Thanks for any clue and sorry if my question look stupid > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From tuco at pasteur.fr Wed Dec 16 10:37:45 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 16 Dec 2009 16:37:45 +0100 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> References: <4B28EB44.3080006@pasteur.fr> <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> Message-ID: <4B28FEC9.1080509@pasteur.fr> On 12/16/2009 04:09 PM, Chris Fields wrote: > Emmanuel, > > The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation. The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default. You can get at the data this way (from the Feature/Annotation HOWTO): > > for my $feat_object ($seq_object->get_SeqFeatures) { > print "primary tag: ", $feat_object->primary_tag, "\n"; > for my $tag ($feat_object->get_all_tags) { > print " tag: ", $tag, "\n"; > for my $value ($feat_object->get_tag_values($tag)) { > print " value: ", $value, "\n"; > } > } > } > > You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional. > > chris > > Hi Chris Thanks for the infos. I indeed revert back to using $feat->get_tag_values() and it works as previously. For my small problem I can keep this solution which far adapted for my problem. 
Regards Emmanuel -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From sung at bio.cc Wed Dec 16 12:55:16 2009 From: sung at bio.cc (Sungsam Gong) Date: Wed, 16 Dec 2009 17:55:16 +0000 Subject: [Bioperl-l] pdb.pm and annotations Message-ID: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com> Hi, Wanted to get pubmed identifier from a PDB file using Bio::Structure, so hacked the code. Knew that Bio::Structure::IO::pdb.pm get relevant info from either 'JRNL' or 'REMARK 1'. However could not see any actual code parsing 'PMID'. From pdb.pm, what I see: sub _read_PDB_jrnl { ... $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH"); $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL"); $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT"); $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF"); $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL"); $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN"); ... } sub _read_PDB_remark_1 { ... $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH"); $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL"); $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT"); $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF"); $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL"); $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN"); ... } From my script, I did: ($struc->annotation->get_Annotations('reference'))[0]->authors ($struc->annotation->get_Annotations('reference'))[0]->title or my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree for my $key (keys %{$hash_ref}) { print $key,": ",$hash_ref->{$key},"\n"; } Any plan to include a code chopping 'PMID' out? Or did I miss something?
Cheers, Sung From nml5566 at gmail.com Wed Dec 16 14:42:57 2009 From: nml5566 at gmail.com (Nathan Liles) Date: Wed, 16 Dec 2009 13:42:57 -0600 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? In-Reply-To: References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Message-ID: <81a20b1e0912161142m77051529se59b4621a0add13b@mail.gmail.com> Actually, yes I did find that and it works very well. Now I'm wondering, is it possible to search for similar terms using a string instead of a Bio::Ontology term object? For example, I'd like to search for the synonym: "transcription start site" and have it return all similar terms. But, it throws an error if I pass in a simple query like that. -Nathan On Tue, Dec 15, 2009 at 4:05 PM, Hilmar Lapp wrote: > That shouldn't happen I suppose, but you're not supposed really to use the > engine directly. Rather it will be used as a backing parser by the > Bio::OntologyIO parser you choose. Have you tried that route and found it > not to work? > > -hilmar > > > On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote: > > Is the Bio::Ontology::OBOEngine module working or being currently >> maintained? I tried following the documentation in the module: >> >> * use Bio::Ontology::OBOEngine; >> >> my $parser = Bio::Ontology::OBOEngine->new >> ( -file => "gene_ontology.obo" ); >> >> my $engine = $parser->parse(); >> >> *But, it throws an error when I run the file saying 'Can't locate object >> method "parse" '. Does anyone have any experience getting this module >> working; or, is there any alternative bioperl module to extract terms and >> relationships out of sequence ontology files?
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > From cjfields1 at gmail.com Wed Dec 16 19:53:50 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Wed, 16 Dec 2009 16:53:50 -0800 (PST) Subject: [Bioperl-l] Test post from Google Groups Message-ID: Howdy from Google Groups From cjfields1 at gmail.com Wed Dec 16 20:01:38 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Wed, 16 Dec 2009 17:01:38 -0800 (PST) Subject: [Bioperl-l] bioperl-l Google Groups mirror Message-ID: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> I would like to announce (with the tremendous help of Hilmar Lapp) the creation of a mirror for the BioPerl mail list, if the last post didn't already give it away. http://groups.google.com/group/bioperl-l One can join the group and submit posts via the Google Groups web interface or via email. Have fun! chris From ocarnorsk138 at gmail.com Wed Dec 16 20:12:21 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Wed, 16 Dec 2009 17:12:21 -0800 (PST) Subject: [Bioperl-l] Test post from Google Groups In-Reply-To: References: Message-ID: <03416808-ec4b-44b3-8269-6743a26b5368@k4g2000yqb.googlegroups.com> testing back from google group! On Dec 16, 9:53 pm, Chris Fields wrote: > Howdy from Google Groups > _______________________________________________ > Bioperl-l mailing list > Bioper...
at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Thu Dec 17 05:50:23 2009 From: p.j.a.cock at googlemail.com (Peter) Date: Thu, 17 Dec 2009 02:50:23 -0800 (PST) Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> Message-ID: On Dec 17, 1:01 am, Chris Fields wrote: > I would like to announce (with the tremendous help of Hilmar Lapp) the > creation of a mirror for the BioPerl mail list, if the last post > didn't already give it away. > > http://groups.google.com/group/bioperl-l > > One can join the group and submit posts via the Google Groups web > interface or via email. Have fun! > > chris Sounds particularly good in the long run (once there is enough of an archive on Google Groups to make searching there useful). Does this mean a Google Groups user doesn't have to be subscribed to the mailing list to post (since the mailing list normally only allows subscribers to post)? Peter From David.Messina at sbc.su.se Thu Dec 17 07:25:49 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 17 Dec 2009 13:25:49 +0100 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> Message-ID: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> Very nice, Chris and Hilmar! That'll be great. > Does this mean a Google Groups user doesn't have to be subscribed > to the mailing list to post (since the mailing list normally only > allows subscribers to post)? I think that's right. From the Google groups page: > You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.
Dave From cjfields at illinois.edu Thu Dec 17 08:21:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 07:21:46 -0600 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> Message-ID: <209F1321-37DD-4B6C-A153-8A5AA0EF3E0A@illinois.edu> On Dec 17, 2009, at 6:25 AM, Dave Messina wrote: > Very nice, Chris and Hilmar! That'll be great. > > > >> Does this mean a Google Groups user doesn't have to be subscribed >> to the mailing list to post (since the mailing list normally only >> allows subscribers to post)? > > > I think that's right. From the Google groups page: > >> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively. > > > > > Dave It is moderated by user to deal with spam. Hilmar's already a manager/co-owner, and either of us can add more as needed. chris From hlapp at drycafe.net Thu Dec 17 09:52:33 2009 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 17 Dec 2009 09:52:33 -0500 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> Message-ID: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> On Dec 17, 2009, at 5:50 AM, Peter wrote: > Does this mean a Google Groups user doesn't have to be subscribed > to the mailing list to post Yes. They can post through the Google Groups web interface. The email address for mirrored groups is the one of the list being mirrored though, bioperl-l at bioperl.org in this case, and so in order to post by email you still have to be subscribed at the bioperl-l list. At least that's what the docs at Google say. 
I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Thu Dec 17 12:05:24 2009 From: jay at jays.net (Jay Hannah) Date: Thu, 17 Dec 2009 11:05:24 -0600 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> Message-ID: <9BDF08A3-67E0-4F5E-8429-11AE586F6504@jays.net> On Dec 17, 2009, at 8:52 AM, Hilmar Lapp wrote: > I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs. In my experience (and ignoring a brief glitch this summer) moderation of new members works great. Almost zero spam gets through. Not as convenient for the admin as MailMan self-service email verification, but perhaps easier for some users and not too much admin work if you don't have too many new legitimate members every month. Here is the configuration set I recommend: http://clab.ist.unomaha.edu/~jhannah/tmp/google_groups.png Your membership roles will end up with quite a few junk accounts, but those bots can't post, so it's not that big a deal. I purge mine manually once a year or so. 
HTH, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From robert.bradbury at gmail.com Thu Dec 17 14:42:54 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 17 Dec 2009 14:42:54 -0500 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: Just to close out the issue of bioperl forking (in particular accesses to external databases through get_sequence) which involves individual database sub-modules and not collecting its children. As it turns out the code does do an explicit fork, it looks like so the child process can read from the database while the parent process manipulates the data as it becomes available. Now, one could argue that a threaded model might be better since now threads are fairly standard OS tools in current environments. But I couldn't find any functions which actually wait for the forked process (presumably because they are created for "future" use). But nor is there any indication in the pages I've found in most of the documentation (which is spread across the web) or Wiki that explain that "creating child processes" is how these functions work and one *needs* to collect those children after each use or else zombie processes will accumulate, which on "reasonable" systems with per-user process limits will create problems for proper program functioning. Nor (it would appear) does the parent process setup a SIGCHLD "catcher" which could collect the processes once they exit (which I expect in the case of "get_sequence" would be after closing of the socket which actually fetched the sequence from Genbank. 
It can be resolved easily enough by adding a call after each use of these functions: $kid = waitpid(-1, WNOHANG); But typically, as a programmer, I should not be responsible for having to clean up the leftovers of library calls (unless said cleanup requirements are clearly documented). But to a "newbie" using the functions, coming from a functional background (C), not an OO background (which at least I would tend to view as a wart on the otherwise robust Perl language), there are two problems 1. The lack of documentation and examples explaining how the functions work and how they must be handled at a higher level (by executing explicit wait system calls). 2. The lack of code in the BioPerl functions to deal with the forked processes which they create. Functional programmers have a perspective -- if you create it -- you have to clean it up. It would appear that in the transition to OO programming (or perhaps simply for expediency) that detail was left out of both (either/and) the documentation and the code. From this standpoint one could view garbage collectors as being fundamentally evil -- because they gloss over the fact that programmers should know what they are doing and when they are doing it. So, everywhere in the documentation where there is a get_sequence call (or anything which accesses an external database which causes a fork to occur) there should be a modification as I have outlined above -- or else the code should be corrected so orphaned children are always collected and not allowed to accumulate. 
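[Editorial sketch, not part of the original message.] The waitpid(-1, WNOHANG) workaround described above can be illustrated with core Perl only. This is a minimal sketch under stated assumptions, not BioPerl's own code: fork_short_lived_child is a hypothetical stand-in for any library call that forks internally, reap_finished_children is a hypothetical helper name, and a POSIX-like system is assumed (behaviour under Windows fork emulation may differ).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: collect every child that has already exited,
# without blocking. Returns the number of children reaped.
sub reap_finished_children {
    my $reaped = 0;
    # waitpid(-1, WNOHANG) returns a pid > 0 for each exited child,
    # 0 if children exist but none has exited yet, and -1 if there
    # are no children at all.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Hypothetical stand-in for a library call that forks internally
# and never waits for the child itself.
sub fork_short_lived_child {
    my $pid = fork();
    die "Couldn't fork: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child: finish immediately
    return $pid;            # parent: carry on
}

my $total = 0;
for my $i (1 .. 5) {
    fork_short_lived_child();
    # Non-blocking clean-up after each call, as suggested in the message.
    $total += reap_finished_children();
}

# Blocking wait for any child that had not exited yet during the loop.
while (waitpid(-1, 0) > 0) {
    $total++;
}

print "collected $total children\n";
```

The non-blocking loop keeps zombies from piling up between calls, and the final blocking loop picks up any child that was still running at that point. Whether this belongs in user code or inside the library is exactly the question raised in this thread.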
From robert.bradbury at gmail.com Thu Dec 17 15:23:38 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 17 Dec 2009 15:23:38 -0500 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: Oh, yes, in case it was not clear, the fork calls which fails is in DB/WebDBSeqI.pm: line 722 defined(my $pid = fork) or $self->throw("'Couldn't fork: $!"); And of course that is because Linux has reached the process limits for the user (due to accumulated background processes which are uncollected). And they could be resolved by simply executing a simple waitpid call for prior orphaned children before forking [1] But such a succinct solution would violate "functional" programming rules -- clean up what you create -- instead they would tend to fall into the OO camp -- "Oh don't worry the garbage collector will take care of it". Green programming is a little less cavalier. Robert 1. IMO, a very very real problem with programming today is that there is no connection between programmers and the cost of their programs. How many programmers know the instruction cycle time of their computers, what does an instruction cost in terms of W consumed, W wasted (heat generation), fruitless scanning over uncollected zombie processes, etc. It may be that only that programmers who grew up in the era when CPU cycles were expensive (300 ns/cycle) who know what each instruction required in terms of cycles consider these perspectives. Now things (cpu use, processor use, etc) tend to be swept under the rug and it appears that that is the case with the standard implementation of bioper. The documentation does not clearly state that additional sub-processes may be created and need to be collected. You are providing a utility that only works "this much". And guess what -- I happen to have run into the "this". 
From cjfields at illinois.edu Thu Dec 17 15:25:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 14:25:56 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: Robert, I have previously outlined specifically why you are seeing the fork issue, and a possible solution. IIRC it primarily has to do with you trying to do something more advanced using the (very basic) Bio::Perl procedural interface, something along the lines of pulling a sequence and using RemoteBlast. Retrieving a sequence from a remote database is a forked process on most OS's (I think Win is the sole exception) and occurs internally in Bio::Perl via Bio::DB::GenBank. Setting up your own pipeline, using Bio::DB::GenBank (set to use temp files), followed by Bio::Tools::Run::RemoteBlast or Bio::Perl, are options in the meantime. Trying to catch signals can be notoriously flaky cross-platform and cross perl versions; I recall running into problems with CygWin and OS X. We can modify Bio::Perl to use a temp file instead, which avoids the whole use of forks altogether, and is probably the best long-term solution. My last bit: I don't usually say this, primarily b/c it's misconstrued by some, but 'patches are always welcome'. What doesn't work is just telling us to arbitrarily change code w/o indicating exactly where to do so. The tone you use, which comes off a tad condescending, can be abrasive and may not garner any response (or at least will get you one you don't expect). Please keep that in mind. chris On Dec 17, 2009, at 1:42 PM, Robert Bradbury wrote: > Just to close out the issue of bioperl forking (in particular accesses to > external databases through get_sequence) which involves individual database > sub-modules and not collecting its children. 
> > As it turns out the code does do an explicit fork, it looks like so the > child process can read from the database while the parent process > manipulates the data as it becomes available. Now, one could argue that a > threaded model might be better since now threads are fairly standard OS > tools in current environments. > > But I couldn't find any functions which actually wait for the forked process > (presumably because they are created for "future" use). But nor is there > any indication in the pages I've found in most of the documentation (which > is spread across the web) or Wiki that explain that "creating child > processes" is how these functions work and one *needs* to collect those > children after each use or else zombie processes will accumulate, which on > "reasonable" systems with per-user process limits will create problems for > proper program functioning. Nor (it would appear) does the parent process > setup a SIGCHLD "catcher" which could collect the processes once they exit > (which I expect in the case of "get_sequence" would be after closing of the > socket which actually fetched the sequence from Genbank. > > It can be resolved easily enough by adding a call after each use of these > functions: > $kid = waitpid(-1, WNOHANG); > But typically, as a programmer, I should not be responsible for having to > clean up the leftovers of library calls (unless said cleanup requirements > are clearly documented). > > > But to a "newbie" using the functions, coming from a functional background > (C), not an OO background (which at least I would tend to view as a wart on > the otherwise robust Perl language), there are two problems > 1. The lack of documentation and examples explaining how the functions work > and how they must be handled at a higher level (by executing explicit wait > system calls). > 2. The lack of code in the BioPerl functions to deal with the forked > processes which they create. 
Functional programmers have a perspective -- > if you create it -- you have to clean it up. It would appear that in the > transition to OO programming (or perhaps simply for expediency) that detail > was left out of both (either/and) the documentation and the code. From this > standpoint one could view garbage collectors as being fundamentally evil -- > because they gloss over the fact that programmers should know what they are > doing and when they are doing it. > > So, everywhere in the documentation where there is a get_sequence call (or > anything which accesses an external database which causes a fork to occur) > there should be a modification as I have outlined above -- or else the code > should be corrected so orphaned children are always collected and not > allowed to accumulate. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Dec 17 15:29:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 14:29:10 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: On Dec 17, 2009, at 2:23 PM, Robert Bradbury wrote: > Oh, yes, in case it was not clear, the fork calls which fails is in > DB/WebDBSeqI.pm: line 722 > defined(my $pid = fork) > or $self->throw("'Couldn't fork: $!"); Okay, that's a bit more helpful. > And of course that is because Linux has reached the process limits for the > user (due to accumulated background processes which are uncollected). Right, but again, we need to check this in a cross-platform compatible way. 
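For reference, here is a minimal sketch of the non-blocking reaping under discussion. It assumes a Unix-like system where Perl's fork and waitpid are native; Windows emulates fork, so behaviour there may differ. The helper name reap_children is hypothetical and is not part of BioPerl or of DB/WebDBSeqI.pm. Only waitpid and the WNOHANG constant from the core POSIX module are real.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # exports WNOHANG

# Make sure exited children stay around as zombies until we collect them
# (an inherited 'IGNORE' would make the kernel reap them for us).
local $SIG{CHLD} = 'DEFAULT';

# Hypothetical helper: collect every child that has already exited,
# without blocking. Returns the number of children reaped.
sub reap_children {
    my $reaped = 0;
    # waitpid returns the child's pid (> 0) when one was reaped,
    # 0 if children exist but none has exited yet, -1 if there are none.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Demonstration: fork three short-lived children, then collect them.
my $n = 3;
for (1 .. $n) {
    defined(my $pid = fork) or die "Couldn't fork: $!";
    exit 0 if $pid == 0;    # child: exit immediately
}

my $total = 0;
for my $attempt (1 .. 100) {                # bounded polling, about 5 s at most
    $total += reap_children();
    last if $total >= $n;
    select(undef, undef, undef, 0.05);      # sleep 50 ms between polls
}
print "reaped $total of $n children\n";
```

Calling such a helper just before each new fork would keep zombies from piling up against a per-user process limit. Whether that is portable enough for the supported platforms is exactly the open question here.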
> And they could be resolved by simply executing a simple waitpid call for > prior orphaned children before forking [1]. But such a succinct solution > would violate "functional" programming rules -- clean up what you create -- > instead they would tend to fall into the OO camp -- "Oh don't worry the > garbage collector will take care of it". Green programming is a little less > cavalier. > > Robert > > 1. IMO, a very very real problem with programming today is that there is no > connection between programmers and the cost of their programs. How many > programmers know the instruction cycle time of their computers, what an > instruction costs in terms of W consumed, W wasted (heat generation), > fruitless scanning over uncollected zombie processes, etc.? It may be that > only programmers who grew up in the era when CPU cycles were expensive > (300 ns/cycle), and who know what each instruction required in terms of cycles, > consider these perspectives. Now things (cpu use, processor use, etc) tend > to be swept under the rug and it appears that that is the case with the > standard implementation of bioperl. The documentation does not clearly state > that additional sub-processes may be created and need to be collected. You > are providing a utility that only works "this much". And guess what -- I > happen to have run into the "this". Um, yeah. Okay. chris From robfsouza at gmail.com Fri Dec 18 13:07:34 2009 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Fri, 18 Dec 2009 13:07:34 -0500 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: Hi, I've been dealing with an apparent bug in the output of NCBI's BLAST programs (blastall, blastpgp) which sometimes produces output like the one below. I think I've managed to produce a workaround for the Bioperl blast.pm parser and would like to contribute it to Bioperl. The fix is based on blast.pm from the CVS tree (downloaded some months ago...) and is attached to this message.
Best, Robson PS: what happened to the bioperl-bugs mailing list? It does not seem to be working...

>gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved
           hypothetical protein [Nasonia vitripennis]
          Length = 1774

 Score = 75.9 bits (185), Expect = 1e-11,   Method: Compositional matrix adjust.
 Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%)

Query: 0   -

Sbjct: 328 P                                                            328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 0

Sbjct: 328                                                              328

Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG             654
            P PP +   + P       KTK+      K+P  K         +
Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA             376

Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713
           ++  N  +    W  +     +++  +   N    NN       D   +E    PT ++
Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432

Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773
           LD K S  + + L   + +  +I + + D    ++  + +  L  + PE D+ + ++SF
Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491

Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833
              DG   +L   +K F  +  +P  K R      +  F  ++  +EP I S+  A +++
Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548

Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893
           +  KSLQ ++ ++++  NFLN      +   G KL+ L KL +I++    N+  MN L
Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602

Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951
           ++  + ++   +LL   +  +  +  ++  + +L  E   L+  +K I+++++    E
Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661

Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995
                  +Q+ +F Q A+ EM ++ +  E+L+ + + +A+FF E
Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701

-------------- next part -------------- A non-text attachment was scrubbed... Name: blast_patched.pm Type: application/octet-stream Size: 91820 bytes Desc: not available URL: From cjfields at illinois.edu Fri Dec 18 13:33:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 18 Dec 2009 12:33:44 -0600 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: Robson, Any chance you could check this against SVN? We haven't used the CVS tree for a few years (had a number of releases along the way as well). Not sure about bioperl-bugs, we have bugzilla still running though: http://bugzilla.open-bio.org/ chris On Dec 18, 2009, at 12:07 PM, Robson Francisco de Souza wrote: > Hi, > > I've been dealing with an apparent bug in the output of NCBI's BLAST > programs (blastall, blastpgp) which sometimes produces output like the > one below. > I think I've managed to produce a work around for Bioperl blast.pm > parser and would like to contribute it to Bioperl. > The fix is based on blast.pm from the CVS tree (downloaded some months > ago...) and is attached to this message. > Best, > Robson > > PS: what happened to the bioperl-bugs mailing list? It does not seem > to be working...
> >> gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved > hypothetical protein [Nasonia vitripennis] > Length = 1774 > > Score = 75.9 bits (185), Expect = 1e-11, Method: Compositional matrix adjust. > Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%) > > Query: 0 - > > Sbjct: 328 P 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG 654 > P PP + + P KTK+ K+P K + > Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA 376 > > Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713 > ++ N + W + +++ + N NN D +E PT ++ > Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432 > > Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773 > LD K S + + L + + +I + + D ++ + + L + PE D+ + ++SF > Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491 > > Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833 > DG +L +K F + +P K R + F ++ +EP I S+ A +++ > Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548 > > Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893 > + KSLQ ++ ++++ NFLN + G KL+ L KL +I++ N+ MN L > Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602 > > Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951 > ++ + ++ +LL + + + ++ + +L E L+ +K I+++++ E > Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661 > > Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995 > +Q+ +F Q A+ EM ++ + E+L+ + + +A+FF E > Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701 > 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Fri Dec 18 18:00:47 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 18 Dec 2009 23:00:47 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza wrote: > Hi, > > I've been dealing with an apparent bug in the output of NCBI's BLAST > programs (blastall, blastpgp) which sometimes produces output like the > one below. > I think I've managed to produce a work around for Bioperl blast.pm > parser and would like to contribute it to Bioperl. > The fix is based on blast.pm from the CVS tree (downloaded some months > ago...) and is attached to this message. > Best, > Robson Do you have a complete example of this kind of funny output? This problem has also been reported with blastpgp for the Biopython parser. I'd love an example for our unit tests (probably worth doing in BioPerl too). Could you upload a test case here?: http://bugzilla.open-bio.org/show_bug.cgi?id=2927 Thanks! Peter @ Biopython From biopython at maubp.freeserve.co.uk Sat Dec 19 06:19:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 19 Dec 2009 11:19:53 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: <320fb6e00912190319s75a0eb75m94dfbd7946a310e5@mail.gmail.com> On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote: > > Hi Peter, > > I just upload my example. I also reported this bug to the NCBI > developers and I hope they can fix it, since it is easy to reproduce. 
> I just forgot to mention the blastpgp version: 2.2.18 > Best, > Robson Thank you, Peter From maj at fortinbras.us Sat Dec 19 14:52:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 19 Dec 2009 14:52:45 -0500 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment Message-ID: Hi All, Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus, is at beta in the bioperl-run trunk. It wraps all the programs of the NCBI's new blast+-2.2.22 suite ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and integrates them, allowing you to create, mask, and query databases from within a single factory object. See the HOWTO http://www.bioperl.org/wiki/HOWTO:BlastPlus for the usual usage and implementation details. Happy coding-- MAJ From David.Messina at sbc.su.se Sat Dec 19 15:34:10 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 19 Dec 2009 21:34:10 +0100 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment In-Reply-To: References: Message-ID: <8F67673F-E71E-46A1-BD7C-6465C4D13398@sbc.su.se> Sweet! Thanks, Mark. Dave From cjfields at illinois.edu Sat Dec 19 17:44:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 19 Dec 2009 16:44:46 -0600 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment In-Reply-To: References: Message-ID: <3DC558C9-DD64-45F9-8A6F-EA4238D22EA5@illinois.edu> Very nice! We'll definitely give it a try here (along with the requisite feedback, of course). chris On Dec 19, 2009, at 1:52 PM, Mark A. Jensen wrote: > Hi All, > > Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus, > is at beta in the bioperl-run trunk. It wraps all the programs of the > NCBI's new blast+-2.2.22 suite > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > and integrates them, allowing you to create, mask, and query > databases from within a single factory object. 
See the HOWTO > http://www.bioperl.org/wiki/HOWTO:BlastPlus > for the usual usage and implementation details. > > Happy coding-- > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Dec 19 23:59:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 19 Dec 2009 22:59:38 -0600 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <6723123C0ABD447190639AE1F5D1A6A7@NewLife> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> <6723123C0ABD447190639AE1F5D1A6A7@NewLife> Message-ID: <97DC7C2B-2433-4B8D-A16C-DF0507A29B22@illinois.edu> I think option 1 is cleaner as well; very easily added, so committed to main trunk as I consider this a bug, as one can potentially lose strand information when round-tripping data (original data with a -1 strand would be converted to +1). I'll work out the test fails on trunk along the way (ensure they're due to erroneous test data and not something else). chris On Dec 16, 2009, at 6:51 AM, Mark A. Jensen wrote: > I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. cheers MAJ > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, December 14, 2009 8:23 PM > Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes > > >> All, >> >> The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. 
I have seen two variations of NSE that incorporate strandedness: >> >> 1) Stockholm Rfam reverses start and end if the strand == -1 >> >> chrY/598-1 >> >> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end >> >> rice-3(+)/16598648-16600199 >> >> The former breaks fewer things within BioPerl, but the latter seems more explicit. Any preferences? Do we want a new method that creates this, and deprecate out simple non-stranded NSE? >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.osimo at gmail.com Sun Dec 20 13:19:37 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Sun, 20 Dec 2009 19:19:37 +0100 Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes Message-ID: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> Hello everyone, I have a very particular problem: I'd like to draw different SNPs in a single track, with a glyph that lets me see their importance graphically. For example, if I have 10 SNPs ranked 1 to 10 in importance, I'd like the first to be drawn small and the last one big, with the ones in between sized accordingly. I'd also be satisfied with a color gradient. What I cannot do is set an option such as -height in the Bio::SeqFeature::Generic->new call that I use for each of my objects, rather than in the add_track section. If I set it in the add_track section, all the glyphs are then of the same size (or color). If, on the other hand, I add a separate track for each object, my picture becomes too big. Please, help!
Thanks Emanuele From ajmackey at gmail.com Sun Dec 20 13:41:14 2009 From: ajmackey at gmail.com (Aaron Mackey) Date: Sun, 20 Dec 2009 13:41:14 -0500 Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes In-Reply-To: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> References: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com> Message-ID: <24c96eca0912201041i37c32845k9e261414588b9bf4@mail.gmail.com> You can set the height as a callback sub, rather than a constant -- the callback will get passed the feature about to be drawn, from which you can calculate the "importance", and return the desired height, dynamically. -Aaron On Sun, Dec 20, 2009 at 1:19 PM, Emanuele Osimo wrote: > Hello everyone, > I have a very particular problem: I'd like to draw in a single track > different SNPs with a glyph that allows me to see graphically their > importance. > For example, if I have 10 SNPs 1 to 10 in importance, I'd like to have the > first depicted small, and the last one big, with the ones in between with > according sizes. > I'd be satisfied also with a color gradient. > What I cannot do is to set the option -height , for example, instead than > in > the add_track section, in the Bio::SeqFeature::Generic->new that I use for > each of my objects. > If I set it in the add_track section, all the glyphs are then of the same > size (or color). > If, otherwise, I add a different track for each object, my picture becomes > too big. > > Please, help! 
> Thanks > Emanuele > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From robfsouza at gmail.com Sat Dec 19 06:06:16 2009 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Sat, 19 Dec 2009 06:06:16 -0500 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: Hi Peter, I just upload my example. I also reported this bug to the NCBI developers and I hope they can fix it, since it is easy to reproduce. I just forgot to mention the blastpgp version: 2.2.18 Best, Robson On Fri, Dec 18, 2009 at 6:00 PM, Peter wrote: > On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza > wrote: >> Hi, >> >> I've been dealing with an apparent bug in the output of NCBI's BLAST >> programs (blastall, blastpgp) which sometimes produces output like the >> one below. >> I think I've managed to produce a work around for Bioperl blast.pm >> parser and would like to contribute it to Bioperl. >> The fix is based on blast.pm from the CVS tree (downloaded some months >> ago...) and is attached to this message. >> Best, >> Robson > > Do you have a complete example of this kind of funny output? > This problem has also been reported with blastpgp for the > Biopython parser. I'd love an example for our unit tests > (probably worth doing in BioPerl too). Could you upload a > test case here?: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2927 > > Thanks! 
> > Peter @ Biopython > From biopython at maubp.freeserve.co.uk Mon Dec 21 10:27:47 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 21 Dec 2009 15:27:47 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: <320fb6e00912210727m522d2039if78891ab32fe0983@mail.gmail.com> On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote: > > Hi Peter, > > I just uploaded my example. I also reported this bug to the NCBI > developers and I hope they can fix it, since it is easy to reproduce. > I just forgot to mention the blastpgp version: 2.2.18 > Best, > Robson Hi again Robson, Having a reproducible example to investigate this issue is incredibly helpful - thank you! I've been looking at the output, and while I can make sense of it "by hand", it would be very tricky to try and parse as a special case. It really does look like a bug in BLAST to me. The alignment includes an initial pair, a leading gap in the query (with a coordinate of zero), plus a residue from the match sequence (with a sensible coordinate). The alignment statistics include this (extra) pair in the alignment length. You said you were using blastpgp version 2.2.18, so I tried this with the latest (final?) version of the "legacy" BLAST suite, blastpgp 2.2.22, which I already had installed. It looks like my copy of NR is more recent (bigger), but the same odd output was produced: blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000 I also tried what I think would be the equivalent command line on the new BLAST+ suite, using psiblast 2.2.22+ like this: psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast -num_threads 8 -parse_deflines -num_alignments 10000 This was much faster, and seems to output sensible alignments. I might therefore expect the NCBI to say "yes, this is a bug in the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance release of the "legacy" BLAST suite and fix this in blastpgp. Have you had any reply from the NCBI? Admittedly it is almost Christmas/New Year so we may not expect an answer until Jan. Peter From maj at fortinbras.us Mon Dec 21 13:52:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 13:52:01 -0500 Subject: [Bioperl-l] test fail Message-ID: <5614E9FF133A47A694EF892D38A1717A@NewLife> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. # got: '1..4' # expected: 'complement(5..8)' t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. # got: 'complement(5..8)' # expected: '1..4' # Looks like you failed 2 tests of 51. MAJ From cjfields at illinois.edu Mon Dec 21 14:20:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 13:20:32 -0600 Subject: [Bioperl-l] test fail In-Reply-To: <5614E9FF133A47A694EF892D38A1717A@NewLife> References: <5614E9FF133A47A694EF892D38A1717A@NewLife> Message-ID: Saw that from the other day (LocatableSeq commit). I'll check it out. chris On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote: > fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) > > t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. > # got: '1..4' > # expected: 'complement(5..8)' > > t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. > # got: 'complement(5..8)' > # expected: '1..4' > # Looks like you failed 2 tests of 51. 
> > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Mon Dec 21 15:02:20 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 21 Dec 2009 15:02:20 -0500 Subject: [Bioperl-l] Bio::Graphics documentation Message-ID: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> Hi All, Today it was pointed out to me that the Bio::Graphics documentation links on the BioPerl wiki are broken, no doubt because Bio::Graphics is no longer part of bioperl-core (is that how it should be referred to?). Anyway, the question is: what is the right way to rectify this problem? Since other things may get broken out in the future, I suppose we should get some sort of standard established. Can a release of Bio::Graphics be placed somewhere on the BioPerl wiki server to be processed? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Mon Dec 21 15:22:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 14:22:39 -0600 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> Message-ID: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> We can come up with some standard wiki template for those modules no longer in svn, maybe with just CPAN links. Shouldn't be too hard to do. chris On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > Hi All, > > Today it was pointed out to me that the Bio::Graphics documentation > links on the BioPerl wiki are broken, no doubt because Bio::Graphics > is no longer part of bioperl-core (is that how it should be referred > to?). 
Anyway, the question is: what is the right way to rectify this > problem? Since other things may get broken out in the future, I > suppose we should get some sort of standard established. Can a > release of Bio::Graphics be placed somewhere on the BioPerl wiki > server to be processed? > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Dec 21 16:12:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 21 Dec 2009 15:12:45 -0600 Subject: [Bioperl-l] test fail In-Reply-To: References: <5614E9FF133A47A694EF892D38A1717A@NewLife> Message-ID: T'was a bad test call. I basically changed the test to pull each feature directly by the primary tag, check it against the original sf prior to revcom, then check that the location was revcomp'ed correctly. chris On Dec 21, 2009, at 1:20 PM, Chris Fields wrote: > Saw that from the other day (LocatableSeq commit). I'll check it out. > > chris > > On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote: > >> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64) >> >> t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275. >> # got: '1..4' >> # expected: 'complement(5..8)' >> >> t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276. >> # got: 'complement(5..8)' >> # expected: '1..4' >> # Looks like you failed 2 tests of 51. 
>> >> MAJ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Dec 21 16:27:25 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 16:27:25 -0500 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> Message-ID: <1F54D94CE87E4238BC2C6128002FBC6A@NewLife> I've modified Template:Doclink ; if you now do {{Doclink|Bio::Graphics|cpan}} you'll get a page with only the cpan link. {{Doclink|Bio::SeqIO}} etc. works as usual. MAJ ----- Original Message ----- From: "Chris Fields" To: "Scott Cain" Cc: "BioPerl List" Sent: Monday, December 21, 2009 3:22 PM Subject: Re: [Bioperl-l] Bio::Graphics documentation > We can come up with some standard wiki template for those modules no longer in > svn, maybe with just CPAN links. Shouldn't be too hard to do. > > chris > > On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > >> Hi All, >> >> Today it was pointed out to me that the Bio::Graphics documentation >> links on the BioPerl wiki are broken, no doubt because Bio::Graphics >> is no longer part of bioperl-core (is that how it should be referred >> to?). Anyway, the question is: what is the right way to rectify this >> problem? Since other things may get broken out in the future, I >> suppose we should get some sort of standard established. Can a >> release of Bio::Graphics be placed somewhere on the BioPerl wiki >> server to be processed? >> >> Thanks, >> Scott >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot >> net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Mon Dec 21 16:34:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 16:34:40 -0500 Subject: [Bioperl-l] Bio::Graphics documentation In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu> Message-ID: <5081DC24D9AE46FF95075559898B2574@NewLife> Also, applied the new Doclink to Bio::Graphics on wiki. ----- Original Message ----- From: "Chris Fields" To: "Scott Cain" Cc: "BioPerl List" Sent: Monday, December 21, 2009 3:22 PM Subject: Re: [Bioperl-l] Bio::Graphics documentation > We can come up with some standard wiki template for those modules no longer in > svn, maybe with just CPAN links. Shouldn't be too hard to do. > > chris > > On Dec 21, 2009, at 2:02 PM, Scott Cain wrote: > >> Hi All, >> >> Today it was pointed out to me that the Bio::Graphics documentation >> links on the BioPerl wiki are broken, no doubt because Bio::Graphics >> is no longer part of bioperl-core (is that how it should be referred >> to?). Anyway, the question is: what is the right way to rectify this >> problem? Since other things may get broken out in the future, I >> suppose we should get some sort of standard established. Can a >> release of Bio::Graphics be placed somewhere on the BioPerl wiki >> server to be processed? >> >> Thanks, >> Scott >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. 
scott at scottcain dot >> net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Mon Dec 21 21:51:32 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 21 Dec 2009 21:51:32 -0500 Subject: [Bioperl-l] pdb.pm and annotations In-Reply-To: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com> References: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com> Message-ID: <6292EDA0F05B48578AF7B7E5864C8707@NewLife> Hi Sung-- We didn't plan it, but we added it anyway: see revision 16559 of bioperl-live/trunk. You can then do $pmid = ($struct->annotation->get_Annotations('reference'))[0]->pubmed; and even $doi = ($struct->annotation->get_Annotations('reference'))[0]->doi; Thanks for the heads-up! cheers, MAJ ----- Original Message ----- From: "Sungsam Gong" To: Sent: Wednesday, December 16, 2009 12:55 PM Subject: [Bioperl-l] pdb.pm and annotations > Hi, > > Wanted to get pubmed identifier from a PDB file using Bio::Structure, > so hacked the code. > Knew that Bio::Structure::IO::pdb.pm get relevant info from either > 'JRNL' or 'REMARK 1'. > However could not see any actual code parsing 'PMID'. > >>From pdb.pm, what I see: > > sub _read_PDB_jrnl { > ... 
> $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH"); > $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL"); > $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT"); > $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF"); > $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL"); > $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN"); > ... > } > > sub _read_PDB_remark_1 { > ... > $auth = $self->_concatenate_lines($auth,$rol) if > ($subr eq "AUTH"); > $titl = $self->_concatenate_lines($titl,$rol) if > ($subr eq "TITL"); > $edit = $self->_concatenate_lines($edit,$rol) if > ($subr eq "EDIT"); > $ref = $self->_concatenate_lines($ref ,$rol) if > ($subr eq "REF"); > $publ = $self->_concatenate_lines($publ,$rol) if > ($subr eq "PUBL"); > $refn = $self->_concatenate_lines($refn,$rol) if > ($subr eq "REFN"); > ... > } > >>From my script, I did: > > ($struc->annotation->get_Annotations('reference'))[0]->authors > ($struc->annotation->get_Annotations('reference'))[0]->title > > or > > my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree > for my $key (keys %{$hash_ref}) { > print $key,": ",$hash_ref->{$key},"\n"; > } > > Any plan to include a code chopping 'PMID' out? > Or did I miss something? > > Cheers, > Sung > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From dan.kortschak at adelaide.edu.au Mon Dec 21 22:24:04 2009 From: dan.kortschak at adelaide.edu.au (Dan Kortschak) Date: Tue, 22 Dec 2009 13:54:04 +1030 Subject: [Bioperl-l] call for help and comments on module Message-ID: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au> Hi, I've been working on a Bio::Tools::Run module to handle the bowtie rapid alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in bioperl-run tree). 
I have 90% of what I want included in the module and would like some
advice from more experienced bioperlers. Feedback on approach is also
welcomed (this is my first significant wrapper, and after a long gap
from writing modules, so I am rusty). The module has ended up being
significantly more complicated than I had hoped.

There are a few issues I'm having, so I apologise for the list:

   1. Informal tests run correctly (outside the t/ tree and Test
      harness), but formal Test harness tests fail for reasons I
      cannot understand. (The module is still lacking a lot of tests,
      but since things were failing in the harness I have placed them
      as a lower priority and have been working to my micro-script
      tests - yes, bad form.)
   2. I am having a big problem with IPC::Run for one of the
      executables (the module can call 5 different executables for 7
      commands), bowtie-maptool (module command 'map'). All the other
      commands tested (this excludes bowtie-maqconvert [convert
      command]) work fine, but maptool fails with an illegal seek -
      presumably due to the redirection handling? I have no idea how
      to resolve this, so help would be greatly appreciated (a small
      script that demonstrates the use that results in the failure is
      below).

There will be provision for returning a Bio::Assembly::IO object through
samtools in the finished module, but currently the
Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.

Thanks for any help.
Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Bowtie;

# These files are in the bioperl-run t/data/ tree
my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';

my $bowtiefac = Bio::Tools::Run::Bowtie->new(
    -command             => 'single',
    -max_seed_mismatches => 2,
    -seed_length         => 28,
    -max_qual_mismatch   => 70,
    -sam_format          => 0
);

my $align = $bowtiefac->run($rdq,$refseq); # this runs fine

my $bowtiemap = Bio::Tools::Run::Bowtie->new(
    -command => 'map'
);

my $map = $bowtiemap->run($align); # throws Illegal seek

print "$map\n";

open (IN,$map);
my $lines =(my @lines)= <IN>;
print @lines;
print "\n\n$lines\n";
close IN;


From maj at fortinbras.us Tue Dec 22 00:19:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 22 Dec 2009 00:19:35 -0500
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID:

Hey Dan,
It looks like if the outfile isn't specified on the commandline for
maptool, then the align is written to stdout. So, you could
try this workaround in Bowtie/Config.pm:

our %command_files = (
    'single'   => [qw( ind seq #out )],
    'paired'   => [qw( ind seq seq2 #out )],
    'crossbow' => [qw( ind seq #out )],
    'build'    => [qw( ref out )],
    'inspect'  => [qw( ind >#out )],
    'convert'  => [qw( bwt out bfa )],
-   'map'      => [qw( bwt #out )]
+   'map'      => [qw( bwt >#out )]
    );

which should be transparent to the user. If this works, then
there is probably something funky going on with IPC::Run
+ maptool; if it doesn't, then the funkiness is prob. in my code.

I notice, however, that both bowtie-maptool and bowtie-maqconvert
have been removed from the 0.12.0-beta release
(http://bowtie-bio.sourceforge.net/index.shtml)...
cheers MAJ ----- Original Message ----- From: "Dan Kortschak" To: Sent: Monday, December 21, 2009 10:24 PM Subject: [Bioperl-l] call for help and comments on module > Hi, > > I've been working on a Bio::Tools::Run module to handle the bowtie rapid > alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in > bioperl-run tree). > > I have 90% of what I want included in the module and would like some > advice from more experienced bioperlers. Feedback on approach is also > welcomed (this is my first significant wrapper, and after a long gap > from writing module, so I am rusty). The module has ended up being > significantly more complicated than I had hoped. > > There are a few issues I'm having, so I apologise for the list: > > 1. Informal tests run correctly (outside the t/ tree and Test > harness), but formal Test harness tests fail for reasons I > cannot understand. (The module is still lacking a lot of tests, > but since things were failing in the harness I have placed them > as a lower priority and have been working to my micro-script > tests - yes, bad form. > 2. I am having a big problem with IPC::Run for one of the > executables (the module can call 5 different excutables for 7 > commands), bowtie-maptool (module command 'map'). All the other > commands tested (this excludes bowtie-maqconvert [convert > command]) work fine, but maptool fails with an illegal seek - > presumably due to the redirection handling? I have no idea how > to resolve this, so help would be greatly appreciated (a small > script that demonstrates the use that results in the failure is > below). > > There will be provision for returning a Bio::Assembly::IO object through > samtools in the finished module, but currently the > Bio::Assembly::IO::sam builder doesn't like what bowtie can provide. > > Thanks for any help. 
> Dan
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::Tools::Run::Bowtie;
>
> # These files are in the bioperl-run t/data/ tree
> my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
> my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';
>
> my $bowtiefac = Bio::Tools::Run::Bowtie->new(
>     -command             => 'single',
>     -max_seed_mismatches => 2,
>     -seed_length         => 28,
>     -max_qual_mismatch   => 70,
>     -sam_format          => 0
> );
>
> my $align = $bowtiefac->run($rdq,$refseq); # this runs fine
>
> my $bowtiemap = Bio::Tools::Run::Bowtie->new(
>     -command => 'map'
> );
>
> my $map = $bowtiemap->run($align); # throws Illegal seek
>
> print "$map\n";
>
> open (IN,$map);
> my $lines =(my @lines)= <IN>;
> print @lines;
> print "\n\n$lines\n";
> close IN;
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From dan.kortschak at adelaide.edu.au Tue Dec 22 00:51:30 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 16:21:30 +1030
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To:
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1261461090.4411.13.camel@epistle>

Hi Mark,

maptool either outputs to stdout or a specified file - I chose to use a
specified file and run it that way, but I've tried the redirect as you
suggest, with the same failure result. I think it's a strangeness of
maptool (which may well be a reason for it being dropped - also note the
maptool output doesn't seem reasonable for the test data provided even
when run from the command line). It's probably a result of difficult
interaction between IPC::Run and maptool.
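
One way to separate the two suspects is to exercise the same kind of stdout-to-file redirection through IPC::Run with a trivial child process in place of bowtie-maptool. The sketch below is not taken from the Bowtie wrapper: it assumes IPC::Run is installed, uses the running perl interpreter as a stand-in command, and the tab-separated line it prints is made up for illustration. If this runs cleanly while the same call with maptool fails, the problem is more likely in how maptool treats its output stream than in the redirection itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use IPC::Run qw(run);
use File::Temp qw(tempfile);

# Stand-in for bowtie-maptool (an assumption for this sketch): any command
# that writes to stdout will do. $^X is the perl running this script.
my @cmd = ($^X, '-e', 'print "read1\t+\tchr1\t100\n"');

# Temporary output file, removed automatically at exit.
my ($fh, $outfile) = tempfile(UNLINK => 1);
close $fh;

# '>' followed by a filename sends the child's stdout to that file;
# '2>' followed by a scalar ref collects the child's stderr in memory.
my $err = '';
run(\@cmd, '>', $outfile, '2>', \$err)
    or die "child failed (status $?): $err";

# Read the captured output back, as the wrapper's caller would.
open my $out, '<', $outfile or die "cannot open $outfile: $!";
my @lines = <$out>;
close $out;

print "captured ", scalar(@lines), " line(s) in $outfile\n";
```

Swapping @cmd for the real maptool command line (whatever arguments the wrapper would pass) gives a test that is independent of the module's own filespec handling.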
Any funkiness in your code is not likely to be a cause as I've deeply analysed what is being passed to IPC::Run, and I've quite extensively modified the IPC run handling method from your code to take into account the differences between a single executable with many commands as the base code managed from a cluster of executables each taking a small subset of different filespecs as bowtie needs. My funkiness will undoubtedly swamp yours. Resolution: Will drop bowtie-maptool from module. (Should test maqconvert - if it fails, this will be dropped also unless someone asks otherwise). When the module copes with 0.11.* properly I'll start thinking about 0.12.* which has colourspace handling to deal with. cheers Dan On Tue, 2009-12-22 at 00:19 -0500, Mark A. Jensen wrote: > Hey Dan, > It looks like if the outfile isn't specified on the commandline for > maptool, then the align is written to stdout. So, you could > try this workaround in in Bowtie/Config.pm: > > our %command_files = ( > 'single' => [qw( ind seq #out )], > 'paired' => [qw( ind seq seq2 #out )], > 'crossbow' => [qw( ind seq #out )], > 'build' => [qw( ref out )], > 'inspect' => [qw( ind >#out )], > 'convert' => [qw( bwt out bfa )], > - 'map' => [qw( bwt #out )] > + 'map' => [qw( bwt >#out )] > ); > > which should be transparent to the user. If this works, then > there is probably something funky going on with IPC::Run > + maptool; if it doesn't, then the funkiness is prob. in my code. > > I notice, however, that both bowtie-maptool and bowtie-maqconvert > have been removed from the 0.12.0-beta release > (http://bowtie-bio.sourceforge.net/index.shtml)... 
> > cheers MAJ From lovebaby39 at gmail.com Wed Dec 23 05:48:55 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Wed, 23 Dec 2009 18:48:55 +0800 Subject: [Bioperl-l] About bioperl issue: get string In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> Message-ID: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC> Dear all I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how to get "P.pastoris DNA for pPIC9K expression vector". while (my $result_u = $blast_report_u-> next_result ) { while (my $hit_u = $result_u->next_hit()){ while (my $hsp_u = $hit_u->next_hsp()){ $hit_u->name; $hsp_u->evalue; $hsp_u->score; } } } I will appreciate if you could tell me how to do it. P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download link?) The flow is BLAST result: ------------------------------------------------------------------------------------------------------------------------------------- BLASTN 2.2.16 [Mar-25-2007] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= (458 letters) Database: UniVec (build 4.0) 2416 sequences; 597,480 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve... 
26 3.1 gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo 26 3.1 gnl|uv|U13843.1:1887-9923 pBPV cloning vector 26 3.1 >gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector Length = 2781 Score = 26.3 bits (13), Expect = 3.1 Identities = 13/13 (100%) Strand = Plus / Plus Query: 352 tactaccgccatt 364 ||||||||||||| Sbjct: 2209 tactaccgccatt 2221 ------------------------------------------------------------------------------------------------------------------------------------- Reginald Hsueh From hrh at fmi.ch Wed Dec 23 10:14:06 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Wed, 23 Dec 2009 16:14:06 +0100 Subject: [Bioperl-l] About bioperl issue: get string In-Reply-To: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC> Message-ID: Hi Assuming you are using "SearchIO", try: $hit_u->description for more details see: http://www.bioperl.org/wiki/HOWTO:SearchIO Regards, Hans On 12/23/09 11:48 AM, "Hsueh" wrote: > Dear all > > I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how > to get "P.pastoris DNA for pPIC9K expression vector". > > while (my $result_u = $blast_report_u-> next_result ) { > while (my $hit_u = $result_u->next_hit()){ > while (my $hsp_u = $hit_u->next_hsp()){ > $hit_u->name; > $hsp_u->evalue; > $hsp_u->score; > } > } > } > > I will appreciate if you could tell me how to do it. > > P.S. How can I download the BioPerl's Manual? (BioPerl's Manual download > link?) > > > > The flow is BLAST result: > ------------------------------------------------------------------------------ > ------------------------------------------------------- > BLASTN 2.2.16 [Mar-25-2007] > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. 
> Query= > (458 letters) > > Database: UniVec (build 4.0) > 2416 sequences; 597,480 total letters > Searching..................................................done > > Score E > Sequences producing significant alignments: > (bits) Value > > gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve... > 26 3.1 > gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo > 26 3.1 > gnl|uv|U13843.1:1887-9923 pBPV cloning vector > 26 3.1 > >> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector > Length = 2781 > > Score = 26.3 bits (13), Expect = 3.1 > Identities = 13/13 (100%) > Strand = Plus / Plus > > Query: 352 tactaccgccatt 364 > ||||||||||||| > Sbjct: 2209 tactaccgccatt 2221 > ------------------------------------------------------------------------------ > ------------------------------------------------------- > > Reginald Hsueh > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pkuonline at gmail.com Wed Dec 23 13:36:49 2009 From: pkuonline at gmail.com (pkuonline) Date: Wed, 23 Dec 2009 12:36:49 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 Message-ID: <200912231236490784820@gmail.com> Hi Everyone, I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. I attached my CODEML outputs here to see whether you guys have some idea. Many thanks ahead! 
Best regards, ------------------------------------------------------------- Yong Zhang Ph.D, Research Scholar Manyuan Long's Lab University of Chicago -------------- next part -------------- A non-text attachment was scrubbed... Name: rst4.1 Type: application/octet-stream Size: 60616 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mlc4.1 Type: application/octet-stream Size: 11635 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mlc4.3b Type: application/octet-stream Size: 11330 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rst4.3b Type: application/octet-stream Size: 60616 bytes Desc: not available URL: From cjfields at illinois.edu Wed Dec 23 16:19:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 23 Dec 2009 15:19:48 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231236490784820@gmail.com> References: <200912231236490784820@gmail.com> Message-ID: Well, not completely unexpected, but very frustrating nonetheless. Changes to PAML output have broken in just about every PAML parser revision. Not sure when this will be addressed unfortunately, my hope is sooner than later. Can you file a bioperl bug report for this? It's the best place to keep track. http://bugzilla.open-bio.org/ chris On Dec 23, 2009, at 12:36 PM, pkuonline wrote: > Hi Everyone, > > I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. 
> > I attached my CODEML outputs here to see whether you guys have some idea. > > Many thanks ahead! > > Best regards, > ------------------------------------------------------------- > Yong Zhang > Ph.D, Research Scholar > Manyuan Long's Lab > University of Chicago_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pkuonline at gmail.com Wed Dec 23 17:45:54 2009 From: pkuonline at gmail.com (pkuonline) Date: Wed, 23 Dec 2009 16:45:54 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 References: <200912231236490784820@gmail.com>, Message-ID: <200912231645536094087@gmail.com> Hi Chris, Thanks for your reply and I just submitted this bug to bugzilla. Have a nice holiday! ------------------------------------------------------------- Yong Zhang Ph.D, Research Scholar Manyuan Long's Lab University of Chicago >------------------------------------------------------------- >From: Chris Fields >Time: 2009-12-23 15:19:50 >To: pkuonline bioperl-l >Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 >Well, not completely unexpected, but very frustrating nonetheless. Changes to PAML output have broken in just about every PAML parser revision. Not sure when this will be addressed unfortunately, my hope is sooner than later. > >Can you file a bioperl bug report for this? It's the best place to keep track. > >http://bugzilla.open-bio.org/ > >chris > >On Dec 23, 2009, at 12:36 PM, pkuonline wrote: > >> Hi Everyone, >> >> I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. 
More strangely, I tested it on the old PAML 4.1 result and also failed. >> >> I attached my CODEML outputs here to see whether you guys have some idea. >> >> Many thanks ahead! >> >> Best regards, >> ------------------------------------------------------------- >> Yong Zhang >> Ph.D, Research Scholar >> Manyuan Long's Lab >> University of Chicago_______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Wed Dec 23 18:23:44 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 24 Dec 2009 00:23:44 +0100 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231645536094087@gmail.com> References: <200912231236490784820@gmail.com>, <200912231645536094087@gmail.com> Message-ID: <08E748F4-1398-4543-AB77-0640441BC323@sbc.su.se> Hi Yong, Could you attach your codeml output to the bug report, too? I'll take a look at this as soon as I can. Dave From maj at fortinbras.us Thu Dec 24 00:47:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 24 Dec 2009 00:47:10 -0500 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231645536094087@gmail.com> References: <200912231236490784820@gmail.com>, <200912231645536094087@gmail.com> Message-ID: <2DF45CDC2BE44A85ADCD865A98CD13D6@NewLife> Yong-- say 'ni hao' to Manyuan for me --- cheers MAJ ----- Original Message ----- From: "pkuonline" To: "Chris Fields" Cc: "bioperl-l" Sent: Wednesday, December 23, 2009 5:45 PM Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 > Hi Chris, > > Thanks for your reply and I just submitted this bug to bugzilla. > > Have a nice holiday! 
> ------------------------------------------------------------- > Yong Zhang > Ph.D, Research Scholar > Manyuan Long's Lab > University of Chicago > >>------------------------------------------------------------- >>From: Chris Fields >>Time: 2009-12-23 15:19:50 >>To: pkuonline bioperl-l >>Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 > >>Well, not completely unexpected, but very frustrating nonetheless. Changes to >>PAML output have broken in just about every PAML parser revision. Not sure >>when this will be addressed unfortunately, my hope is sooner than later. >> >>Can you file a bioperl bug report for this? It's the best place to keep >>track. >> >>http://bugzilla.open-bio.org/ >> >>chris >> >>On Dec 23, 2009, at 12:36 PM, pkuonline wrote: >> >>> Hi Everyone, >>> >>> I used the latest Bioperl build, >>> http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to >>> parse CODEML result. I searched the mail list and found current PAML parser >>> is compatible with PAML 4.3a, >>> http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. >>> However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser >>> does not work. More strangely, I tested it on the old PAML 4.1 result and >>> also failed. >>> >>> I attached my CODEML outputs here to see whether you guys have some idea. >>> >>> Many thanks ahead! 
>>> >>> Best regards, >>> ------------------------------------------------------------- >>> Yong Zhang >>> Ph.D, Research Scholar >>> Manyuan Long's Lab >>> University of >>> Chicago_______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bhakti.dwivedi at gmail.com Fri Dec 25 21:46:51 2009 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Fri, 25 Dec 2009 21:46:51 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? Message-ID: Hi, Does anyone know how to retrieve the "Source" or the "Species name" given the accession number using Bioperl. I have these 30,000 accession numbers for which I need to get the source organisms. Any kind of help will be appreciated. Thanks BD From maj at fortinbras.us Fri Dec 25 22:52:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 25 Dec 2009 22:52:10 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? 
In-Reply-To: References: Message-ID: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>

Bhakti,
The following example (using EUtilities) may serve your purpose:

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)

my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

You probably will need to break up your 30000 into chunks
(say, 1000-3000 each), and do the above on each chunk with a

sleep 3;

or so separating the queries.
MAJ

----- Original Message -----
From: "Bhakti Dwivedi"
To:
Sent: Friday, December 25, 2009 9:46 PM
Subject: [Bioperl-l] how to retrieve organism name from accession number?

> Hi,
>
> Does anyone know how to retrieve the "Source" or the "Species name" given
> the accession number using Bioperl. I have these 30,000 accession numbers
> for which I need to get the source organisms. Any kind of help will be
> appreciated.
> > Thanks > > BD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat Dec 26 06:47:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 26 Dec 2009 05:47:29 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> Message-ID: On Dec 25, 2009, at 9:52 PM, Mark A. Jensen wrote: > Bhakti, > The following example (using EUtilities) may serve your purpose: > > use Bio::DB::EUtilities; > > ... > You probably will need to break up your 30000 into chunks > (say, 1000-3000 each), and do the above on each chunk with a > > sleep 3; > > or so separating the queries. > MAJ The 'sleep 3' is built-in and now (on main trunk) matches NCBI's current spec of 3 queries/sec. chris From arpm9 at charter.net Sun Dec 27 16:42:09 2009 From: arpm9 at charter.net (arpm9) Date: Sun, 27 Dec 2009 16:42:09 -0500 Subject: [Bioperl-l] Should Bio::Tools::BPlite be deprecated? In-Reply-To: 4533A8D3.90709@sendu.me.uk Message-ID: <867A36FEE0244EF2950108C42BD2BE58@paulb0d5af35b9> hi chris, I was trying to make sense of this backpacking lite and just simply wanted to view the light...and got nowhere and very frustrated...please help if you can...or whoever can...thanks Pm From pengyu.ut at gmail.com Tue Dec 29 11:08:09 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 10:08:09 -0600 Subject: [Bioperl-l] Comparison between bioperl and biopython? Message-ID: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> May I ask somebody who is versatile in both bioperl and biopython to comment on the pros and cons of bioperl and biopython? I'm sending this email to both bioperl and biopython mailing lists. But I hope that it will not result in any contention.
I assume that the functionality of bioperl and biopython is the same, i.e., tasks that can be done in bioperl can be done in biopython and vice versa, as both libraries have been out there for over 10 years. Please correct me if my understanding is not true. Given a task that can be done with either bioperl or biopython, I, in particular, want to know how long it will take to write the code for the task in bioperl and biopython, with the same readability requirement (see below) and the assumption that users have the same fluency in perl and python. python is claimed to be good for maintainability. But perl is criticized for there-are-many-ways-for-a-given-task. Since there are multiple ways in perl, let us assume that we always use perl in a readable way. From jason at bioperl.org Tue Dec 29 11:49:20 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 08:49:20 -0800 Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> Message-ID: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> Are you asking for the purposes of choosing a toolkit for your work or just curious about the advantages/disadvantages of language choice? -jason On Dec 29, 2009, at 8:08 AM, Peng Yu wrote: > May I ask somebody who is versatile in both bioperl and biopython > to comment on the pros and cons of bioperl and biopython? I'm sending > this email to both bioperl and biopython mailing lists. But I hope > that it will not result in any contention. > > I assume that the functionality of bioperl and biopython is the > same, i.e., tasks that can be done in bioperl can be done in biopython and > vice versa, as both libraries have been out there for over 10 years. > Please correct me if my understanding is not true.
> > Given a task that can be done with either bioperl or biopython, > I, in particular, want to know how long it will take to write the > code for the task in bioperl and biopython, with the same readability > requirement (see below) and the assumption that users have the same > fluency in perl and python. > > python is claimed to be good for maintainability. But perl is > criticized for there-are-many-ways-for-a-given-task. Since there are > multiple ways in perl, let us assume that we always use perl in a > readable way. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From ak at ebi.ac.uk Tue Dec 29 11:57:18 2009 From: ak at ebi.ac.uk (Andreas Kähäri) Date: Tue, 29 Dec 2009 16:57:18 +0000 Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> Message-ID: <20091229165718.GB30356@quux.windows.ebi.ac.uk> On Tue, Dec 29, 2009 at 10:08:09AM -0600, Peng Yu wrote: > May I ask somebody who is versatile in both bioperl and biopython > to comment on the pros and cons of bioperl and biopython? I'm sending > this email to both bioperl and biopython mailing lists. But I hope > that it will not result in any contention. > > I assume that the functionality of bioperl and biopython is the > same, i.e., tasks that can be done in bioperl can be done in biopython and > vice versa, as both libraries have been out there for over 10 years. > Please correct me if my understanding is not true.
> > Given a task that can be done with either bioperl or biopython, > I, in particular, want to know how long it will take to write the > code for the task in bioperl and biopython, with the same readability > requirement (see below) and the assumption that users have the same > fluency in perl and python. > > python is claimed to be good for maintainability. But perl is > criticized for there-are-many-ways-for-a-given-task. Since there are > multiple ways in perl, let us assume that we always use perl in a > readable way. Assuming, as you do, that the functionality of BioPerl and BioPython is the same: Which of the two programming languages are you (or your team) most proficient in? Use that language. Regards, Andreas -- Andreas Kähäri, Ensembl Software Developer European Bioinformatics Institute (EMBL-EBI) Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, United Kingdom From sdavis2 at mail.nih.gov Tue Dec 29 12:03:40 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 29 Dec 2009 12:03:40 -0500 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> Message-ID: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: > May I ask somebody who is versatile in both bioperl and biopython > to comment on the pros and cons of bioperl and biopython? I'm sending > this email to both bioperl and biopython mailing lists. But I hope > that it will not result in any contention. > > I assume that the functionality of bioperl and biopython is the > same, i.e., tasks that can be done in bioperl can be done in biopython and > vice versa, as both libraries have been out there for over 10 years. > Please correct me if my understanding is not true.
The two projects have similar goals, but saying that the functionality is the same would be an extreme oversimplification. You will need to define what you want to do and then check to see what the two projects have to offer. This will, in general, require perusing the websites for both projects as well as the relevant documentation. > Given that a task that can be done with either bioperl or biopython, > I, in particularly, want to know how long it will take to write the > code for the task in bioperl and biopython, with the same readability > requirement (see below) and the assumption that users have the same > fluency in perl and python. Again, you will want to define the task(s) to be accomplished and then weigh the pros and cons of each project combined with local expertise. If you don't know what you want to do, then you can certainly read some examples on the websites and see which project strikes you as a "winner" for you. > python is claimed to be good for maintainability. But perl is > criticized for there-are-many-ways-for-a-given-task. Since there are > multiple ways in perl, let us assume that we always use perl in a > readable way. These two statements are generalizations that provide little insight into the strengths or weaknesses of the languages. In other words, one can write good or bad code in both languages. Hope that helps. Sean From wenzhiwang1983 at yahoo.com.cn Tue Dec 29 13:30:02 2009 From: wenzhiwang1983 at yahoo.com.cn (WangWenzhi) Date: Wed, 30 Dec 2009 02:30:02 +0800 (CST) Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> Message-ID: <658770.25534.qm@web15204.mail.cnb.yahoo.com> Dear Jason, Plink is a very useful program in the population genetics, especially in the Genome-Wide SNP scan era. Is there any plan to add the Plink (ped or tped) format to Bio::PopGen::IO? Thanks. 
Wenzhi Wang State Key Laboratory of Genetic Resources and Evolution Kunming Institute of Zoology, Chinese Academy of Sciences Kunming, Yunnan 650223 P. R. China Tel: 86 871 5198 993 Fax: 86 871 5195 430 E-mail: wenzhiwang1983 at yahoo.com.cn From pengyu.ut at gmail.com Tue Dec 29 13:58:59 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 12:58:59 -0600 Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> Message-ID: <366c6f340912291058t6c601e57re0c35e69fe81e09d@mail.gmail.com> To choose a toolkit for my work. On Tue, Dec 29, 2009 at 10:49 AM, Jason Stajich wrote: > Are you asking for the purposes of choosing a toolkit for your work or just > curious about the advantages/disadvantages of language choice? > > -jason > On Dec 29, 2009, at 8:08 AM, Peng Yu wrote: > >> May I ask somebody who are versitile in both bioperl and biopython >> comment on the pros and cons of bioperl and biopython? I'm sending >> this email to both bioperl and biopython mailing lists. But I hope >> that it will not result in any contention. >> >> I assume that the functionality between bioperl or biopython is the >> same, i.e., tasks can be done in bioperl can be done biopython and >> vice versa, as both libraries have been out there over 10 years. >> Please correct me if my understanding is not true. >> >> Given that a task that can be done with either bioperl or biopython, >> I, in particularly, want to know how long it will take to write the >> code for the task in bioperl and biopython, with the same readability >> requirement (see below) and the assumption that users have the same >> fluency in perl and python.
>> >> python is claimed to be good for maintainability. But perl is >> criticized for there-are-many-ways-for-a-given-task. Since there are >> multiple ways in perl, let us assume that we always use perl in a >> readable way. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From pengyu.ut at gmail.com Tue Dec 29 14:15:14 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 13:15:14 -0600 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> Message-ID: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote: > On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: >> May I ask somebody who are versitile in both bioperl and biopython >> comment on the pros and cons of bioperl and biopython? I'm sending >> this email to both bioperl and biopython mailing lists. But I hope >> that it will not result in any contention. >> >> I assume that the functionality between bioperl or biopython is the >> same, i.e., tasks can be done in bioperl can be done biopython and >> vice versa, as both libraries have been out there over 10 years. >> Please correct me if my understanding is not true. > > The two projects have similar goals, but saying that the functionality > is the same would be an extreme oversimplification. ?You will need to > define what you want to do and then check to see what the two projects > have to offer. ?This will, in general, require perusing the websites > for both projects as well as the relevant documentation. 
According to your experience, are there some tasks that are easier with one than with another? >> Given that a task that can be done with either bioperl or biopython, >> I, in particularly, want to know how long it will take to write the >> code for the task in bioperl and biopython, with the same readability >> requirement (see below) and the assumption that users have the same >> fluency in perl and python. > > Again, you will want to define the task(s) to be accomplished and then > weigh the pros and cons of each project combined with local expertise. > ?If you don't know what you want to do, then you can certainly read > some examples on the websites and see which project strikes you as a > "winner" for you. > >> python is claimed to be good for maintainability. But perl is >> criticized for there-are-many-ways-for-a-given-task. Since there are >> multiple ways in perl, let us assume that we always use perl in a >> readable way. > > These two statements are generalizations that provide little insight > into the strengths or weaknesses of the languages. ?In other words, > one can write good or bad code in both languages. > > Hope that helps. > > Sean > From alperyilmaz at gmail.com Tue Dec 29 14:36:03 2009 From: alperyilmaz at gmail.com (Alper Yilmaz) Date: Tue, 29 Dec 2009 14:36:03 -0500 Subject: [Bioperl-l] Bio::TreeIO, Bio::Tree::Draw::Cladogram and phyloxml issues.. Message-ID: Hello, I have a tree in phyloxml format, and am trying to draw a subtree by using a spefic node as the root. I used Bio::Tree::Draw::Cladogram for drawing and encountered some problems. When I use whole tree and draw it, everything is fine; but, when I pick a particular node and construct the subtree from that node's ancestor by using "my $subtree = Bio::Tree::Tree->new(-root => $new_root, -nodelete => 1);", Bio::Tree::Draw::Cladogram creates a faulty EPS file, which contains extra lines added in the middle of the file. For instance: . . . 
72.0820393261372 126 moveto
(OsIBCD006509) show
30 81.25 moveto
81.25 lineto
lineto
48.5410196630686 120 moveto
30 120 lineto
.
.
.

Should read:

72.0820393261372 126 moveto
(OsIBCD006509) show
48.5410196630686 120 moveto
30 120 lineto

Also, I tried writing the subtree to a new phyloxml file first and then drawing it. The code is as follows:

my $savefile = "save.phyloxml";
my $treeout = Bio::TreeIO->new(-format => 'phyloxml', -file => ">$savefile");
$treeout->write_tree($subtree);

my $tree2 = Bio::TreeIO->new(-format => 'phyloxml', -file => "save.phyloxml");
my $t1 = $tree2->next_tree;
my $image_output = "test.eps";
my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $t1, -top => 10, -bottom => 10,);
$obj1->print(-file => $image_output);

The generated phyloxml file, save.phyloxml, has an additional newline between "" and "" at the end of the file. This additional newline leads to an error when parsing (opening the file and drawing the EPS). After I removed the newline manually, Bio::Tree::Draw::Cladogram produced the EPS file successfully.

Does anyone know how to fix these problems?
1- faulty eps file generation
2- additional newline character in phyloxml output

Is the problem in the way I create the subtree?
The phyloxml file I used can be downloaded from: http://grassius.org/download/HSF.phyloxml Run this code with the phyloxml file to see newline character problem: http://pastebin.com/f87ee1ee Run this code with the phyloxml file to see faulty eps file problem: http://pastebin.com/fc4715a1 Alper Yilmaz Post-doctoral Researcher Plant Biotechnology Center The Ohio State University 1060 Carmack Rd Columbus, OH 43210 (614)688-4954 From pengyu.ut at gmail.com Tue Dec 29 16:32:17 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 15:32:17 -0600 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html Message-ID: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> http://bioperl.org/Core/Latest/modules.html Many links if not all are broken on the above pages. Could somebody fix it? For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, I see the following error. There is currently no text in this page. You can search for this page title in other pages, search the related logs, or edit this page. From jason at bioperl.org Tue Dec 29 16:49:00 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 13:49:00 -0800 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: That is an outdated URL I am not sure where you are linking it from. We can probably now disable all old '/Core' URLs. All documentation links are in the /wiki/ The beginner's howto is here for example http://bioperl.org/wiki/HOWTO:Beginners > http://www.bioperl.org/wiki/HOWTOs On Dec 29, 2009, at 1:32 PM, Peng Yu wrote: > http://bioperl.org/Core/Latest/modules.html > > Many links if not all are broken on the above pages. Could somebody > fix it? > > For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, > I see the following error. 
> > There is currently no text in this page. You can search for this page > title in other pages, search the related logs, or edit this page. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From jason at bioperl.org Tue Dec 29 16:50:26 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 13:50:26 -0800 Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <658770.25534.qm@web15204.mail.cnb.yahoo.com> References: <658770.25534.qm@web15204.mail.cnb.yahoo.com> Message-ID: yep - be great if someone were to write it. This being a volunteer project we welcome your contribution. No I don't specifically have plans to do it, but maybe you can give it a try or another population genetics interested bioperl user/developer? -jason On Dec 29, 2009, at 10:30 AM, WangWenzhi wrote: > Dear Jason, > > Plink is a very useful program in the population genetics, > especially in the Genome-Wide SNP scan era. Is there any plan to add > the Plink (ped or tped) format to Bio::PopGen::IO? > > Thanks. > > Wenzhi Wang > State Key Laboratory of Genetic Resources and Evolution > Kunming Institute of Zoology, Chinese Academy of Sciences > Kunming, Yunnan 650223 P. R. China > Tel: 86 871 5198 993 > Fax: 86 871 5195 430 > E-mail: wenzhiwang1983 at yahoo.com.cn > > > ___________________________________________________________ > ????????????????? > http://card.mail.cn.yahoo.com/ -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From jason at bioperl.org Tue Dec 29 16:57:49 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 13:57:49 -0800 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? 
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> Message-ID: <02851B8A-E74E-453E-9725-6FA8F3995F82@bioperl.org> On Dec 29, 2009, at 11:15 AM, Peng Yu wrote: > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis > wrote: >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu >> wrote: >>> May I ask somebody who are versitile in both bioperl and biopython >>> comment on the pros and cons of bioperl and biopython? I'm sending >>> this email to both bioperl and biopython mailing lists. But I hope >>> that it will not result in any contention. >>> >>> I assume that the functionality between bioperl or biopython is the >>> same, i.e., tasks can be done in bioperl can be done biopython and >>> vice versa, as both libraries have been out there over 10 years. >>> Please correct me if my understanding is not true. >> >> The two projects have similar goals, but saying that the >> functionality >> is the same would be an extreme oversimplification. You will need to >> define what you want to do and then check to see what the two >> projects >> have to offer. This will, in general, require perusing the websites >> for both projects as well as the relevant documentation. > > According to your experience, are there some tasks that are easier > with one than with another? As you have still failed to give much insight into the 'tasks' it is hard to give you a better answer. If there is a module or set of routines already written then yes one might be easier than the other. Otherwise it just depends on your strengths in the programming language. We discussed the strengths of the different toolkits briefly on the podcast last month. http://twit.tv/floss96 I echo Sean. Use whichever language you are a better programmer in. 
BioPerl is more mature in some facets than is BioPython, but BioPython has some components that are more heavily developed and supported than BioPerl (structures being one of those and interfacing that to pyMol would be a strength). I personally think the Gbrowse, Bio-Graphics, and Bio::DB::GFF/Bio::DB::SeqFeature::Store interface to Sequence databases and Features is a critical aspect of mining genomic data and features and use these heavily in my work, making BioPerl easy and powerful for my tasks. That and sequence and alignment parsing and reformatting. But there are comparable tools written in python with and without BioPython that you can also use so mainly it is about building up an expertise in a toolkit and going forward. The BioPerl faithful will probably say it is more useful toolkit to us, but we are of course a biased sample. Both projects can benefit from more users and developers contributing code and documentation so I would just jump in and give it a try if you are unsure which will be easier for you. > >>> Given that a task that can be done with either bioperl or biopython, >>> I, in particularly, want to know how long it will take to write the >>> code for the task in bioperl and biopython, with the same >>> readability >>> requirement (see below) and the assumption that users have the same >>> fluency in perl and python. >> >> Again, you will want to define the task(s) to be accomplished and >> then >> weigh the pros and cons of each project combined with local >> expertise. >> If you don't know what you want to do, then you can certainly read >> some examples on the websites and see which project strikes you as a >> "winner" for you. >> >>> python is claimed to be good for maintainability. But perl is >>> criticized for there-are-many-ways-for-a-given-task. Since there are >>> multiple ways in perl, let us assume that we always use perl in a >>> readable way. 
>> >> These two statements are generalizations that provide little insight >> into the strengths or weaknesses of the languages. In other words, >> one can write good or bad code in both languages. >> >> Hope that helps. >> >> Sean >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From pengyu.ut at gmail.com Tue Dec 29 17:01:05 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Wed, 30 Dec 2009 16:01:05 +1800 Subject: [Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID? Message-ID: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> I see the following example. But it is not clear to me how to get the exon sequences. I also want to get the exon boundaries and associated CDS boundaries. Although, I can get the boundary information from ucsc table browser, but it would be convenient if I can get it in bioperl along with the sequence. Could somebody let me know how do it? http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html From sdavis2 at mail.nih.gov Tue Dec 29 17:13:30 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 29 Dec 2009 17:13:30 -0500 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: <264855a00912291413r7ce37e2h673dec7c2624db6@mail.gmail.com> On Tue, Dec 29, 2009 at 4:32 PM, Peng Yu wrote: > http://bioperl.org/Core/Latest/modules.html > > Many links if not all are broken on the above pages. Could somebody fix it? > > For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, > I see the following error. > > There is currently no text in this page. 
You can search for this page > title in other pages, search the related logs, or edit this page. It is unfortunate that the links are broken on that page. However, I believe that page is somewhat outdated, anyway. Here are the HOWTO pages: http://www.bioperl.org/wiki/HOWTOs Sean From pengyu.ut at gmail.com Tue Dec 29 17:21:16 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Wed, 30 Dec 2009 16:21:16 +1800 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: <366c6f340912291421m38bb8348oe6b224f29208f9f4@mail.gmail.com> On Wed, Dec 30, 2009 at 3:49 PM, Jason Stajich wrote: > That is an outdated URL I am not sure where you are linking it from. We can > probably now disable all old '/Core' URLs. I'm linked from here. http://www.bioperl.org/wiki/BioPerl_Tutorial Since those URLs are outdated. Could you please fix the links on the above link? > All documentation links are in the /wiki/ > > The beginner's howto is here for example > ?http://bioperl.org/wiki/HOWTO:Beginners > >> http://www.bioperl.org/wiki/HOWTOs > > > On Dec 29, 2009, at 1:32 PM, Peng Yu wrote: > >> http://bioperl.org/Core/Latest/modules.html >> >> Many links if not all are broken on the above pages. Could somebody fix >> it? >> >> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, >> I see the following error. >> >> There is currently no text in this page. You can search for this page >> title in other pages, search the related logs, or edit this page. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From sdavis2 at mail.nih.gov Tue Dec 29 18:06:17 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 29 Dec 2009 18:06:17 -0500 Subject: [Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID? In-Reply-To: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> References: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> Message-ID: <264855a00912291506s13c32d5dg7b46f0cc34c20f94@mail.gmail.com> On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu wrote: > I see the following example. But it is not clear to me how to get the > exon sequences. I also want to get the exon boundaries and associated > CDS boundaries. Although, I can get the boundary information from ucsc > table browser, but it would be convenient if I can get it in bioperl > along with the sequence. > > Could somebody let me know how do it? > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html Hi, Peng. There may be some confusion, as the UCSC database aligns RefSeq sequence to a genome to generate exon start and end coordinates. However, the RefSeq records retrieved by Bio::DB::RefSeq are not in genomic context and so do not have start and end locations on the genome. That is, if you want the starts and ends along the genome, that information is not available from the RefSeq record itself, I don't think. If that is what you need (genomic coordinates), you can download the information directly from UCSC, download flat files from NCBI mapview, or even from ensembl (using biomart, for instance). If you are looking for a bioperl-compliant way of doing this, look at the Ensembl Perl API. 
Sean From jkhilmer at gmail.com Tue Dec 29 14:55:18 2009 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Tue, 29 Dec 2009 12:55:18 -0700 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> Message-ID: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> Personally, I think that the differences between Python and Perl (although substantial) are not large enough to make the language itself the deciding factor. Instead, consider the larger community of software. I haven't yet found a situation in which Python cannot be applied: it can be used with R (statistics); lower-level code C or fortran; visualization software such as PyMol, Chimera, Blender, VTK; plotting with matplotlib; and scipy/numpy or sage, which provide innumerable benefits for computation, data-processing, etc. Although I don't claim to have a great deal of experience with Perl, I haven't seen the same integration with that language: I'm assuming it can be used with R and VTK (not sure about C or fortran?). For this reason, unless your work is highly targeted and you have no use programming language integration with other software, I would recommend Python. For perl experts, I would truly appreciate any corrections you could offer to these observations of mine, since I wouldn't mind using perl if it offers benefits either in general or for specific applications. Jonathan On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu wrote: > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote: >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: >>> May I ask somebody who are versitile in both bioperl and biopython >>> comment on the pros and cons of bioperl and biopython? 
I'm sending >>> this email to both bioperl and biopython mailing lists. But I hope >>> that it will not result in any contention. >>> >>> I assume that the functionality between bioperl or biopython is the >>> same, i.e., tasks can be done in bioperl can be done biopython and >>> vice versa, as both libraries have been out there over 10 years. >>> Please correct me if my understanding is not true. >> >> The two projects have similar goals, but saying that the functionality >> is the same would be an extreme oversimplification. ?You will need to >> define what you want to do and then check to see what the two projects >> have to offer. ?This will, in general, require perusing the websites >> for both projects as well as the relevant documentation. > > According to your experience, are there some tasks that are easier > with one than with another? > >>> Given that a task that can be done with either bioperl or biopython, >>> I, in particularly, want to know how long it will take to write the >>> code for the task in bioperl and biopython, with the same readability >>> requirement (see below) and the assumption that users have the same >>> fluency in perl and python. >> >> Again, you will want to define the task(s) to be accomplished and then >> weigh the pros and cons of each project combined with local expertise. >> ?If you don't know what you want to do, then you can certainly read >> some examples on the websites and see which project strikes you as a >> "winner" for you. >> >>> python is claimed to be good for maintainability. But perl is >>> criticized for there-are-many-ways-for-a-given-task. Since there are >>> multiple ways in perl, let us assume that we always use perl in a >>> readable way. >> >> These two statements are generalizations that provide little insight >> into the strengths or weaknesses of the languages. ?In other words, >> one can write good or bad code in both languages. >> >> Hope that helps. 
>> >> Sean >> > > _______________________________________________ > Biopython mailing list ?- ?Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From wgheath at gmail.com Tue Dec 29 15:16:39 2009 From: wgheath at gmail.com (William Heath) Date: Tue, 29 Dec 2009 12:16:39 -0800 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> Message-ID: The biggest reason to go with python is the ease of use. Biologists are not programmers and the learning curve for python is much smaller than that of perl. I like perl but choose python because of this issue. Perl 6 does address some of these issues however but this has not been fully implemented as of yet. -Tim P.S. I love, love, love cpan though which is only for perl right now :( On Tue, Dec 29, 2009 at 11:55 AM, Jonathan Hilmer wrote: > Personally, I think that the differences between Python and Perl > (although substantial) are not large enough to make the language > itself the deciding factor. > > Instead, consider the larger community of software. I haven't yet > found a situation in which Python cannot be applied: it can be used > with R (statistics); lower-level code C or fortran; visualization > software such as PyMol, Chimera, Blender, VTK; plotting with > matplotlib; and scipy/numpy or sage, which provide innumerable > benefits for computation, data-processing, etc. > > Although I don't claim to have a great deal of experience with Perl, I > haven't seen the same integration with that language: I'm assuming it > can be used with R and VTK (not sure about C or fortran?). 
For this > reason, unless your work is highly targeted and you have no use > programming language integration with other software, I would > recommend Python. > > For perl experts, I would truly appreciate any corrections you could > offer to these observations of mine, since I wouldn't mind using perl > if it offers benefits either in general or for specific applications. > > > Jonathan > > On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu wrote: > > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis > wrote: > >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: > >>> May I ask somebody who are versitile in both bioperl and biopython > >>> comment on the pros and cons of bioperl and biopython? I'm sending > >>> this email to both bioperl and biopython mailing lists. But I hope > >>> that it will not result in any contention. > >>> > >>> I assume that the functionality between bioperl or biopython is the > >>> same, i.e., tasks can be done in bioperl can be done biopython and > >>> vice versa, as both libraries have been out there over 10 years. > >>> Please correct me if my understanding is not true. > >> > >> The two projects have similar goals, but saying that the functionality > >> is the same would be an extreme oversimplification. You will need to > >> define what you want to do and then check to see what the two projects > >> have to offer. This will, in general, require perusing the websites > >> for both projects as well as the relevant documentation. > > > > According to your experience, are there some tasks that are easier > > with one than with another? > > > >>> Given that a task that can be done with either bioperl or biopython, > >>> I, in particularly, want to know how long it will take to write the > >>> code for the task in bioperl and biopython, with the same readability > >>> requirement (see below) and the assumption that users have the same > >>> fluency in perl and python. 
> >> > >> Again, you will want to define the task(s) to be accomplished and then > >> weigh the pros and cons of each project combined with local expertise. > >> If you don't know what you want to do, then you can certainly read > >> some examples on the websites and see which project strikes you as a > >> "winner" for you. > >> > >>> python is claimed to be good for maintainability. But perl is > >>> criticized for there-are-many-ways-for-a-given-task. Since there are > >>> multiple ways in perl, let us assume that we always use perl in a > >>> readable way. > >> > >> These two statements are generalizations that provide little insight > >> into the strengths or weaknesses of the languages. In other words, > >> one can write good or bad code in both languages. > >> > >> Hope that helps. > >> > >> Sean > >> > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From pengyu.ut at gmail.com Wed Dec 30 12:26:45 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Thu, 31 Dec 2009 11:26:45 +1800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? Message-ID: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> With Bio::SeqIO, I can only read in the records in a fasta file one by one. This is preferable if there are many records in a file. But I also want to read all the records in. I could use a while loop to read all records in. But could somebody let me know if there is a function in bioperl that can read in all the record at once and return me an object? 
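A minimal sketch of the while-loop approach mentioned above, for readers who want every record held in memory at once. It only uses Bio::SeqIO calls (new, next_seq, display_id, length); the file name my.fasta and the variable names are placeholders chosen for illustration, and the hash assumes that sequence IDs are unique within the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# 'my.fasta' is a placeholder file name, not one taken from this thread.
my $file = @ARGV ? $ARGV[0] : 'my.fasta';

my $in = Bio::SeqIO->new(-file => $file, -format => 'fasta');

my @seqs;         # every record, in file order
my %seq_by_id;    # the same records keyed by display ID (assumes unique IDs)

# Read the records one by one and keep them all.
while (my $seq = $in->next_seq) {
    push @seqs, $seq;
    $seq_by_id{ $seq->display_id } = $seq;
}

printf "Read %d sequences from %s\n", scalar @seqs, $file;
for my $seq (@seqs) {
    printf "%s\t%d\n", $seq->display_id, $seq->length;
}
```

Keeping all records in an array or hash like this is probably only sensible for files that fit comfortably in memory.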
http://www.bioperl.org/wiki/HOWTO:SeqIO From sdavis2 at mail.nih.gov Wed Dec 30 13:04:53 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 30 Dec 2009 13:04:53 -0500 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> Message-ID: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object? In perl, you can use an array to store the records. You could also use a hash if you have reasonable keys for the entries. Sean > http://www.bioperl.org/wiki/HOWTO:SeqIO > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Wed Dec 30 14:58:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 30 Dec 2009 11:58:54 -0800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B@bioperl.org> or use a database object so you can retrieve sequences that have a particular id. See Bio::DB::Fasta On Dec 30, 2009, at 10:04 AM, Sean Davis wrote: > On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >> With Bio::SeqIO, I can only read in the records in a fasta file one >> by >> one. 
This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and
>> return
>> me an object?
>
> In perl, you can use an array to store the records. You could also
> use a hash if you have reasonable keys for the entries.
>
> Sean
>
>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/

From maj at fortinbras.us Wed Dec 30 16:20:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 30 Dec 2009 16:20:31 -0500
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com>
Message-ID: <2646F627E6D14AADB412A6E6B51E24DA@NewLife>

I think you might want Bio::AlignIO:

$alnio = Bio::AlignIO->new(-file=> 'my.fas' );
$aln = $alnio->next_aln;
@seqs = $aln->each_seq;

MAJ

----- Original Message -----
From: "Peng Yu"
To:
Sent: Wednesday, December 30, 2009 12:26 PM
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?

> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
>
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> > http://www.bioperl.org/wiki/HOWTO:SeqIO > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Thu Dec 31 05:55:32 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 31 Dec 2009 11:55:32 +0100 Subject: [Bioperl-l] question about a PAML module In-Reply-To: <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> Message-ID: Hi Rui and Sandra, Could you file this as a bug report at http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl ? Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report: - sample input files (a sequence file and a tree file, probably) - a script which reproduces the problem - the output (error messages) like you show below When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this. There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon. Dave From David.Messina at sbc.su.se Tue Dec 1 10:14:40 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 1 Dec 2009 11:14:40 +0100 Subject: [Bioperl-l] [Bug 2937] Strand in fasta35 output does not seem to be parsed In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC50148731E47@iahcexch1.iah.bbsrc.ac.uk> <50F0159A-DE58-4405-A2FE-4FA95A3CDDA4@sbc.su.se> <8D08960C647E64438CE5740657CBBDC50148731FDA@iahcexch1.iah.bbsrc.ac.uk> Message-ID: Hi Mick, Did you try running the test case that you had originally attached to the bug report? 
Or is the below from different code and a different fasta output file?

In any case, I'll need to look at the fasta35 output file and the parse2.pl you ran in order to reproduce and fix this -- could you please open a new bug report and attach them to it?

Thanks,
Dave

On Nov 30, 2009, at 17:49, michael watson (IAH-C) wrote:

> Hi Dave
>
> Just got round to looking at this.
>
> In bioperl-1.6.0, the strand didn't get parsed, but the module only warned about something:
>
> --------------------- WARNING ---------------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> ---------------------------------------------------
>
> However, in the bioperl-live I just downloaded, this had turned into a full-on stack trace:
>
> ------------- EXCEPTION -------------
> MSG: Unrecognized alignment line (1) ' /usr/local/fasta3/bin/fasta35 -n -U -Q -H -A -E 2.0 -C 19 -m 0 -m 9i -O iltv_pre.fasta35 iltv_pre.fasta clusters.fasta'
> STACK Bio::SearchIO::fasta::next_result /usr/local/bioperl-live_301109//Bio/SearchIO/fasta.pm:1347
> STACK toplevel parse2.pl:20
> -------------------------------------
>
> I'm not sure if this is even related to the strand issue (I suspect not, but you never know) but something changed between bioperl-1.6.0 and the live trunk I downloaded today to ensure I still can't use the module.
>
> Is this another bug report?
>
> Thanks again for all your help
>
> Mick
>
> -----Original Message-----
> From: Dave Messina [mailto:David.Messina at sbc.su.se]
> Sent: 23 November 2009 17:46
> To: michael watson (IAH-C)
> Subject: Re: [Bug 2937] Strand in fasta35 output does not seem to be parsed
>
> Hi Mick,
>
> Sure thing -- the current build from subversion is packaged up every
> night and available here:
> http://www.bioperl.org/DIST/nightly_builds/
>
> Just grab bioperl-live.tar.gz from there and you'll get the changes.
> > > Dave > > > > > On Nov 23, 2009, at 6:34 PM, michael watson (IAH-C) wrote: > >> Hi Dave >> >> Thanks for the hard work. >> >> Trying to get the latest updates so I can use this... don't have svn >> on my server, tried to install it and I don't have python either, >> which is needed to install it. >> >> I face about 3 weeks whilst my IT department sort this out, unless I >> can access the changes any other way? >> >> Thanks >> Mick >> >> -----Original Message----- >> From: bugzilla-daemon at portal.open-bio.org [mailto:bugzilla- >> daemon at portal.open-bio.org] >> Sent: 20 November 2009 15:12 >> To: michael watson (IAH-C) >> Subject: [Bug 2937] Strand in fasta35 output does not seem to be >> parsed >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2937 >> >> >> online at davemessina.com changed: >> >> What |Removed |Added >> ---------------------------------------------------------------------------- >> Status|NEW |RESOLVED >> Resolution| |FIXED >> >> >> >> >> ------- Comment #7 from online at davemessina.com 2009-11-20 10:12 EST >> ------- >> Fixed in r16394. >> >> Michael, thanks for the report. Your test cases pass, but please >> reopen the bug >> if needed. >> >> >> -- >> Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi? >> tab=email >> ------- You are receiving this mail because: ------- >> You reported the bug, or are watching the reporter. > From e.osimo at gmail.com Tue Dec 1 18:05:48 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Tue, 1 Dec 2009 19:05:48 +0100 Subject: [Bioperl-l] Statistics: how to obtain the p value of a T test Message-ID: <2ac05d0f0912011005n6140869aoc634ad08cdf10ca4@mail.gmail.com> Hello everyone, I'm trying to get the p value of a statistic made with Statistics::TTest I cannot find this function: I can find if the null hypothesis is rejected at a certain confidence level, but I cannot make the script show me the actual p value. Do you know other scripts that can do that? 
Thanks
Emanuele

From cjfields at illinois.edu Tue Dec 1 19:25:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 1 Dec 2009 13:25:03 -0600
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility Policy Change
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
Message-ID: <964687F9-989B-4F11-B74B-977912A922EB@illinois.edu>

I'll be adjusting the requisite parameters as indicated below. I'm reluctant to include a time-based limit on submissions (NCBI wants a max of 100 requests at peak hours), but it may become necessary if they request it.

chris

Begin forwarded message:

> From:
> Date: December 1, 2009 12:59:34 PM CST
> To:
> Subject: [Utilities-announce] NCBI E-Utility Policy Change
> Reply-To: utilities-announce at ncbi.nlm.nih.gov
>
> As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.
>
> The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request.
>
> The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request.
>
> NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements.
These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities. > > NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov. > > Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service. > > _______________________________________________ > Utilities-announce mailing list > http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From maj at fortinbras.us Wed Dec 2 02:27:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 1 Dec 2009 21:27:06 -0500 Subject: [Bioperl-l] test test test Message-ID: <95142B0024EC48928CB56A69A17A8559@NewLife> MAJ From ocarnorsk138 at gmail.com Wed Dec 2 02:59:48 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Tue, 1 Dec 2009 23:59:48 -0300 Subject: [Bioperl-l] test test test In-Reply-To: <95142B0024EC48928CB56A69A17A8559@NewLife> References: <95142B0024EC48928CB56A69A17A8559@NewLife> Message-ID: test test test test back O'car Campos C. Bioinformatics Engineering Student. University of Talca. Chile. 2009/12/1 Mark A. Jensen > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Wed Dec 2 03:08:23 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 1 Dec 2009 22:08:23 -0500 Subject: [Bioperl-l] test test test In-Reply-To: References: <95142B0024EC48928CB56A69A17A8559@NewLife> Message-ID: I love when people are paying attention! ----- Original Message ----- From: Ocar Campos To: Mark A. Jensen ; Bioperl Mailing List. Sent: Tuesday, December 01, 2009 9:59 PM Subject: Re: [Bioperl-l] test test test test test test test back O'car Campos C. Bioinformatics Engineering Student. University of Talca. Chile. 2009/12/1 Mark A. Jensen MAJ _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From rtbio.2009 at gmail.com Wed Dec 2 12:07:08 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Wed, 2 Dec 2009 13:07:08 +0100 Subject: [Bioperl-l] Remote blast Message-ID: Hello everyone, I have a problem. I am new to Bioperl. I am working on RNAi tool wherein a cgi script was written which connects to NCBI blast using remote blast program,i.e., The input sequence given in the html page is taken as input and Remote blast is performed on this based on the code for Remote blast.But,I have a problem in the Remote blast code. 
My code goes like this:

@compseqs=blastcode($in{'Inputseq'});

sub blastcode
{
    $input1= $_[0];

    open(NUC,'>',$nuc);
    print NUC $input1;
    close(NUC);

    my $prog = 'blastn';
    my $db = 'refseq_rna';
    my $e_val= '1e-10';
    my $organism= 'Trypanosoma Brucei';

    $gb = new Bio::DB::GenBank;

    my @params = ( '-prog' => $prog,
                   '-data' => $db,
                   '-expect' => $e_val,
                   '-readmethod' => 'SearchIO',
                   '-Organism' => $organism );

    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

    #change a parameter
    $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]';

    my $v = 1;
    #$v is just to turn on and off the messages

    my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' );

    while (my $input = $str->next_seq())
    {
        #Blast a sequence against a database:

        #Alternatively, you could pass in a file with many
        #sequences rather than loop through sequence one at a time
        #Remove the loop starting 'while (my $input = $str->next_seq())'
        #and swap the two lines below for an example of that.
        my $r = $factory->submit_blast($input);

        print STDERR "waiting...." if($v>0);
        while ( my @rids = $factory->each_rid ) {
            foreach my $rid ( @rids ) {
                my $rc = $factory->retrieve_blast($rid);
                if( !ref($rc) ) {
                    if( $rc < 0 ) {
                        $factory->remove_rid($rid);
                    }
                    print STDERR "." if ( $v > 0 );
                    sleep 5;
                } else {
                    my $result = $rc->next_result();
                    #save the output
                    my $filename = $result->query_name()."\.out";
                    $factory->save_output($filename);
                    $factory->remove_rid($rid);

                    # open(BLASTDEBUGFILE,'>',$blastdebugfile);
                    # print BLASTDEBUGFILE "Test1 $result";
                    # close(BLASTDEBUGFILE);

                    open(OUTFILE,'>',$outfile);
                    print OUTFILE "Test2 $result->database_name()";
                    close(OUTFILE);

                    while ( my $hit = $result->next_hit ) {
                        next unless ( $v > 0);

                        # open(OUTFILE,'>',$outfile);
                        # print OUTFILE "in while hits";
                        #close(OUTFILE);

                        my $sequ = $gb->get_Seq_by_version($hit->name);
                        my $dna = $sequ->seq(); # get the sequence as a string
                        push(@seqs,$dna);
                    }
                }
            }
        }
    }

    # open(OUTFILE,'>',$outfile);
    #print OUTFILE $seqs[0];
    # close(OUTFILE);

    return(@seqs);
}

Here in the above code, my program gets as far as the 'else' part and writes the output file, i.e., this step:

my $filename = $result->query_name()."\.out";

But when it should enter the next while loop, where I can get the hits, the program does not enter the loop, i.e., it does not enter this:

while ( my $hit = $result->next_hit ) {
next unless ( $v > 0);

Hence I am unable to get any hits for my query. For example, if the query's accession number is Tb11.02.2210, I just get a file Tb11.02.2210.out, and only the file name is displayed in the browser.

Please help me solve this problem, and mail me if anything is unclear.

Regards,
Roopa.

From ashvip at gmail.com Wed Dec 2 05:24:09 2009
From: ashvip at gmail.com (Vipin Singh)
Date: Wed, 2 Dec 2009 10:54:09 +0530
Subject: [Bioperl-l] Problems with installation
Message-ID: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com>

Dear Sir/Madam,
I have not been able to install bioperl on my Windows 32 machine despite
repeated attempts. I have tried both Active Perl and Strawberry Perl but
both do not seem to work.
I have followed the instruction given at
-- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Please guide.
Thanks,
Vipin.
Vipin Singh, Senior Research Fellow, Centre for Cellular and Molecular Biology, Hyderabad - 500007 India. contact - 91-040-27192778 From scott at scottcain.net Wed Dec 2 14:18:37 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 2 Dec 2009 09:18:37 -0500 Subject: [Bioperl-l] Problems with installation In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> Message-ID: <4536f7700912020618y31f8fa15i6e01ce9614a87341@mail.gmail.com> Hello Vipin, "do not seem to work" doesn't give us much to go on; can you tell us what happened? Scott On Wed, Dec 2, 2009 at 12:24 AM, Vipin Singh wrote: > Dear Sir/Madam, > I have not been able to install bioperl on my Windows 32 machine despite > repeated attempts. I have tried both Active Perl and Strwaberry perl but > both do not seem to work. > I have followed the instruction given at > -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > Please guide. > Thanks, > Vipin. > Vipin Singh, > Senior Research Fellow, > Centre for Cellular and Molecular Biology, > Hyderabad - 500007 > India. > contact - 91-040-27192778 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From maj at fortinbras.us Wed Dec 2 14:18:31 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 2 Dec 2009 09:18:31 -0500 Subject: [Bioperl-l] Problems with installation In-Reply-To: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> References: <8d766b180912012124q44c58f62hecc598615f65e99c@mail.gmail.com> Message-ID: <4A3B25FFC79F43E1AF65E56FD1630F44@NewLife> Hi Vipin-- We need some more information; your commands, error messages you received. Thanks, Mark ----- Original Message ----- From: "Vipin Singh" To: Sent: Wednesday, December 02, 2009 12:24 AM Subject: [Bioperl-l] Problems with installation > Dear Sir/Madam, > I have not been able to install bioperl on my Windows 32 machine despite > repeated attempts. I have tried both Active Perl and Strwaberry perl but > both do not seem to work. > I have followed the instruction given at > -- http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows > > Please guide. > Thanks, > Vipin. > Vipin Singh, > Senior Research Fellow, > Centre for Cellular and Molecular Biology, > Hyderabad - 500007 > India. > contact - 91-040-27192778 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bcantarel at som.umaryland.edu Wed Dec 2 18:36:27 2009 From: bcantarel at som.umaryland.edu (Brandi Cantarel) Date: Wed, 2 Dec 2009 13:36:27 -0500 Subject: [Bioperl-l] Parsing Genbank Message-ID: Hi all, I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. x $cds->start 1 x $cds->end 64 How can I get the original coordinates? Is there a command for that or will I have to just do the math? Feature or Bug? 
~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore From maj at fortinbras.us Wed Dec 2 19:09:11 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 2 Dec 2009 14:09:11 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: References: Message-ID: Hi Brandi- If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. Can you elaborate by posting your code? cheers, MAJ ----- Original Message ----- From: "Brandi Cantarel" To: Sent: Wednesday, December 02, 2009 1:36 PM Subject: [Bioperl-l] Parsing Genbank > Hi all, > I am not sure if this is normal, but when I use SEQIO to parse genbank files, > it changes the coordinates of things on the minus strand. > > > For example, I have a sequence that has a CDS on the minus strand at it is > from 911 to 974. The sequence is 974 nt. > > x $cds->start > 1 > x $cds->end > 64 > > How can I get the original coordinates? Is there a command for that or will I > have to just do the math? > > Feature or Bug? > > > ~~~~~~~~~~~~~~~~~~~~ > Brandi Cantarel, PhD > Bioinformatics Analyst > Institute for Genome Sciences > School of Medicine > University of Maryland, Baltimore > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bcantarel at som.umaryland.edu Wed Dec 2 19:29:56 2009 From: bcantarel at som.umaryland.edu (Brandi Cantarel) Date: Wed, 2 Dec 2009 14:29:56 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: References: Message-ID: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> Here is some of my code, the real code actually enters the data into a database. 


$in = Bio::SeqIO->new(-file => $gbkfile,
                      '-format' => 'genbank');

W1:while (my $seq = $in->next_seq()) {
    my @feats = $seq->get_all_SeqFeatures();
    my $j = 0;
    F1:foreach $cds (@feats) {
        next F1 unless ($cds->primary_tag() eq 'CDS');
        #do something with the cds start and cds end
    }
}


LOCUS       subjpool12_contig3        974 bp    DNA     linear   UNK 19-Nov-2009
ACCESSION   subjpool12_contig3
KEYWORDS    .
SOURCE      human metagenome
  ORGANISM  human metagenome
            unclassified sequences; organismal metagenomes,metagenomes.
FEATURES             Location/Qualifiers
     source          1..974
                     /mol_type="genomic DNA"
                     /isolation_source="Homo sapiens"
                     /organism="human metagenome"
                     /collection_date="19-Nov-2009"
     CDS             complement(911..974)
                     /locus_tag="subjpool12_contig3|metagene|gene_2"
                     /translation="IRIMTVELINPYIRHVEHST"
                     /score="2.52804"
                     /product="hypothetical protein"
                     /note="score=2.52804"
                     /note="score=2.52804"
                     /note="frame=1"
ORIGIN
#some sequence?.


From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.

~~~~~~~~~~~~~~~~~~~~
Brandi Cantarel, PhD
Bioinformatics Analyst
Institute for Genome Sciences
School of Medicine
University of Maryland, Baltimore

On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:

> Hi Brandi-
> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal.
> Can you elaborate by posting your code?
> cheers,
> MAJ
> ----- Original Message ----- From: "Brandi Cantarel"
> To:
> Sent: Wednesday, December 02, 2009 1:36 PM
> Subject: [Bioperl-l] Parsing Genbank
>
>
>> Hi all,
>> I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand.
>>
>>
>> For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt.
>>
>> x $cds->start
>> 1
>> x $cds->end
>> 64
>>
>> How can I get the original coordinates? Is there a command for that or will I have to just do the math?
>>
>> Feature or Bug?
>> >> >> ~~~~~~~~~~~~~~~~~~~~ >> Brandi Cantarel, PhD >> Bioinformatics Analyst >> Institute for Genome Sciences >> School of Medicine >> University of Maryland, Baltimore >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From maj at fortinbras.us Wed Dec 2 19:48:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 2 Dec 2009 14:48:44 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> Message-ID: <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> with fake seq data and that header, I don't get a problem: DB<2> x $cds->location 0 Bio::Location::Simple=HASH(0x37b1df4) '_end' => 974 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'subjpool12_contig3' '_start' => 911 '_strand' => '-1' Are you using the latest BioPerl (1.6.1 or the trunk) ? MAJ ----- Original Message ----- From: "Brandi Cantarel" Cc: Sent: Wednesday, December 02, 2009 2:29 PM Subject: Re: [Bioperl-l] Parsing Genbank Here is some of my code, the real code actually enters the data into a database. $in = Bio::SeqIO->new(-file => $gbkfile, '-format' => 'genbank'); W1:while (my $seq = $in->next_seq()) { my @feats = $seq->get_all_SeqFeatures(); my $j = 0; F1:foreach $cds (@feats) { next F1 unless ($cds->primary_tag() eq 'CDS'); ###>> debugger stops here for above output #do something with the cds start and cds end } } LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 ACCESSION subjpool12_contig3 KEYWORDS . SOURCE human metagenome ORGANISM human metagenome unclassified sequences; organismal metagenomes,metagenomes. 
FEATURES Location/Qualifiers source 1..974 /mol_type="genomic DNA" /isolation_source="Homo sapiens" /organism="human metagenome" /collection_date="19-Nov-2009" CDS complement(911..974) /locus_tag="subjpool12_contig3|metagene|gene_2" /translation="IRIMTVELINPYIRHVEHST" /score="2.52804" /product="hypothetical protein" /note="score=2.52804" /note="score=2.52804" /note="frame=1" ORIGIN #some sequence?. >From this example, I would like to get the coordinates 911 and 974, rather than >1 and 64. ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > Hi Brandi- > If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an > ordinary Bio::Seq, that's normal. > Can you elaborate by posting your code? > cheers, > MAJ > ----- Original Message ----- From: "Brandi Cantarel" > > To: > Sent: Wednesday, December 02, 2009 1:36 PM > Subject: [Bioperl-l] Parsing Genbank > > >> Hi all, >> I am not sure if this is normal, but when I use SEQIO to parse genbank files, >> it changes the coordinates of things on the minus strand. >> >> >> For example, I have a sequence that has a CDS on the minus strand at it is >> from 911 to 974. The sequence is 974 nt. >> >> x $cds->start >> 1 >> x $cds->end >> 64 >> >> How can I get the original coordinates? Is there a command for that or will >> I have to just do the math? >> >> Feature or Bug? 
>> >> >> ~~~~~~~~~~~~~~~~~~~~ >> Brandi Cantarel, PhD >> Bioinformatics Analyst >> Institute for Genome Sciences >> School of Medicine >> University of Maryland, Baltimore >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Dec 2 19:39:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 2 Dec 2009 13:39:40 -0600 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> Message-ID: <0E82A338-9D28-4685-A7DA-5019060D96F5@illinois.edu> That one's odd; the coordinates should relate back to the original sequence. Any chance you could pass on the sequence file so we can confirm it? you can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem). chris On Dec 2, 2009, at 1:29 PM, Brandi Cantarel wrote: > Here is some of my code, the real code actually enters the data into a database. > > > $in = Bio::SeqIO->new(-file => $gbkfile, > '-format' => 'genbank'); > > W1:while (my $seq = $in->next_seq()) { > my @feats = $seq->get_all_SeqFeatures(); > my $j = 0; > F1:foreach $cds (@feats) { > next F1 unless ($cds->primary_tag() eq 'CDS'); > #do something with the cds start and cds end > } > } > > > LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 > ACCESSION subjpool12_contig3 > KEYWORDS . > SOURCE human metagenome > ORGANISM human metagenome > unclassified sequences; organismal metagenomes,metagenomes. 
> FEATURES Location/Qualifiers > source 1..974 > /mol_type="genomic DNA" > /isolation_source="Homo sapiens" > /organism="human metagenome" > /collection_date="19-Nov-2009" > CDS complement(911..974) > /locus_tag="subjpool12_contig3|metagene|gene_2" > /translation="IRIMTVELINPYIRHVEHST" > /score="2.52804" > /product="hypothetical protein" > /note="score=2.52804" > /note="score=2.52804" > /note="frame=1" > ORIGIN > #some sequence?. > > > > >> From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. > > > > ~~~~~~~~~~~~~~~~~~~~ > Brandi Cantarel, PhD > Bioinformatics Analyst > Institute for Genome Sciences > School of Medicine > University of Maryland, Baltimore > > On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > >> Hi Brandi- >> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. >> Can you elaborate by posting your code? >> cheers, >> MAJ >> ----- Original Message ----- From: "Brandi Cantarel" >> To: >> Sent: Wednesday, December 02, 2009 1:36 PM >> Subject: [Bioperl-l] Parsing Genbank >> >> >>> Hi all, >>> I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. >>> >>> >>> For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. >>> >>> x $cds->start >>> 1 >>> x $cds->end >>> 64 >>> >>> How can I get the original coordinates? Is there a command for that or will I have to just do the math? >>> >>> Feature or Bug? 
>>> >>> >>> ~~~~~~~~~~~~~~~~~~~~ >>> Brandi Cantarel, PhD >>> Bioinformatics Analyst >>> Institute for Genome Sciences >>> School of Medicine >>> University of Maryland, Baltimore >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Dec 2 20:52:28 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 2 Dec 2009 15:52:28 -0500 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> Message-ID: <07332179362A4D53ACAA9A72AD208049@NewLife> Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds as if there is a bug. If you can provide data that can reproduce it, as Chris suggests, we can get onto it. thanks MAJ ----- Original Message ----- From: Brandi Cantarel To: Mark A. Jensen Sent: Wednesday, December 02, 2009 3:38 PM Subject: Re: [Bioperl-l] Parsing Genbank How can I tell what version I am using?When I use the command from the website: perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' I get 1.006, but the bioperl lib was updated in July, so probably 1.6.0 version since that was the last stable release?. Brandi On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote: with fake seq data and that header, I don't get a problem: DB<2> x $cds->location 0 Bio::Location::Simple=HASH(0x37b1df4) '_end' => 974 '_location_type' => 'EXACT' '_root_verbose' => 0 '_seqid' => 'subjpool12_contig3' '_start' => 911 '_strand' => '-1' Are you using the latest BioPerl (1.6.1 or the trunk) ? 
MAJ ----- Original Message ----- From: "Brandi Cantarel" Cc: Sent: Wednesday, December 02, 2009 2:29 PM Subject: Re: [Bioperl-l] Parsing Genbank Here is some of my code, the real code actually enters the data into a database. $in = Bio::SeqIO->new(-file => $gbkfile, '-format' => 'genbank'); W1:while (my $seq = $in->next_seq()) { my @feats = $seq->get_all_SeqFeatures(); my $j = 0; F1:foreach $cds (@feats) { next F1 unless ($cds->primary_tag() eq 'CDS'); ###>> debugger stops here for above output #do something with the cds start and cds end } } LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 ACCESSION subjpool12_contig3 KEYWORDS . SOURCE human metagenome ORGANISM human metagenome unclassified sequences; organismal metagenomes,metagenomes. FEATURES Location/Qualifiers source 1..974 /mol_type="genomic DNA" /isolation_source="Homo sapiens" /organism="human metagenome" /collection_date="19-Nov-2009" CDS complement(911..974) /locus_tag="subjpool12_contig3|metagene|gene_2" /translation="IRIMTVELINPYIRHVEHST" /score="2.52804" /product="hypothetical protein" /note="score=2.52804" /note="score=2.52804" /note="frame=1" ORIGIN #some sequence?. From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: Hi Brandi- If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. Can you elaborate by posting your code? cheers, MAJ ----- Original Message ----- From: "Brandi Cantarel" To: Sent: Wednesday, December 02, 2009 1:36 PM Subject: [Bioperl-l] Parsing Genbank Hi all, I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. 
The sequence is 974 nt. x $cds->start 1 x $cds->end 64 How can I get the original coordinates? Is there a command for that or will I have to just do the math? Feature or Bug? ~~~~~~~~~~~~~~~~~~~~ Brandi Cantarel, PhD Bioinformatics Analyst Institute for Genome Sciences School of Medicine University of Maryland, Baltimore _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Dec 2 21:07:58 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 2 Dec 2009 15:07:58 -0600 Subject: [Bioperl-l] Parsing Genbank In-Reply-To: <07332179362A4D53ACAA9A72AD208049@NewLife> References: <854933BD-4EF5-4A74-8CE2-62E11AD64B6A@som.umaryland.edu> <24B3D1A1667D44338CDE5A4FFE425C56@NewLife> <001B6793-D1C3-46EF-AA96-CCA1B684AD8E@som.umaryland.edu> <07332179362A4D53ACAA9A72AD208049@NewLife> Message-ID: <23AE9399-B370-4DB3-94AA-AC8021AF321E@illinois.edu> One never knows, but I would be very surprised if this somehow snuck by the test suite we have, particularly since Gbrowse extensively uses SeqFeatures (any changes should have popped out along the way). Not much we can do unless we have something to help confirm the problem. Also might help to know the source of the genbank file itself. chris On Dec 2, 2009, at 2:52 PM, Mark A. Jensen wrote: > Yes, 1.006 is 1.6. There is a later update 1.6.1, but it sounds > as if there is a bug. If you can provide data that can reproduce > it, as Chris suggests, we can get onto it. > thanks MAJ > ----- Original Message ----- > From: Brandi Cantarel > To: Mark A. 
Jensen > Sent: Wednesday, December 02, 2009 3:38 PM > Subject: Re: [Bioperl-l] Parsing Genbank > > > How can I tell what version I am using?When I use the command from the website: > > > perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' > > > I get 1.006, but the bioperl lib was updated in July, so probably 1.6.0 version since that was the last stable release?. > > > Brandi > > > > > On Dec 2, 2009, at 2:48 PM, Mark A. Jensen wrote: > > > with fake seq data and that header, I don't get a problem: > > DB<2> x $cds->location > 0 Bio::Location::Simple=HASH(0x37b1df4) > '_end' => 974 > '_location_type' => 'EXACT' > '_root_verbose' => 0 > '_seqid' => 'subjpool12_contig3' > '_start' => 911 > '_strand' => '-1' > > Are you using the latest BioPerl (1.6.1 or the trunk) ? > MAJ > ----- Original Message ----- From: "Brandi Cantarel" > Cc: > Sent: Wednesday, December 02, 2009 2:29 PM > Subject: Re: [Bioperl-l] Parsing Genbank > > > Here is some of my code, the real code actually enters the data into a database. > > > $in = Bio::SeqIO->new(-file => $gbkfile, > '-format' => 'genbank'); > > W1:while (my $seq = $in->next_seq()) { > my @feats = $seq->get_all_SeqFeatures(); > my $j = 0; > F1:foreach $cds (@feats) { > next F1 unless ($cds->primary_tag() eq 'CDS'); > ###>> debugger stops here for above output > > #do something with the cds start and cds end > } > } > > > LOCUS subjpool12_contig3 974 bp DNA linear UNK 19-Nov-2009 > ACCESSION subjpool12_contig3 > KEYWORDS . > SOURCE human metagenome > ORGANISM human metagenome > unclassified sequences; organismal metagenomes,metagenomes. 
> FEATURES Location/Qualifiers > source 1..974 > /mol_type="genomic DNA" > /isolation_source="Homo sapiens" > /organism="human metagenome" > /collection_date="19-Nov-2009" > CDS complement(911..974) > /locus_tag="subjpool12_contig3|metagene|gene_2" > /translation="IRIMTVELINPYIRHVEHST" > /score="2.52804" > /product="hypothetical protein" > /note="score=2.52804" > /note="score=2.52804" > /note="frame=1" > ORIGIN > #some sequence?. > > > > > > From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64. > > > > > ~~~~~~~~~~~~~~~~~~~~ > Brandi Cantarel, PhD > Bioinformatics Analyst > Institute for Genome Sciences > School of Medicine > University of Maryland, Baltimore > > On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote: > > > Hi Brandi- > > If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if its an ordinary Bio::Seq, that's normal. > > Can you elaborate by posting your code? > > cheers, > > MAJ > > ----- Original Message ----- From: "Brandi Cantarel" > > To: > > Sent: Wednesday, December 02, 2009 1:36 PM > > Subject: [Bioperl-l] Parsing Genbank > > > > > > Hi all, > > I am not sure if this is normal, but when I use SEQIO to parse genbank files, it changes the coordinates of things on the minus strand. > > > > > > For example, I have a sequence that has a CDS on the minus strand at it is from 911 to 974. The sequence is 974 nt. > > > > x $cds->start > > 1 > > x $cds->end > > 64 > > > > How can I get the original coordinates? Is there a command for that or will I have to just do the math? > > > > Feature or Bug? 
From lstein at cshl.edu Thu Dec 3 10:31:31 2009 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 3 Dec 2009 05:31:31 -0500 Subject: [Bioperl-l] modENCODE seeking data managers Message-ID: <6dce9a0b0912030231p740d0ecbj4a7e79a6ab71801d@mail.gmail.com> Hi All, My apologies for spamming the list, but this announcement may be of interest: The modENCODE Data Coordinating Center (Model Organism Encylopedia of DNA Elements; www.modencode.org) is seeking data managers to gather and curate large scale functional genomics data sets in fly and worm. For details, see http://blog.modencode.org/?p=350. Lincoln -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From dan.bolser at gmail.com Thu Dec 3 11:44:40 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 11:44:40 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? Message-ID: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> Hi, can someone test the script here on zero length fasta / qual files?
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ It seems the output has an extra newline in the sequence part of the output (which throws off scripts that rely on the 'four lines per record' structure of the fastq (although I'm not sure if it's illegal fastq). Here is what I see BEGIN $ head one.fna >FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2 $ head one.qual >FVF7ZWH02PFOVG length=0 xy=2116_2074 region=2 $ createFastq.plx one.fna one.qual @FVF7ZWH02PFOVG +FVF7ZWH02PFOVG END Currently I just put in a clause in the script to skip any zero length sequences, but I think the Qual shouldn't output an extra newline like this. Cheers, Dan. -- JHB: Bioinformatics is Biology and Biology is Bioinformatics. From biopython at maubp.freeserve.co.uk Thu Dec 3 12:12:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 12:12:15 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> Message-ID: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: > Hi, can someone test the script here on zero length fasta / qual files? > > http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ > > It seems the output has an extra newline in the sequence part of the > output (which throws off scripts that rely on the 'four lines per > record' structure of the fastq (although I'm not sure if it's illegal > fastq). Hi Dan, The OBF consensus was FASTQ records with a zero length sequence might be useful, and should be output as exactly four lines (one blank sequence line, one blank quality line). However for parsing, any number of blank lines should be OK. 
http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html I can confirm the perl script currently outputs a FASTQ file with TWO blank lines for the sequence, giving five lines in total for the zero length record. That does suggest a bug. What version of BioPerl are you running? Peter P.S. The script is throwing away any description after the identifier. From dan.bolser at gmail.com Thu Dec 3 13:07:27 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 3 Dec 2009 13:07:27 +0000 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> Message-ID: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> 2009/12/3 Peter : > On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: >> Hi, can someone test the script here on zero length fasta / qual files? >> >> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ >> >> It seems the output has an extra newline in the sequence part of the >> output (which throws off scripts that rely on the 'four lines per >> record' structure of the fastq (although I'm not sure if it's illegal >> fastq). > > Hi Dan, > > The OBF consensus was FASTQ records with a zero length > sequence might be useful, and should be output as exactly > four lines (one blank sequence line, one blank quality line). > However for parsing, any number of blank lines should be OK. > http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html > > I can confirm the perl script currently outputs a FASTQ file > with TWO blank lines for the sequence, giving five lines in > total for the zero length record. That does suggest a bug. > What version of BioPerl are you running? 
Hi Peter, Basically, I'm not running the 'latest' version of BP, which is why I asked this question of the list rather than filing a bug report. What version are you running? ;-) Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks for the info). > Peter > > P.S. The script is throwing away any description after the > identifier. That's probably bad. Feel free to edit the script on the wiki. Sadly, MediaWiki's diff features are less than optimal, so developing scripts on the wiki isn't ideal. Anyone know how to plug git-hub into a script apparently hosted on a wiki? Or is git-hub basically designed to be 'wiki for code'? I'm wondering, because with the FlaggedRevs extension you could basically build a whole release in the wiki. Which would be fun if nothing else! -- JHP: Biology is bioinformatics and bioinformatics is biology. From heyne at informatik.uni-freiburg.de Thu Dec 3 13:19:51 2009 From: heyne at informatik.uni-freiburg.de (Steffen Heyne) Date: Thu, 03 Dec 2009 14:19:51 +0100 Subject: [Bioperl-l] problem with alignments and sequence locations In-Reply-To: References: <4AF962AA.7060908@informatik.uni-freiburg.de> Message-ID: <4B17BAF7.2050604@informatik.uni-freiburg.de> Hello, so I tried to fix the problem with the location. Currently it works for me with the following changes: LocatableSeq.pm sub get_nse{ ... my $ret; if ($self->strand() >= 0) { $ret = $id . $v. $char1 . $st . $char2 . $end ; } else { $ret = $id . $v. $char1 . $end . $char2 . $st ; } return $ret; } Then I recognized during the usage of $aln->remove_seq() that it cannot remove a seq as it uses a wrong NSE to lookup sequences. I changed the following: SimpleAlign.pm sub remove_seq { ... $id = $seq->id(); $start = $seq->start(); $end = $seq->end(); ## changed code: my $v = $seq->version ? '.'.$seq->version : ''; if ($seq->strand >=0){ $name = sprintf("%s%s/%d-%d",$id,$v,$start,$end); } elsif ($seq->strand == -1){ $name = sprintf("%s%s/%d-%d",$id,$v,$end,$start); } ... 
} The above code in LocatableSeq.pm worked in the case if I read an alignment in stockholm format and write it out in clustalw format. But if I read an alignment in clustalw and write it out as stockholm (or something else) it didn't worked, as the strand is not correctly set in ClustalW::next_aln. It works with the following changes: ClustalW.pm sub next_aln{ ... my ( $sname, $start, $end, $strand ); ## strand added $strand = 0; ## new, standard = 0??? foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) { if ( $name =~ /(\S+):(\d+)-(\d+)/ ) { ( $sname, $start, $end ) = ( $1, $2, $3 ); $strand = 1; ## new if ($start > $end) { ## new ($start, $end, $strand) = ($end, $start, -1); ##new } ## new } else { ( $sname, $start ) = ( $name, 1 ); my $str = $alignments{$name}; $str =~ s/[^A-Za-z]//g; $end = length($str); } my $seq = Bio::LocatableSeq->new( -seq => $alignments{$name}, -id => $sname, -start => $start, -end => $end, -strand=> $strand ## new ); ... } So I don't know if I changed things at their correct position. And I found them only because I used certain functions. I dont know how broad the effect of a changed NSE in LocatableSeq.pm is to other Modules and functions. But I'm happy with my changes (so far :-)...). Do you will change this to your proposed way in bioperl trunk? Thanks! steffen Chris Fields schrieb: > On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: > >> Hi, >> >> I'm using Bioperl for my research and it is very useful! Thank you! >> >> Currently I have a problem with locations tags of sequences. I read in >> seed alignments of Rfam (in stockholm format, but I think it is >> similar to other formats). 
>>
>> If the location is like:
>>
>> AB194432.1/908-846
>>
>> the start/end values are changed to
>>
>> $seq->start = 846
>> $seq->end = 908
>>
>> and therefore the new location (e.g.$seq->get_nse) is:
>>
>> AB194432.1/846-908
>>
>> The $seq->strand tag is correctly set to -1 in this case, but if the
>> alignment is written out again (clustal, stockholm,...) this strand
>> info is lost and the sequences have this "wrong" location. But this
>> information is important in respect to the sequence accession number.
>>
>> Is there a way to set the location back to the original one or is this
>> behavior desired? Any manually setting with $seq->start($val) failed
>> due to automatic checking.
>>
>> I'm using bioperl 1.6.1
>>
>> Thanks!
>>
>> steffen
>
> This is a definite bug. We recently discussed amending the NSE format
> due to this (the subject came up over the last few months or so); it's
> fallen through the cracks. Fortunaely it is very easy to fix (the
> relevant method is in LocatableSeq).
>
> Does anyone have a problem with me adding this in? It will change
> output for only those instances where the strand is -1, so
>
> AB194432.1/908-846
>
> would be start = 846, end = 908, strand = -1
>
> AB194432.1/846-908
>
> would be start = 846, end = 908, strand = 1
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
---
Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany
Tel: (+49) 761 203 7465
Fax: (+49) 761 203 7462
Mail: heyne at informatik.uni-freiburg.de

From cjfields at illinois.edu Thu Dec 3 13:47:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 3 Dec 2009 07:47:32 -0600
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> Message-ID: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> Dan, On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: > 2009/12/3 Peter : >> On Thu, Dec 3, 2009 at 11:44 AM, Dan Bolser wrote: >>> Hi, can someone test the script here on zero length fasta / qual files? >>> >>> http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ >>> >>> It seems the output has an extra newline in the sequence part of the >>> output (which throws off scripts that rely on the 'four lines per >>> record' structure of the fastq (although I'm not sure if it's illegal >>> fastq). >> >> Hi Dan, >> >> The OBF consensus was FASTQ records with a zero length >> sequence might be useful, and should be output as exactly >> four lines (one blank sequence line, one blank quality line). >> However for parsing, any number of blank lines should be OK. >> http://lists.open-bio.org/pipermail/open-bio-l/2009-July/000522.html >> >> I can confirm the perl script currently outputs a FASTQ file >> with TWO blank lines for the sequence, giving five lines in >> total for the zero length record. That does suggest a bug. >> What version of BioPerl are you running? > > Hi Peter, > > Basically, I'm not running the 'latest' version of BP, which is why I > asked this question of the list rather than filing a bug report. What > version are you running? ;-) > > Sounds like 5 lines instead of the expected 4 is a minor bug. (Thanks > for the info). FASTQ parsing had undergone a major revision prior to 1.6.1 (the latest release in CPAN). Basically, it now parses all three FASTQ variants. However, Peter indicates there may still be a problem, and it's likely he's running 1.6.1. Peter can you confirm that? >> Peter >> >> P.S. 
The script is throwing away any description after the
>> identifier.
>
> That's probably bad. Feel free to edit the script on the wiki. Sadly,
> MediaWiki's diff features are less than optimal, so developing scripts
> on the wiki isn't ideal. Anyone know how to plug git-hub into a script
> apparently hosted on a wiki?
>
> Or is git-hub basically designed to be 'wiki for code'?

It's more an integrated solution for hosting code via git, with a wiki, bug queue, etc. Think SourceForge, but a lot nicer and with no ads ;> BitBucket/Hg is another (very nice) solution along the same lines, developed in Python (GitHub is Ruby-centric).

> I'm wondering, because with the FlaggedRevs extension you could
> basically build a whole release in the wiki. Which would be fun if
> nothing else!

I'm not following you there. Could you elaborate on why that would be beneficial? I could see (

chris

From biopython at maubp.freeserve.co.uk Thu Dec 3 14:20:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:20:32 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>

On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields wrote:
>
> FASTQ parsing had undergone a major revision prior to
> 1.6.1 (the latest release in CPAN). Basically, it now parses
> all three FASTQ variants. However, Peter indicates there
> may still be a problem, and it's likely he's running 1.6.1.
> Peter can you confirm that?
I had BioPerl from SVN circa 1.6.1 (not sure if this was before or after the release of 1.6.1 now): $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' 1.0069 $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"' 1.0069 If the tuples mean anything to you: $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION' 49.46.48.48.54.57 $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION' 49.46.48.48.54.57 I just updated to revision 16435, and retested. I get the same BioPerl version numbers, and the same extra blank line in the sequence FASTQ output as Dan reported. Peter From cjfields at illinois.edu Thu Dec 3 14:39:35 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 08:39:35 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> Message-ID: On Dec 3, 2009, at 8:20 AM, Peter wrote: > On Thu, Dec 3, 2009 at 1:47 PM, Chris Fields wrote: >> >> FASTQ parsing had undergone a major revision prior to >> 1.6.1 (the latest release in CPAN). Basically, it now parses >> all three FASTQ variants. However, Peter indicates there >> may still be a problem, and it's likely he's running 1.6.1. >> Peter can you confirm that? 
>
> I had BioPerl from SVN circa 1.6.1 (not sure if this was before
> or after the release of 1.6.1 now):
>
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> $ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION,"\n"'
> 1.0069
>
> If the tuples mean anything to you:
>
> $ perl -MBio::Root::Version -e 'printf "%vd\n", $Bio::Root::Version::VERSION'
> 49.46.48.48.54.57
> $ perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
> 49.46.48.48.54.57
>
> I just updated to revision 16435, and retested. I get the same
> BioPerl version numbers, and the same extra blank line in the
> sequence FASTQ output as Dan reported.
>
> Peter

Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?

1) extra blank line.
2) missing description

Dan, could you go ahead and submit this as a bug, just in case (so we don't lose track)? Otherwise it might get lost on the mail list or wiki.

chris

From biopython at maubp.freeserve.co.uk Thu Dec 3 14:56:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:56:39 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To:
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com>
Message-ID: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields wrote:
> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?
>
> 1) extra blank line.

Which seems to be a bug in BioPerl SeqIO itself.
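
To make the four-line expectation concrete, here is a minimal sketch in plain Perl (no BioPerl calls) of how a zero-length record could be written under the OBF consensus discussed earlier in this thread. `format_fastq_record` is a hypothetical helper made up for illustration; it is not the code in Bio::SeqIO's FASTQ writer, and leaving the `+` line bare is an assumption (the output Dan posted repeats the identifier there).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: format one FASTQ record.
# It always emits exactly four lines, even when the sequence and
# quality strings are empty.
sub format_fastq_record {
    my ($id, $seq, $qual) = @_;
    $seq  = '' unless defined $seq;
    $qual = '' unless defined $qual;
    die "sequence and quality lengths differ for $id\n"
        unless length($seq) == length($qual);
    # Header, sequence, bare '+' separator, quality: one line each.
    return join("\n", '@' . $id, $seq, '+', $qual) . "\n";
}

# Zero-length read, using the identifier from Dan's example.
my $record = format_fastq_record('FVF7ZWH02PFOVG', '', '');

# Count the newlines to confirm the record spans four lines.
my $newline_count = () = $record =~ /\n/g;
print $record;
print STDERR "newlines in record: $newline_count\n";
```

A writer built this way keeps the 'four lines per record' structure intact for downstream scripts, whatever the read length.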
> 2) missing description

This is just a trivial bug/omission in the wiki example,
http://www.bioperl.org/wiki/Merging_separate_sequence_and_quality_files_to_FASTQ

You just need to replace this:

my $bsq_obj = Bio::Seq::Quality->new(
    -id   => $seq_obj->id,
    -seq  => $seq_obj->seq,
    -qual => $qual_obj->qual,
);

With:

my $bsq_obj = Bio::Seq::Quality->new(
    -id          => $seq_obj->id,
    -description => $seq_obj->description,
    -seq         => $seq_obj->seq,
    -qual        => $qual_obj->qual,
);

Look - I seem to be learning Perl by osmosis ;)

Peter

From dan.bolser at gmail.com Thu Dec 3 16:29:11 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:29:11 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <320fb6e00912030620m6ce87fc6t310750969e320be7@mail.gmail.com> <320fb6e00912030656p5b75a566t22e1d2037d945338@mail.gmail.com>
Message-ID: <2c8757af0912030829t54e87a4bmf166370ca10e966a@mail.gmail.com>

2009/12/3 Peter :
> On Thu, Dec 3, 2009 at 2:39 PM, Chris Fields wrote:
>> Okay I will try to look into it today (it should be an easy fix). There are two issues, correct?

...

>> 2) missing description
>
> This is just a trivial bug/omission in the wiki example,

...

> Look - I seem to be learning Perl by osmosis ;)

Yay!

From dan.bolser at gmail.com Thu Dec 3 16:30:44 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Thu, 3 Dec 2009 16:30:44 +0000
Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ?
In-Reply-To: <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu>
Message-ID: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com>

2009/12/3 Chris Fields :
> Dan,
>
> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote:

...

>> I'm wondering, because with the FlaggedRevs extension you could
>> basically build a whole release in the wiki. Which would be fun if
>> nothing else!
>
> I'm not following you there. Could you elaborate on why that would be beneficial? I could see (

I never said it would be beneficial, only that it would be fun.

http://www.mediawiki.org/wiki/Flaggedrevs

From florent.angly at gmail.com Thu Dec 3 18:26:57 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 03 Dec 2009 10:26:57 -0800
Subject: [Bioperl-l] problem with alignments and sequence locations
In-Reply-To: <4B17BAF7.2050604@informatik.uni-freiburg.de>
References: <4AF962AA.7060908@informatik.uni-freiburg.de> <4B17BAF7.2050604@informatik.uni-freiburg.de>
Message-ID: <4B1802F1.1040304@gmail.com>

Hi all,

Like Steffen, I've had a few burning questions too regarding LocatableSeq lately.

I've had an occasional issue with LocatableSeq. Most assembly-related modules use LocatableSeq objects. They specify the sequence start but not the sequence end. This works in most cases, but I've recently encountered very occasional error messages related to not having explicitly set the end of the sequence. I've been unable to put together a small test case to reproduce the bug easily.

My question is: if the start of the sequence is set, is it mandatory to set the end of the sequence? If so, then maybe the documentation needs to be explicit about it and maybe there needs to be a check that enforces that the end is set.
In fact, it seems like if I provide a sequence and its start position, the LocatableSeq code should be able to automatically calculate its end, no? Florent Steffen Heyne wrote: > Hello, > > so I tried to fix the problem with the location. Currently it works for > me with the following changes: > > LocatableSeq.pm > > sub get_nse{ > > ... > > my $ret; > if ($self->strand() >= 0) { > $ret = $id . $v. $char1 . $st . $char2 . $end ; > } else { > $ret = $id . $v. $char1 . $end . $char2 . $st ; > } > return $ret; > } > > Then I recognized during the usage of $aln->remove_seq() that it cannot > remove a seq as it uses a wrong NSE to lookup sequences. I changed the > following: > > SimpleAlign.pm > > sub remove_seq { > > ... > $id = $seq->id(); > $start = $seq->start(); > $end = $seq->end(); > > ## changed code: > > my $v = $seq->version ? '.'.$seq->version : ''; > if ($seq->strand >=0){ > $name = sprintf("%s%s/%d-%d",$id,$v,$start,$end); > } elsif ($seq->strand == -1){ > $name = sprintf("%s%s/%d-%d",$id,$v,$end,$start); > } > ... > > } > > The above code in LocatableSeq.pm worked in the case if I read an > alignment in stockholm format and write it out in clustalw format. But > if I read an alignment in clustalw and write it out as stockholm (or > something else) it didn't worked, as the strand is not correctly set in > ClustalW::next_aln. It works with the following changes: > > ClustalW.pm > > sub next_aln{ > > ... > > my ( $sname, $start, $end, $strand ); ## strand added > $strand = 0; ## new, standard = 0??? 
> foreach my $name ( sort { $order{$a} <=> $order{$b} } keys > %alignments ) { > if ( $name =~ /(\S+):(\d+)-(\d+)/ ) { > ( $sname, $start, $end ) = ( $1, $2, $3 ); > $strand = 1; ## new > if ($start > $end) { ## new > ($start, $end, $strand) = ($end, $start, -1); ##new > } ## new > > } > else { > ( $sname, $start ) = ( $name, 1 ); > my $str = $alignments{$name}; > $str =~ s/[^A-Za-z]//g; > $end = length($str); > } > > my $seq = Bio::LocatableSeq->new( > -seq => $alignments{$name}, > -id => $sname, > -start => $start, > -end => $end, > -strand=> $strand ## new > ); > > ... > > } > > So I don't know if I changed things at their correct position. And I > found them only because I used certain functions. I dont know how broad > the effect of a changed NSE in LocatableSeq.pm is to other Modules and > functions. But I'm happy with my changes (so far :-)...). > > Do you will change this to your proposed way in bioperl trunk? > > Thanks! > > steffen > > > Chris Fields schrieb: > >> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: >> >> >>> Hi, >>> >>> I'm using Bioperl for my research and it is very useful! Thank you! >>> >>> Currently I have a problem with locations tags of sequences. I read in >>> seed alignments of Rfam (in stockholm format, but I think it is >>> similar to other formats). >>> >>> If the location is like: >>> >>> AB194432.1/908-846 >>> >>> the start/end values are changed to >>> >>> $seq->start = 846 >>> $seq->end = 908 >>> >>> and therefore the new location (e.g.$seq->get_nse) is: >>> >>> AB194432.1/846-908 >>> >>> The $seq->strand tag is correctly set to -1 in this case, but if the >>> alignment is written out again (clustal, stockholm,...) this strand >>> info is lost and the sequences have this "wrong" location. But this >>> information is important in respect to the sequence accession number. >>> >>> Is there a way to set the location back to the original one or is this >>> behavior desired? 
Any manually setting with $seq->start($val) failed >>> due to automatic checking. >>> >>> I'm using bioperl 1.6.1 >>> >>> Thanks! >>> >>> steffen >>> >> This is a definite bug. We recently discussed amending the NSE format >> due to this (the subject came up over the last few months or so); it's >> fallen through the cracks. Fortunaely it is very easy to fix (the >> relevant method is in LocatableSeq). >> >> Does anyone have a problem with me adding this in? It will change >> output for only those instances where the strand is -1, so >> >> AB194432.1/908-846 >> >> would be start = 846, end = 908, strand = -1 >> >> AB194432.1/846-908 >> >> would be start = 846, end = 908, strand = 1 >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > From cjfields at illinois.edu Fri Dec 4 04:16:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 3 Dec 2009 22:16:48 -0600 Subject: [Bioperl-l] Merging separate sequence and quality files to FASTQ ? In-Reply-To: <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> References: <2c8757af0912030344o5b940c47p8350084e9152facc@mail.gmail.com> <320fb6e00912030412s73eba41aw9950da8b4529cc0c@mail.gmail.com> <2c8757af0912030507h1b3d12b8p60c8b405792d6579@mail.gmail.com> <747E7628-2A72-4064-A7B9-65A904ACDFE7@illinois.edu> <2c8757af0912030830n718f8cc7hc9e501919435e4a8@mail.gmail.com> Message-ID: <37058F8C-419E-4E88-AC4F-543FF9B563E1@illinois.edu> On Dec 3, 2009, at 10:30 AM, Dan Bolser wrote: > 2009/12/3 Chris Fields : >> Dan, >> >> On Dec 3, 2009, at 7:07 AM, Dan Bolser wrote: > > ... > >>> I'm wondering, because with the FlaggedRevs extension you could >>> basically build a whole release in the wiki. Which would be fun if >>> nothing else! >> >> I'm not following you there. Could you elaborate on why that would be beneficial? 
I could see ( > > I never said it would be beneficial, only that it would be fun. > > http://www.mediawiki.org/wiki/Flaggedrevs Ah, okay, that makes some sense. Just to stay on subject, I committed a fix (r16439) to bioperl-live that addresses the additional newline issue. chris From rtbio.2009 at gmail.com Fri Dec 4 13:57:21 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 4 Dec 2009 14:57:21 +0100 Subject: [Bioperl-l] Regarding Organism based search in Remote blast Message-ID: Hello all, I am working on remote BLAST. I am trying to pass two parameters into the remote BLAST code: 1. The input sequence to be sent to BLAST 2. The organism to search (e.g. Trypanosoma brucei) When I take the organism parameter as input from the user through a web page, remote BLAST returns no results; it reports that no alignments were found. But when I hard-code the organism in the code, it gives me results (3 hits). I do not understand this problem. Could anybody please help me with this?
My code is sub blastcode { $input1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $input1; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$debugfile); print OUTFILE @params; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , '-Organism' => $organism ); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. my $r = $factory->submit_blast($input); # my $r = $factory->submit_blast('amino.fa'); print STDERR "waiting...." if($v>0); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE $result->next_hit(); # close(BLASTDEBUGFILE); my $filename = $serverpath."/blastdata_".time().$result->query_name()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); while ( my $hit = $result->next_hit ) { next unless ( $v > 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string push(@seqs,$dna); } } } } } #open(OUTFILE,'>',$debugfile); #print OUTFILE $seqs[0]; #close(OUTFILE); return(@seqs); } Regards, Roopa. From cjfields at illinois.edu Fri Dec 4 14:59:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 4 Dec 2009 08:59:17 -0600 Subject: [Bioperl-l] Regarding Organism based search in Remote blast In-Reply-To: References: Message-ID: <77EDAB6B-68B5-460C-AD9F-EB45B9C3AFF7@illinois.edu> Roopa, At one point a couple of parameters differed between NCBI's web interface and our RemoteBlast-based BLAST interface to URLAPI (this should be indicated in your BLAST reports). See here: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/14155 Also, are the returned hits specific for the genome? 
You should double-check; in some cases you have to set both HEADER and RETRIEVALHEADER to get the expected results (not sure why): http://article.gmane.org/gmane.comp.lang.perl.bio.general/18737/match=remoteblast+ncbi chris On Dec 4, 2009, at 7:57 AM, Roopa Raghuveer wrote: > Hello all, > > I am working on Remote blast.Here,I am trying to get 2 parameters into the > remote blast code.They are > > 1.The input sequence that has to be sent to blast > > 2.Organism (The organism which has to be searched for ex:-Trypanasoma brucei > etc.,) > > When I tried to take the organism parameter as an input from the > user,through a web page,the Remote blast was not giving any results i.e., it > says that there are no alignments found. > > But,when I hard coded the organism in the code,it gives me the results i.e., > 3hits. > > I could not understand this problem.Could any body please help me in this > regard? > > My code is > > sub blastcode > { > > $input1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $input1; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$debugfile); > print OUTFILE @params; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$organism[ORGN]'; > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new(-file => $nuc , '-format' => 'fasta' , > '-Organism' => $organism ); > > while (my $input = $str->next_seq()) > > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence
one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > > my $r = $factory->submit_blast($input); > > # my $r = $factory->submit_blast('amino.fa'); > > print STDERR "waiting...." if($v>0); > > while ( my @rids = $factory->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE $result->next_hit(); > # close(BLASTDEBUGFILE); > > my $filename = > $serverpath."/blastdata_".time().$result->query_name()."\.out"; > > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a string > push(@seqs,$dna); > } > } > } > } > } > > #open(OUTFILE,'>',$debugfile); > #print OUTFILE $seqs[0]; > #close(OUTFILE); > > return(@seqs); > } > > Regards, > Roopa. 
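
[Editorial note] Separately from the parameter differences mentioned in the reply, one likely contributor to the symptom described (a hard-coded organism works, a variable does not) is Perl quoting: the posted code assigns '$organism[ORGN]' in single quotes, which never interpolate, so the literal text $organism[ORGN] would be sent as the Entrez query. The sketch below is plain Perl with no BioPerl calls; entrez_organism_query is a hypothetical helper name introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build an Entrez organism filter.
sub entrez_organism_query {
    my ($organism) = @_;
    # Concatenation sidesteps two pitfalls: single quotes never interpolate,
    # and "$organism[ORGN]" in double quotes is parsed as an array element.
    return $organism . '[ORGN]';
}

my $organism = 'Trypanosoma brucei';
my $literal  = '$organism[ORGN]';    # as in the posted code: kept verbatim
my $query    = entrez_organism_query($organism);

print "single-quoted: $literal\n";
print "concatenated:  $query\n";
```

In the posted script, the concatenated string is what would be assigned to $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}; whether RETRIEVALHEADER also needs setting is a separate question that this sketch does not address.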
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robert.bradbury at gmail.com Fri Dec 4 18:27:38 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 4 Dec 2009 13:27:38 -0500 Subject: [Bioperl-l] Gene critical region analysis -- visual display Message-ID: Background: I have been involved in aging research off and on for ~16 years. My initial focus was on the eventual decline of the "program" (because DNA has no ECC and only limited redundancy); therefore my initial work (in the early 1990s) was focused on DNA repair genes (of which there are about 150 in the human genome) [1,2]. Most recently I have focused in on the DNA double-strand break repair processes (NHEJ) as a fundamental cause of aging because it may fundamentally corrupt the genomes of individual cells. (And as most programmers would agree -- break the code and you break the program). Michael Lieber at UCLA has estimated that by the time a human is ~70 on the order of several hundred genes in one's cells have been corrupted (which may have an indeterminate effect on the cells' functioning). Problem: Just looking at the GenBank output for the human Artemis (DCLRE1C) gene there are on the order of 18 SNPs and 8 possible phosphorylation sites (not to mention other potential modification sites) -- this combined with the fact that Methionine and Tryptophan and to a lesser extent Cysteine are more susceptible to single base mutations (due to the alteration of the codon->amino acid coding even involving single base mutations/repairs). There are various programs to analyze such proteins for the critical sites -- SIFT and the various programs pointed to by their sites. Now it seems to me that one could attack this problem by integrating SNPs, mutations, etc.
at the critical sites (where "critical" may or may not be at normal SNPs -- which presumably are primarily at non-critical sites) -- and at those positions where, if you change the coding sequence to non-synonymous amino acids, you potentially break the protein (the real interpretation of which will not be understood until population studies are done). So, in the process of looking at the DCLRE1C protein I asked myself, "Why is there not a BioPerl function which simply enables a visual interpretation of the critical sites of the protein?" I.e. some color-coded representation of the protein (which presumably has some augmented functionality to determine things like probability or statistical information). I.e. hand the function a .fasta file and it will give you a visual (colored) analysis of the critical nature of specific a.a. -- i.e. something which could be used by genomic or SNP analysis (such as, I presume, that being done by 23andme -- as well as other organizations) to begin to separate out the variations in the human genome (e.g. SNPs) from the mutations which may affect individuals. I have the C programming and to a lesser extent Perl experience to contribute to this -- I lack the BioPerl wisdom to make it generally available. If anyone has some suggestions as to what functions/modules might be of use (in providing a "single-look" view of gene a.a. whose mutations may be more or less detrimental) I would appreciate hearing from them. Robert Bradbury 1. "DNA Repair and Mutagenesis", E.C. Friedberg et al, 2nd Ed., ASM Press (2006) 2. "Aging of the Genome", J. Vijg, Oxford University Press (2007) From maj at fortinbras.us Sun Dec 6 22:54:00 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Sun, 6 Dec 2009 17:54:00 -0500 Subject: [Bioperl-l] bioperl-mode new feature: base class browsing Message-ID: <59494F4102D84535B3A5D05B595ACBF7@NewLife> Hi All, You can now browse pod of the base/parent classes of bioperl modules with one keystroke using the latest update of bioperl-mode. See http://bioperl.org/wiki/Emacs_bioperl-mode Press "B" or "P" while in pod view to get a completion list of the parent classes for the module whose pod you're viewing. cheers, MAJ From mmokrejs at ribosome.natur.cuni.cz Mon Dec 7 20:33:48 2009 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Mon, 07 Dec 2009 21:33:48 +0100 Subject: [Bioperl-l] Generalized reciprocal blast In-Reply-To: References: Message-ID: <4B1D66AC.4080804@ribosome.natur.cuni.cz> Hi, I just stumbled across this older posting ... maybe you want to exploit SIMAP (http://webclu.bio.wzw.tum.de/portal/web/simap/). I think it has remote API available. Martin Robert Bradbury wrote: > I would like to know whether or not anyone has attempted to create a > "generalized" reciprocal blast component for BioPerl? > > One sees papers all the time where they discuss running reciprocal blasts to > compare a new species to an old "standard" species or a set of species or > running an all-to-all set of comparisons to match up all of the "known" > proteins from species and determine which are outliers (and therefore > "novel"). There are also accumulating merged sets in NCBI HomoloGene (which > seems to be a some strict subset (perhaps a dozen) "well sequenced" genomes) > and Ensembl (which seems to be working with a much larger set of 40-50 > genomes some of which may be somewhat incomplete and are certainly poorly > "explored". > > I have, I believe, seen code "fragments" from various authors, perhaps some > on the BioPerl list, which perform some major subset of a typical > "reciprocal blast". 
> > Now what I am looking for is a relatively generalizable some-to-some > reciprocal blast utility. I want to be able to specify the genes (or gene > family), e.g. some of the ~150 known DNA repair genes. It would be helpful > to also specify how "tolerant" the blast "true reciprocal" criteria are. > There are some genes where there is a very strict 1-to-1 relationship across > many genomes. But for genes which involve relatively standard domains, e.g. > "helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals for > example its more like 5-to-5 and it would be really nice to be able to > specify the strictness or quality level [1] for "matching" genes (and even > which genes are to be excluded because they are known to be false > homologues). > > Then to top this off I want to be able to combine known public e.g. > (HomoloGene / Uniigene / Ensembl) databases with perhaps local private > databases or database subsets (e.g. emerging or specialized genomes). > > The goal here of course to determine the precise phylogenetic relationships > between all of the DNA repair genes and how there may be gain / loss / > evolution of function that can be related to species characteristics (size, > longevity, etc.). > > Is there a generalized reciprocal blast component in BioPerl? Or is it a > "build-it-yourself" situation (that I have to believe has been built > probably a few dozen times by various researchers / organizations / > companies)? > > Thanks, > Robert Bradbury > > 1. This would be handled in BioPerl with a customizable user function which > could be tailored to handle specific cases -- for example a function which > when handed a set of 100 potential "matches" could go through those 100 > matches, identify common domains, and then "re-rate" matches based on > considerations such as the type and number of common domains, domains being > in the same order, etc. I.e. 
criteria which may be difficult to completely > generalize across entire genomes but are fairly obvious if you are looking > at a graphical replication of a gene set in HomoloGene. From robert.bradbury at gmail.com Mon Dec 7 20:41:54 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 7 Dec 2009 15:41:54 -0500 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions Message-ID: This comment could also have a subject line: "Why does Bioperl/get_sequence fork at all?" Why are not all operations sequential? And if this is a "default" mode that I'm unaware of -- how do I ever write a reliable BioPerl script if I have little or no control over what the program uses when it runs? I may have days, so I can bear the burden of relatively slow results (and so can use sequential processing rather than parallel). I've got a perl script that uses remote blast to blast a sequence against a subset of the NCBI sequences. It "mostly" works, in that it returns a seemingly complete .bls result file, but when attempting to look at the sequences (so it can more accurately summarize the information from the results than a standard blast report allows) it terminates prematurely with errors.
The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Couldn't fork: Resource temporarily unavailable
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::_open_pipe /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
STACK: Bio::Perl::get_sequence /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
STACK: /home/bradbury/Genomes/bin/RB.pl:155
-----------------------------------------------------------
The precise line (in my code) which appears to be generating the error is:
$seq = get_sequence('GenBank', $accsn);
Now this can be a problem if NCBI/Genbank fails due to load conditions -- but this specific failure (which is repeatable) is most likely due to hitting the user process limit -- the small blast results work fine -- it's only if the Blast has returned several hundred hits that it runs into this problem. Now what it sounds like to me is an attempt to do multiple asynchronous NCBI queries (to get a sequence) with complete disregard of the environment (process limits, NCBI limits, etc.). But I do not know enough about how this works to point a finger at some specific function. As a result, get_sequence results are accumulated, summarized, etc. without the respective wait()-variant calls ever having been issued to collect former children [This IMO would clearly be a bug.] This could be addressed by allowing the BioPerl library to run in 3 modes.
(1) completely synchronous -- if you fork you wait until its done -- and you collect "it" and any fork fails then one either collects the process or switches to the non-conservative mode. Robert From cjfields at illinois.edu Mon Dec 7 21:08:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 7 Dec 2009 15:08:40 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: Message-ID: Robert, If you use the relative components directly (by that I mean use Bio::DB::GenBank and Bio::Tools::Run::RemoteBlast instead of Bio::Perl), you can control whether the process forks or not. All Bio::Perl does is wrap those modules for simple beginner tasks; if you want full control over the various parts of the pipeline you will need to use those tools directly. See the POD for those specific modules for more information. chris On Dec 7, 2009, at 2:41 PM, Robert Bradbury wrote: > This comment could also have a subject line: "Why does Bioperl/get_sequence> > fork at all! Why are not all operations sequential? And if this is a > "default" mode that I'm unaware of -- How to I ever write a reliable BioPerl > script if I have little or no capability of what the program uses when it > runs? I may have days so I can bear the burden of relatively slow results > (and so can use sequential processing rather than parallel). > > I've got a perl script that uses remote blast to blast a sequence against a > subset of the NCBI sequences. It "mostly" works, in that it returns a > seemingly complete .bls result file but when attempting to look at the > sequences (so it can more accurately summarize the information from the > results than a standard blast report allows) it terminates prematurely with > errors. 
> > The error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Couldn't fork: Resource temporarily unavailable > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::WebDBSeqI::_open_pipe > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186 > STACK: Bio::Perl::get_sequence > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520 > STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182 > STACK: /home/bradbury/Genomes/bin/RB.pl:155 > ----------------------------------------------------------- > > The precise line (in my code) whcih appears to be generating the error is: > $seq = get_sequence('GenBank', $accsn); > > Now this can be a problem if NCBI/Genbank fails due to load conditions -- > but this specific failure (which is repeatable is due to most likely hitting > the user process limit restrictions) -- but the small blast results work > fine -- its only if the Blast has returned several hundred hits that it runs > into this problem. > > Now what it sounds like to me is an attempt to do multiple asynchronous NCBI > queries (to get a sequence) with complete disregard of the environment > (process limits, NCBI limits, etc.). But I do not know enough about how > this works to point a finger at some specific function. As a result > get_sequence process results are accumulated, summarized, etc. without ever > having issued to respect "wait-variant()) calls to collect former children > [This IMO would clearly be a bug.] > > It could be adjusted to by allowing the BioPerl library to run in 3 modes. 
> (1) completely synchronous -- if you fork you wait until its done -- and > you collect "it" and any fork fails then one either collects the process or > switches to the non-conservative mode. > > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Dec 7 21:24:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 7 Dec 2009 13:24:54 -0800 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: Message-ID: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Robert - You seem to be mixing the blast remote and the sequence query retrieval problems. These messages are related to the remote retrieval of sequences. It is hard to tell from your message specifically which modules you are using or how you are querying NCBI - there are several ways to do this either with the NCBI tools or the Bio::DB::GenBank. If you are using Bio::DB::Query::GenBank that allows for async access and has built in controls to adhere to the wait variant that NCBI requests but I don't think Bio::DB::GenBank get_Seq_by_acc method does any sort of thing (at least when it was originally written). I always advocate if you want highly available and reliable access to sequences you should download the nr or whichever DB and use the local indexing tools for the retrieval. Once you start doing hundreds of queries I don't see any good reason to be doing the query against NCBI directly given unreliabilities of the web and services. Local databases are faster and more reliable for most people so I urge you take advantage of the tools which provide local database access with the same APIs. I would like to comment that the tone of your posts to the list are not particularly helpful. I wonder if you are actually asking for help or just interested in complaining about when things don't work as you expect? 
This is a collaborative and volunteer-only project, with the principles of working together to make useful toolkit. We encourage you to build programs and applications from this base that suit your needs, but not all things will be directly implemented in the toolkit if they aren't generic enough (at least that is my feeling, the other Core devs help with these decisions). If there is a useful, generic, and reusable part we would like that to be part of the API. Otherwise we suggest the new application that fits a developer's vision. We encourage you to write (and publish) that application separately, but certainly encourage bug (and fixes) submissions and also code contributions for new features where they can be seen as generally useful. -jason On Dec 7, 2009, at 12:41 PM, Robert Bradbury wrote: > This comment could also have a subject line: "Why does Bioperl/ > get_sequence> > fork at all! Why are not all operations sequential? And if this is a > "default" mode that I'm unaware of -- How to I ever write a reliable > BioPerl > script if I have little or no capability of what the program uses > when it > runs? I may have days so I can bear the burden of relatively slow > results > (and so can use sequential processing rather than parallel). > > I've got a perl script that uses remote blast to blast a sequence > against a > subset of the NCBI sequences. It "mostly" works, in that it returns a > seemingly complete .bls result file but when attempting to look at the > sequences (so it can more accurately summarize the information from > the > results than a standard blast report allows) it terminates > prematurely with > errors. 
> > The error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Couldn't fork: Resource temporarily unavailable > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::WebDBSeqI::_open_pipe > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722 > STACK: Bio::DB::WebDBSeqI::get_seq_stream > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463 > STACK: Bio::DB::NCBIHelper::get_Stream_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186 > STACK: Bio::Perl::get_sequence > /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520 > STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182 > STACK: /home/bradbury/Genomes/bin/RB.pl:155 > ----------------------------------------------------------- > > The precise line (in my code) whcih appears to be generating the > error is: > $seq = get_sequence('GenBank', $accsn); > > Now this can be a problem if NCBI/Genbank fails due to load > conditions -- > but this specific failure (which is repeatable is due to most likely > hitting > the user process limit restrictions) -- but the small blast results > work > fine -- its only if the Blast has returned several hundred hits that > it runs > into this problem. > > Now what it sounds like to me is an attempt to do multiple > asynchronous NCBI > queries (to get a sequence) with complete disregard of the environment > (process limits, NCBI limits, etc.). But I do not know enough about > how > this works to point a finger at some specific function. As a result > get_sequence process results are accumulated, summarized, etc. > without ever > having issued to respect "wait-variant()) calls to collect former > children > [This IMO would clearly be a bug.] > > It could be adjusted to by allowing the BioPerl library to run in 3 > modes. 
> (1) completely synchronous -- if you fork you wait until its done -- > and > you collect "it" and any fork fails then one either collects the > process or > switches to the non-conservative mode. > > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From Jonas_Schaer at gmx.de Tue Dec 8 15:21:58 2009 From: Jonas_Schaer at gmx.de (Jonas Schaer) Date: Tue, 8 Dec 2009 16:21:58 +0100 Subject: [Bioperl-l] fasta format Message-ID: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> Hi there, I have a little question concerning bioperl. I have BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read in some fasta files. first it worked fine, but now i have some fastafiles in slightly different format (not all lines have the same length!). ------------- EXCEPTION ------------- MSG: Each line of the fasta entry must be the same length except the last. Line above #49 ' ..' is 28 != 101 chars. STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/Fasta.pm:771 STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681 STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491 STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513 STACK main::readfasta blast_eval.pm:174 STACK toplevel blast_eval.pm:83 ------------------------------------- indexing was interrupted, so unlinking test.fasta.index at C:/Perl/site/lib/Bio/ DB/Fasta.pm line 1054. Is there any way to use these fasta files with diffrent length of lines with this fasta.pm module or will i have to change the format of my fasta-files(big databases...) ? Thanks in advance for any help! Regards, Jonas From awitney at sgul.ac.uk Tue Dec 8 17:01:58 2009 From: awitney at sgul.ac.uk (Adam Witney) Date: Tue, 8 Dec 2009 17:01:58 +0000 Subject: [Bioperl-l] package to associate genes with branches on trees? 
Message-ID: Hi, I have been generating some trees with Phylip (pars) and then processing them with Bioperl. These trees are generated by comparing multiple strains of a bacterial organism by presence/absence (0/1) calls for each gene. I was wondering if there was any package in Bioperl to try to determine if any specific genes were associated with specific branches of the trees? Or if anyone knew of another tool that can do this? thanks for any help adam From jason at bioperl.org Tue Dec 8 17:44:43 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 8 Dec 2009 09:44:43 -0800 Subject: [Bioperl-l] fasta format In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> Message-ID: you can run sreformat (HMMER) or the bp_sreformat.pl script in scripts/utilities (or that is installed when you install the Bioperl scripts)
$ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o yournewfile.fa
# rename it back
$ mv yournewfile.fa yourfile.fa
or
$ sreformat fasta yourfile.fa > yournewfile.fa
$ mv yournewfile.fa yourfile.fa
-jason On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote: > Hi there, > I have a little question concerning bioperl. I have > BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read > in some fasta files. first it worked fine, but now i have some > fastafiles in slightly different format (not all lines have the same > length!). > > ------------- EXCEPTION ------------- > MSG: Each line of the fasta entry must be the same length except the > last. > Line above #49 ' > ..' is 28 != 101 chars.
> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ > Fasta.pm:771 > STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm:681 > STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491 > STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513 > STACK main::readfasta blast_eval.pm:174 > STACK toplevel blast_eval.pm:83 > ------------------------------------- > > indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ > site/lib/Bio/ > DB/Fasta.pm line 1054. > > > Is there any way to use these fasta files with diffrent length of > lines with this fasta.pm module or will i have to change the format > of my fasta-files(big databases...) ? > > Thanks in advance for any help! > > Regards, Jonas > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Wed Dec 9 04:30:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 8 Dec 2009 22:30:26 -0600 Subject: [Bioperl-l] [ANNOUNCEMENT] BioPerl Meeting at the GMOD Conference Message-ID: <1BC089CD-75C3-437E-86A5-22220D724DF6@illinois.edu> All, For those interested, we will be holding a general BioPerl meeting, tentatively scheduled for January 13, 2010, just prior to the GMOD Community Meeting from Jan 14-15 in San Diego. This will be just following the Plant and Animal Genome (PAG) conference Jan 9-13. The exact day and time is somewhat flexible depending on attendees' schedules. For those interested, sign up here: http://www.bioperl.org/wiki/GMOD_2010_Meeting For those interested in attending the GMOD meeting or PAG: http://gmod.org/wiki/January_2010_GMOD_Meeting I can envision the following items popping up: * Refactoring of Alignment and GFF3/FeatureIO * Addressing BioPerl's monolithic nature * Moose and Perl 6 * Documentation Any others? 
chris From akarger at CGR.Harvard.edu Wed Dec 9 15:01:45 2009 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Wed, 9 Dec 2009 10:01:45 -0500 Subject: [Bioperl-l] fasta format In-Reply-To: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> Message-ID: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> > Is there any way to use these fasta files with diffrent length of > lines with this fasta.pm module or will i have to change the format > of my fasta-files(big databases...) ? > Jonas, It's not Bioperl, but for a quick fix you can use the Scriptome. Use the change_fasta_to_tab script (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_) to change your FASTA into a tab-delimited file. Then use the next tool (change_tab_to_fasta) to change your files back. To use a tool: change the input and output file names on the website, then cut and paste the Perl script from the green box into a CMD window. The script works one sequence at a time, so it doesn't need a lot of memory. (As long as you have enough disk space to store the tab-delimited copy). The recreated FASTAs will be 60 characters per line (although you can hand-edit the line after you paste it to be whatever number of characters you'd like). Let me know if you have a problem. -Amir Karger Life Sciences Research Computing, FAS IT Harvard University From Kevin.M.Brown at asu.edu Wed Dec 9 15:26:22 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 9 Dec 2009 08:26:22 -0700 Subject: [Bioperl-l] fasta format In-Reply-To: <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> Message-ID: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> Even easier to accomplish in one step. 
Read in the fasta file and output it right to another fasta file with SeqIO:

use Bio::SeqIO;

# copy every record from $file to file.fasta; SeqIO rewrites the sequence lines on output
my $in  = Bio::SeqIO->new(-format => 'fasta', -file => $file);
my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>file.fasta');
while (my $seq = $in->next_seq) { $out->write_seq($seq); }

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger
> Sent: Wednesday, December 09, 2009 8:02 AM
> To: Jonas Schaer; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] fasta format
>
> > Is there any way to use these fasta files with different length of
> > lines with this fasta.pm module or will i have to change the format
> > of my fasta-files(big databases...) ?
> >
>
> Jonas,
>
> It's not Bioperl, but for a quick fix you can use the
> Scriptome. Use the change_fasta_to_tab script
> (http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_format__change_fasta_to_tab_)
> to change your FASTA into a
> tab-delimited file. Then use the next tool
> (change_tab_to_fasta) to change your files back.
>
> To use a tool: change the input and output file names on the
> website, then cut and paste the Perl script from the green
> box into a CMD window. The script works one sequence at a
> time, so it doesn't need a lot of memory. (As long as you
> have enough disk space to store the tab-delimited copy).
>
> The recreated FASTAs will be 60 characters per line (although
> you can hand-edit the line after you paste it to be whatever
> number of characters you'd like).
>
> Let me know if you have a problem.
> > -Amir Karger > Life Sciences Research Computing, FAS IT > Harvard University > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Russell.Smithies at agresearch.co.nz Wed Dec 9 19:44:41 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 10 Dec 2009 08:44:41 +1300 Subject: [Bioperl-l] fasta format In-Reply-To: <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> <1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv> <1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> It's even easier as the script is already written for you :-) bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Kevin Brown > Sent: Thursday, 10 December 2009 4:26 a.m. > To: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] fasta format > > Even easier to accomplish in one step. 
Read in the fasta file and output > it right to another fasta file with SeqIO > > my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file); > my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta'); > while (my $seq = $in->next){$out->write_seq($seq);} > > Kevin Brown > Center for Innovations in Medicine > Biodesign Institute > Arizona State University > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger > > Sent: Wednesday, December 09, 2009 8:02 AM > > To: Jonas Schaer; bioperl-l at bioperl.org > > Subject: Re: [Bioperl-l] fasta format > > > > > Is there any way to use these fasta files with diffrent length of > > > lines with this fasta.pm module or will i have to change the format > > > of my fasta-files(big databases...) ? > > > > > > > Jonas, > > > > It's not Bioperl, but for a quick fix you can use the > > Scriptome. Use the change_fasta_to_tab script > > (http://sysbio.harvard.edu/csb/resources/computational/scripto > > me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_ > > format__change_fasta_to_tab_) to change your FASTA into a > > tab-delimited file. Then use the next tool > > (change_tab_to_fasta) to change your files back. > > > > To use a tool: change the input and output file names on the > > website, then cut and paste the Perl script from the green > > box into a CMD window. The script works one sequence at a > > time, so it doesn't need a lot of memory. (As long as you > > have enough disk space to store the tab-delimited copy). > > > > The recreated FASTAs will be 60 characters per line (although > > you can hand-edit the line after you paste it to be whatever > > number of characters you'd like). > > > > Let me know if you have a problem. 
> > > > -Amir Karger > > Life Sciences Research Computing, FAS IT > > Harvard University > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From maj at fortinbras.us Wed Dec 9 20:18:08 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 9 Dec 2009 15:18:08 -0500 Subject: [Bioperl-l] fasta format In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas><1B12003244CE894E85B4726023637888055929@FASXCH01.fasmail.priv><1A4207F8295607498283FE9E93B775B4066B4D53@EX02.asurite.ad.asu.edu> <18DF7D20DFEC044098A1062202F5FFF32B6603815F@exchsth.agresearch.co.nz> Message-ID: <5C992E6556584BDFBF39604FDEA8ECE0@NewLife> $ perl -MPerlIO::via::SeqIO -e 'open($f, "<:via(SeqIO)", shift); open($g, ">:via(SeqIO::fasta)", shift); while (<$f>) { print $g $_; }' in.fas out.fas ----- Original Message ----- From: "Smithies, Russell" To: "'Kevin Brown'" ; Sent: Wednesday, December 09, 2009 2:44 PM Subject: Re: [Bioperl-l] fasta format > It's even easier as the script is already written for you :-) > > bp_seqconvert.pl --from fasta --to fasta < file.in.fa > file.out.fa > > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Kevin Brown >> Sent: Thursday, 10 December 2009 4:26 a.m. >> To: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] fasta format >> >> Even easier to accomplish in one step. 
Read in the fasta file and output >> it right to another fasta file with SeqIO >> >> my $in = Bio::SeqIO->new(-format=>'fasta',-file=>$file); >> my $out = Bio::SeqIO->new(-format=>'fasta',-file=>'>file.fasta'); >> while (my $seq = $in->next){$out->write_seq($seq);} >> >> Kevin Brown >> Center for Innovations in Medicine >> Biodesign Institute >> Arizona State University >> >> > -----Original Message----- >> > From: bioperl-l-bounces at lists.open-bio.org >> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Amir Karger >> > Sent: Wednesday, December 09, 2009 8:02 AM >> > To: Jonas Schaer; bioperl-l at bioperl.org >> > Subject: Re: [Bioperl-l] fasta format >> > >> > > Is there any way to use these fasta files with diffrent length of >> > > lines with this fasta.pm module or will i have to change the format >> > > of my fasta-files(big databases...) ? >> > > >> > >> > Jonas, >> > >> > It's not Bioperl, but for a quick fix you can use the >> > Scriptome. Use the change_fasta_to_tab script >> > (http://sysbio.harvard.edu/csb/resources/computational/scripto >> > me/Windows/Tools/Change.html#change_a_fasta_file_into_tabular_ >> > format__change_fasta_to_tab_) to change your FASTA into a >> > tab-delimited file. Then use the next tool >> > (change_tab_to_fasta) to change your files back. >> > >> > To use a tool: change the input and output file names on the >> > website, then cut and paste the Perl script from the green >> > box into a CMD window. The script works one sequence at a >> > time, so it doesn't need a lot of memory. (As long as you >> > have enough disk space to store the tab-delimited copy). >> > >> > The recreated FASTAs will be 60 characters per line (although >> > you can hand-edit the line after you paste it to be whatever >> > number of characters you'd like). >> > >> > Let me know if you have a problem. 
>> > >> > -Amir Karger >> > Life Sciences Research Computing, FAS IT >> > Harvard University >> > >> > >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From kellert at ohsu.edu Thu Dec 10 00:36:13 2009 From: kellert at ohsu.edu (Tom Keller) Date: Wed, 9 Dec 2009 16:36:13 -0800 Subject: [Bioperl-l] how to map ensembl id to NCBI gi Message-ID: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> Greetings, Is there a simple way to map a list of ensembl ids to the NCBI gis? 
thanks, Tom Thomas (Tom) Keller kellert at ohsu.edu 503.494.2442 6339b R Jones Hall (BSc/CROET) www.ohsu.edu/xd/research/research-cores/dna-analysis/ From cjfields at illinois.edu Thu Dec 10 01:59:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 9 Dec 2009 19:59:37 -0600 Subject: [Bioperl-l] how to map ensembl id to NCBI gi In-Reply-To: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> References: <435849B7-B66E-4553-988B-0645775E785E@ohsu.edu> Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C@illinois.edu> Tom, Probably best to do this via BioMart: http://www.ensembl.org/biomart/ I would assume you can also do this via the ensembl perl API as well. Also, have a look at the UniProt ID Mapper: http://www.uniprot.org/?tab=mapping chris On Dec 9, 2009, at 6:36 PM, Tom Keller wrote: > Greetings, > Is there a simple way to map a list of ensembl ids to the NCBI gis? > > thanks, > Tom > > Thomas (Tom) Keller > kellert at ohsu.edu > 503.494.2442 > 6339b R Jones Hall (BSc/CROET) > www.ohsu.edu/xd/research/research-cores/dna-analysis/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lovebaby39 at gmail.com Thu Dec 10 14:22:14 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Thu, 10 Dec 2009 22:22:14 +0800 Subject: [Bioperl-l] about bioperl issue Message-ID: <5F281DC3E4514B3AAA8881169B240227@SHAPC> Dear The following is code. 
--------------------------------------------------------------------------------

my @params_rb = ( 'program'  => 'blastn',
                  'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  => "test_query",
                             -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

while (my $result_rb = $blast_report_rb->next_result) {
    while (my $hit_rb = $result_rb->next_hit()) {
        while (my $hsp_rb = $hit_rb->next_hsp()) {
            print $hit_rb->name, "\nevalue = ", $hsp_rb->evalue,
                  "\t score = ", $hsp_rb->score, "\n";
            #print " ",$hit->name,"\n";
        }
    }
}

--------------------------------------------------------------------------------

I know how to get "name", "evalue" and "score", but I don't know how to get the text that is shown in red. (or please see attachment.)

------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------

I would appreciate it if you could tell me how to do it.
Thank you.

Reginald Hsueh

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: R20080801-1.seq.txt
URL:

From SMarkel at accelrys.com Thu Dec 10 14:47:36 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 10 Dec 2009 06:47:36 -0800
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977067C6E@EXCH1-COLO.accelrys.net>

Reginald,

I didn't see anything highlighted in red but the three strings in the pairwise alignment display can be obtained from an HSP using

$hsp->query_string()
$hsp->hit_string()
$hsp->homology_string()

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors: International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hsueh
Sent: Thursday, 10 December 2009 6:22 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] about bioperl issue
Importance: High

Dear

The following is code.
--------------------------------------------------------------------------------

my @params_rb = ( 'program'  => 'blastn',
                  'database' => 'DB\\RB_GUS\\RB_GUS');
my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb);

my $input_rb = Bio::Seq->new(-id  => "test_query",
                             -seq => $testline2);
my $blast_report_rb = $factory_rb->blastall($input_rb);

while (my $result_rb = $blast_report_rb->next_result) {
    while (my $hit_rb = $result_rb->next_hit()) {
        while (my $hsp_rb = $hit_rb->next_hsp()) {
            print $hit_rb->name, "\nevalue = ", $hsp_rb->evalue,
                  "\t score = ", $hsp_rb->score, "\n";
            #print " ",$hit->name,"\n";
        }
    }
}

--------------------------------------------------------------------------------

I know how to get "name", "evalue" and "score", but I don't know how to get the text that is shown in red. (or please see attachment.)

------------------------------------------------------------------------------------------------------------------
Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206
           |||||| ||||||||||||||||||    |||| || ||||||  |||||||||||| ||
Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173
------------------------------------------------------------------------------------------------------------------

I would appreciate it if you could tell me how to do it.
Thank you.

Reginald Hsueh

From David.Messina at sbc.su.se Thu Dec 10 15:09:31 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 10 Dec 2009 16:09:31 +0100
Subject: [Bioperl-l] about bioperl issue
In-Reply-To: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC>
Message-ID: <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se>

Hi Reginald,

None of the words in your email or the attachment are colored red; unfortunately, any kind of formatting tends to get removed from emails sent to mailing lists.

Could you be more specific about what part of the blast report you are not able to get?
You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it. Dave From David.Messina at sbc.su.se Thu Dec 10 15:36:49 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 10 Dec 2009 16:36:49 +0100 Subject: [Bioperl-l] about bioperl issue In-Reply-To: <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> Message-ID: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> Hi Reginald, Please keep all replies on the list so that everyone can follow the thread. In a separate email, Scott gave the answer you were looking for, I think. Namely: $hsp->query_string() OR $hsp->hit_string() Dave On Dec 10, 2009, at 16:31, Hsueh wrote: > Dear Dave Messina > > I need to get the string that is "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga". > > Thank you > > Reginald Hsueh > > ------------------------------------------------------------------------------------------------------------------------------ > Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga 206 > |||||| |||||||||||||||||| |||| || |||||| |||||||||||| || > Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173 > ------------------------------------------------------------------------------------------------------------------------------ > > > > > -------------------------------------------------- > From: "Dave Messina" > Sent: Thursday, December 10, 2009 11:09 PM > To: "Hsueh" > Cc: > Subject: Re: [Bioperl-l] about bioperl issue > >> Hi Reginald, >> >> None of the words in your email or the attachment are colored red ? unfortunately any kind of formatting tends to get removed from emails send to mailing lists. >> >> Could you be more specific about what part of the blast report you are not able to get? 
You could even just copy and paste that particular bit of the report into your reply if it's not clear what to call it. >> >> >> Dave From lovebaby39 at gmail.com Thu Dec 10 15:53:00 2009 From: lovebaby39 at gmail.com (Hsueh) Date: Thu, 10 Dec 2009 23:53:00 +0800 Subject: [Bioperl-l] about bioperl issue In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se> Message-ID: Dear Dave Messina Thank you for your replies. Reginald Hsueh -------------------------------------------------- From: "Dave Messina" Sent: Thursday, December 10, 2009 11:36 PM To: "Hsueh" Cc: Subject: Re: [Bioperl-l] about bioperl issue > Hi Reginald, > > Please keep all replies on the list so that everyone can follow the > thread. > > In a separate email, Scott gave the answer you were looking for, I think. > > Namely: > $hsp->query_string() > OR > $hsp->hit_string() > > > > Dave > > > > > On Dec 10, 2009, at 16:31, Hsueh wrote: > >> Dear Dave Messina >> >> I need to get the string that is >> "ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga". 
>> >> Thank you >> >> Reginald Hsueh >> >> ------------------------------------------------------------------------------------------------------------------------------ >> Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga >> 206 >> |||||| |||||||||||||||||| |||| || |||||| >> |||||||||||| || >> Sbjct: 114 ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga >> 173 >> ------------------------------------------------------------------------------------------------------------------------------ >> >> >> >> >> -------------------------------------------------- >> From: "Dave Messina" >> Sent: Thursday, December 10, 2009 11:09 PM >> To: "Hsueh" >> Cc: >> Subject: Re: [Bioperl-l] about bioperl issue >> >>> Hi Reginald, >>> >>> None of the words in your email or the attachment are colored red ? >>> unfortunately any kind of formatting tends to get removed from emails >>> send to mailing lists. >>> >>> Could you be more specific about what part of the blast report you are >>> not able to get? You could even just copy and paste that particular bit >>> of the report into your reply if it's not clear what to call it. >>> >>> >>> Dave >>>>Dear >>>> >>>>The following is code. 
>>>> >>>> >>>>-------------------------------------------------------------------------------- >>>> >>>>my at params_rb = ( 'program' => 'blastn', >>>> 'database' => 'DB\\RB_GUS\\RB_GUS'); >>>>my $factory_rb = Bio::Tools::Run::StandAloneBlast->new(@params_rb); >>>> >>>>my $input_rb = Bio::Seq->new(-id =>"test_query", >>>> -seq => $testline2); >>>>my $blast_report_rb = $factory_rb->blastall($input_rb); >>>> >>>> while (my $result_rb = $blast_report_rb-> next_result ) { >>>> while (my $hit_rb = $result_rb->next_hit()){ >>>> while (my $hsp_rb = $hit_rb->next_hsp()){ >>>> print $hit_rb->name,"\nevalue = " , $hsp_rb->evalue , "\t score = " >>>> , $hsp_rb->score , "\n" ; >>>> #print " ",$hit->name,"\n"; >>>> } >>>> } >>>> } >>>> >>>>-------------------------------------------------------------------------------- >>>> >>>> >>>>I know how to get "name", "evalue" and "score", but I don't know how >>>>to get the word which is in red color. (or please see attachment.) >>>>------------------------------------------------------------------------------------------------------------------ >>>>Query: 147 ctctttactcttaggtttacccgccggggtatcgtggcaaacaaggatagtttaaacaga >>>>206 >>>> |||||| |||||||||||||||||| |||| || |||||| >>>> |||||||||||| || >>>>Sbjct: 114 >>>>ctcttttctcttaggtttacccgccaatatatcctgtcaaacactgatagtttaaactga 173 >>>>------------------------------------------------------------------------------------------------------------------ >>>> >>>>I will appreciate if you could tell me how to do it. >>>>Thank you. 
>>>> >>>>Reginald Hsueh From pg4 at sanger.ac.uk Thu Dec 10 20:50:40 2009 From: pg4 at sanger.ac.uk (Pablo Marin-Garcia) Date: Thu, 10 Dec 2009 20:50:40 +0000 (GMT) Subject: [Bioperl-l] how to map ensembl id to NCBI gi In-Reply-To: References: Message-ID: If you are mapping ensembl genes to NCBI genes (via ensemblaPI or biomart) please read this recent thread at ensembl-dev: http://listserver.ebi.ac.uk/mailing-lists-archives/ensembl-dev/msg05417.html Seems that the ensembl gene mapping to NCBI is done through translation so the noncoding genes do not have the corresponding NCBI gene mapped. -Pablo > ------------------------------ > > Message: 4 > Date: Wed, 9 Dec 2009 19:59:37 -0600 > From: Chris Fields > Subject: Re: [Bioperl-l] how to map ensembl id to NCBI gi > To: Tom Keller > Cc: BioPerl-List > Message-ID: <14495B1F-911C-4FE7-8224-A3F050F7E03C at illinois.edu> > Content-Type: text/plain; charset=us-ascii > > Tom, > > Probably best to do this via BioMart: > > http://www.ensembl.org/biomart/ > > I would assume you can also do this via the ensembl perl API as well. > > Also, have a look at the UniProt ID Mapper: > > http://www.uniprot.org/?tab=mapping > > chris > > On Dec 9, 2009, at 6:36 PM, Tom Keller wrote: > >> Greetings, >> Is there a simple way to map a list of ensembl ids to the NCBI gis? 
>>
>> thanks,
>> Tom
>>
>> Thomas (Tom) Keller
>> kellert at ohsu.edu
>> 503.494.2442
>> 6339b R Jones Hall (BSc/CROET)
>> www.ohsu.edu/xd/research/research-cores/dna-analysis/
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>

====================================================================
Pablo Marin-Garcia, PhD \\// (Argiope bruennichi \/\/`(||>O:'\/\/ with stabilimentum) //\\
Sanger Institute             | PostDoc / Computer Biologist
Wellcome Trust Genome Campus | team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH  | room : N333
United Kingdom               | email: pablo.marin at sanger.ac.uk
====================================================================

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From umjsm at leeds.ac.uk Fri Dec 11 16:44:42 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Fri, 11 Dec 2009 16:44:42 +0000
Subject: [Bioperl-l] extract and write a pdb chain
Message-ID: <1260549882.6484.11.camel@limm-pc1254>

Hello,

I am trying to do a very easy thing but I can't get it to work. I want to write one chain of a PDB file to a new file. I have tried a lot of things, but what I think should work is the following script:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' => 'pdb');
my $struc = $structio->next_structure;

my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id');

for my $chain ($struc->get_chains) {
    if($chain->id eq "A"){
        $new_entry->chain($chain);
        last;
    }
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => 'pdb');#
$out->write_structure($new_entry);

But it doesn't work.
I get the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: add_chain: first argument needs to be a Model object ()

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:335
STACK: Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:391
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:304
STACK: read_pdb.pl:10
-----------------------------------------------------------

As far as I understand the documentation, the chain method of a Bio::Structure::Entry object requires an object of type Chain as input.

Any solution will be very welcome.

best regards,
Joan

From wkretzsch at gmail.com Fri Dec 11 19:22:31 2009
From: wkretzsch at gmail.com (Warren W. Kretzschmar)
Date: Fri, 11 Dec 2009 14:22:31 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms
Message-ID: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>

Hi,
I'm new to the bioperl community. I've created a perl module that reads in msOUT files generated by Hudson's ms. As far as I understand, there is no SeqIO module to read and output these files? If so, I propose to create a module that does this. Any suggestions?

Thanks,
Warren Kretzschmar

From maj at fortinbras.us Fri Dec 11 19:59:53 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 11 Dec 2009 14:59:53 -0500
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms
In-Reply-To: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
References: <5d2ac05c0912111122p1fea0961rfff0f1cf7aa8f97f@mail.gmail.com>
Message-ID: <07382508ED0B41F4B8289813B734239B@NewLife>

Hi Warren,

I say go for it.
You'll want to have a look at http://bio.perl.org/wiki/Advanced_BioPerl which explains most of our tips and "policies" for prospective code contributors, as well as http://bio.perl.org/wiki/HOWTO:SeqIO which details SeqIO from the user's perspective. Look carefully at some Bio::SeqIO::* modules for implementation details. If you have code to propose, use http://bugzilla.bioperl.org and enter a new enhancement, where you can upload your module for us to review.

MAJ

----- Original Message -----
From: "Warren W. Kretzschmar"
To:
Sent: Friday, December 11, 2009 2:22 PM
Subject: [Bioperl-l] Proposed project: SeqIO module for msOUT files generated by Hudson's ms

> Hi,
> I'm new to the bioperl community. I've created a perl module that
> reads in msOUT files generated by Hudson's ms. As far as I
> understand, there is no SeqIO module to read and output these files?
> If so, I propose to create a module that does this. Any suggestions?
>
> Thanks,
> Warren Kretzschmar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bosborne11 at verizon.net Fri Dec 11 20:37:45 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 11 Dec 2009 15:37:45 -0500
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260549882.6484.11.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID:

Joan,

It looks to me like the first argument to the add_chain() method has to be a Model object; the second is the Chain itself. See Structure/Entry.pm, for example. However, if you're seeing some documentation that says something else then tell us where, since it needs to be corrected.

In Bio::Structure an Entry consists of one or more Models, each of which has one or more Chains. This allows you to build macromolecular complexes (an Entry), which could contain more than one defined protein or protein complex (the Models).

Brian O.
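
A side note on the question in this thread: if the goal is only to pull one chain out of a PDB file, a plain-text filter may be enough, because the PDB format keeps the chain identifier in column 22 of ATOM, HETATM, ANISOU and TER records. The sketch below is not the Bio::Structure route Brian describes and does not use BioPerl at all; it is a minimal pure-Perl fallback. The subroutine name extract_chain and the file names in the usage comment are assumptions made for illustration.

```perl
use strict;
use warnings;

# Record types that carry a chain identifier in column 22 of a PDB line.
my %HAS_CHAIN = map { $_ => 1 } qw(ATOM HETATM ANISOU TER);

# Return the coordinate lines belonging to one chain.
# Hypothetical helper: all other records (HEADER, REMARK, ...) are dropped.
sub extract_chain {
    my ($chain_id, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        next if length($line) < 22;           # too short to hold a chain ID
        my $record = substr($line, 0, 6);     # record name, columns 1-6
        $record =~ s/\s+$//;
        next unless $HAS_CHAIN{$record};
        push @kept, $line if substr($line, 21, 1) eq $chain_id;   # column 22
    }
    return @kept;
}

# Possible usage, with the file names from the question (assumed to exist):
# open my $in,  '<', '101m.pdb' or die "101m.pdb: $!";
# open my $out, '>', 'out.pdb'  or die "out.pdb: $!";
# print {$out} extract_chain('A', <$in>), "END\n";
```

This keeps only coordinate records, so header and connectivity information is lost; for anything that needs the full object model, the Entry/Model/Chain hierarchy described above is still the place to look.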
On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote: > Hello, > > I am trying to do a very easy think but I don't get it. I want to > write > in a file a chain of a pdb. I have try a lot of thinks but what I > think > that it should work is the next script: > > use Bio::Structure::IO; > use strict; > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' > => > 'pdb'); > my $struc = $structio->next_structure; > > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > > for my $chain ($struc->get_chains) { > if($chain->id eq "A"){ > $new_entry->chain($chain); > last; > } > } > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > 'pdb');# > $out->write_structure($new_entry); > > it doesn't. I get the next error: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: add_chain: first argument needs to be a Model object () > > STACK: Error::throw > STACK: > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: > 368 > STACK: > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:335 > STACK: > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:391 > STACK: > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ > Structure/Entry.pm:304 > STACK: read_pdb.pl:10 > ----------------------------------------------------------- > > As far I understand the documentation, the method chain of the object > Bio::Structure::Entry requires an as input an object of type Chain. > > Any solution will be very welcome. 
>
> best regards,
> Joan

From awitney at sgul.ac.uk Sun Dec 13 21:48:13 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Sun, 13 Dec 2009 21:48:13 +0000
Subject: [Bioperl-l] combining tree image with heatmap
Message-ID: <4B25611D.6050009@sgul.ac.uk>

I am trying to draw a tree on the side of a heatmap image, much like you see
after clustering data. I was wondering if anyone has managed to do this
using bioperl?

I can draw the two separately, but can't quite work out how to put the two
together and get the nodes to line up with the correct row of clustering
data. Is there any particular module to look at?

thanks for any help

adam

From dhwani1030 at gmail.com Sat Dec 12 20:04:01 2009
From: dhwani1030 at gmail.com (dhwani gandhi)
Date: Sat, 12 Dec 2009 15:04:01 -0500
Subject: [Bioperl-l] Bioperl code help
Message-ID:

Hi,
I am very new to Bioperl, though I am somewhat familiar with perl.

I write my perl programs in Notepad++ and run them in cmd.

Now, I want to run Bioperl programs. I just installed bioperl on my
computer, and I have a program using bioperl modules in Notepad++.

My question is how to run these programs. Can they be run in cmd as well, or
do I use ppm?

Please help.

Thanks,
-Dhwani Gandhi.

From eric_donaldson at med.unc.edu Sun Dec 13 23:15:24 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Sun, 13 Dec 2009 18:15:24 -0500
Subject: [Bioperl-l] problem with install
Message-ID:

Hello,

Today I downloaded bioperl 1.6.1 on my new macbook pro using fink. I used

fink install bioperl.pm-588

as I could not get it to install using perl version 5.10.

But now I get an error when trying to run a bioperl script.

Here is the error:

Can't locate Bio/Tools/BPlite.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.pl line 8.
BEGIN failed--compilation aborted at blastparser.pl line 8.

I am a novice at unix and bioperl so I do not know how to troubleshoot
this. Would you please help me?

Thank you,

Eric

Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology

From jason at bioperl.org Mon Dec 14 01:24:26 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 17:24:26 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To:
References:
Message-ID: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>

Hi Eric -

Bio::Tools::BPlite is no longer supported in Bioperl - it was deprecated
several releases ago. It was replaced with Bio::SearchIO.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From jason at bioperl.org Mon Dec 14 04:09:45 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 20:09:45 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To:
References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org>
Message-ID: <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>

Did you install perl 5.10 yourself, or are you using the system perl? I'm
not sure whether you actually installed bioperl.pm via fink.

Your @INC (or $PERL5LIB) points to /sw/lib/perl5, which is one of the
directories it would have been installed in, but I don't think you actually
installed bioperl.

You can try:
$ locate Bio/SearchIO.pm

We'll see if any of the other osx/fink gurus on the list can help, or you
can install it via CPAN I guess.

-jason

On Dec 13, 2009, at 6:39 PM, eric_donaldson at med.unc.edu wrote:

> I actually tried a different blastparser that uses Bio::SearchIO and
> got the same message:
>
> Can't locate Bio/SearchIO.pm in @INC (@INC contains: /sw/lib/perl5/darwin-thread-multi-2level /sw/lib/perl5 /sw/lib/perl5/darwin /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .) at blastparser.new.pl line 8.
> BEGIN failed--compilation aborted at blastparser.new.pl line 8.
>
> I suspect there is a path problem, but am not savvy enough to know
> how to fix it. I am really just a hacker.... I have several scripts
> that I use regularly and that I know how to modify, but am lost when
> they don't work...
>
> Thanks for any help,
>
> Eric

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From jason at bioperl.org Mon Dec 14 05:10:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 13 Dec 2009 21:10:54 -0800
Subject: [Bioperl-l] problem with install
In-Reply-To:
References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org>
Message-ID: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>

Eric -

Please CC the bioperl list when responding so others can help - I can't be
the only answerer.

But since your @INC message doesn't include /sw/lib/perl5/5.8.8/, you would
need to make sure that directory is added to your PERL5LIB. I expect there
are some help docs on the perl sites on how to get your paths in order.

Or you can just install via CPAN, which will put it in the right path -
there are docs on the bioperl website about installing via CPAN.

-jason

On Dec 13, 2009, at 9:02 PM, eric_donaldson at med.unc.edu wrote:

> Hi Jason,
>
> The fink package did not have support for perl 5.10, so I attempted
> to install the perl 5.8.6 package.
>
> When I attempted: locate Bio/SearchIO.pm
> I got: -bash: $: command not found
>
> So even though I can find SearchIO.pm in
> sw/lib/perl5/5.8.8/Bio/SearchIO.pm I cannot access it. Do I need to
> use the older version of perl?
>
> Would it be better to install with CPAN? If so, can you send me to
> a page that has instructions?
>
> Thank you so much!
>
> Eric

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From awitney at sgul.ac.uk Mon Dec 14 09:36:19 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 14 Dec 2009 09:36:19 +0000
Subject: [Bioperl-l] Bioperl code help
In-Reply-To:
References:
Message-ID: <4B260713.3070402@sgul.ac.uk>

bioperl programs are just perl programs, so you should run them in exactly
the same way as your perl programs, from the command line.

HTH

adam
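
A note on the two threads above (running a BioPerl script from the command
line, and the "Can't locate ... in @INC" error): a minimal sketch of a check
script, using only core Perl, that reports whether the perl you are running
can find a given module. The file name check_bioperl.pl and the helper name
module_available are made up for illustration; nothing here comes from the
posters' own scripts.

```perl
#!/usr/bin/perl
# check_bioperl.pl (hypothetical name): report whether perl can find a module.
use strict;
use warnings;

# Return 1 if the named module can be loaded by this perl, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Only run the report when executed as a script.
unless (caller) {
    # Check the modules named on the command line, or Bio::SearchIO by default.
    for my $module (@ARGV ? @ARGV : ('Bio::SearchIO')) {
        if (module_available($module)) {
            (my $file = "$module.pm") =~ s{::}{/}g;
            print "$module found at $INC{$file}\n";
        }
        else {
            print "$module not found; perl searched:\n";
            print "  $_\n" for @INC;
        }
    }
}
```

It is run like any other Perl program, e.g. "perl check_bioperl.pl" from cmd
or a shell. If the module sits in a directory that is not listed, such as
the /sw/lib/perl5/5.8.8 directory mentioned in the thread, that directory
can be added through PERL5LIB or with "perl -I/sw/lib/perl5/5.8.8
check_bioperl.pl".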

From umjsm at leeds.ac.uk Mon Dec 14 10:39:32 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 10:39:32 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To:
References: <1260549882.6484.11.camel@limm-pc1254>
Message-ID: <1260787172.7359.0.camel@limm-pc1254>

Hi Brian,

I am not calling the method add_chain, I am calling the method chain

http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6

and if I don't pass an object of type Bio::Structure::Chain as the argument,
I get an error like this (the part between --> and <-- depends on the
argument):

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain, we want a Bio::Structure::Chain or a list of these

STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368
STACK: Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314
STACK: read_pdb.pl:11
-----------------------------------------------------------

And if I use a Chain object I get the error that I told you about.

I have tried this code:

use Bio::Structure::IO;
use strict;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb",
                                       '-format' => 'pdb');
my $struc = $structio->next_structure;
my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id');
my $model = Bio::Structure::Model->new( -id => '0');

for my $chain ($struc->get_chains) {
    if($chain->id eq "A"){
        $new_entry->add_chain($model,$chain);
        last;
    }
}
$new_entry->add_model($model);
my $out = Bio::Structure::IO->new(-file => ">out.pdb",
                                  '-format' => 'pdb');
$out->write_structure($new_entry);

But I get an empty pdb:

HEADER DEFAULT CLASSIFICATION 24-JAN-70 stru
REMARK 1
TER 1 A 0
MASTER
END

I am trying a lot of combinations, but I can't write a single chain into a
file.
I don't know what I am doing wrong.

Thanks for helping

regards,
Joan

From fs5 at sanger.ac.uk Mon Dec 14 12:18:17 2009
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 14 Dec 2009 12:18:17 +0000
Subject: [Bioperl-l] parse EMBL Feature Table only
Message-ID: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi,

Maybe I'm really missing something here but I can't find how to parse a file
that is basically just the Feature Table from an EMBL file, looking like
this:

FT   CDS             join(37467..37521,38078..38195,38312..38400,38859..38936,39067..39154,39379..39675,39818..39842)
FT                   /colour=7
FT                   /product="RNA-binding protein, putative"
FT   CDS             213199..214812
FT                   /colour=7
FT                   /product="eukaryotic translation initiation factor 3
FT                   subunit 7, putative"
...[more of the same]

So the file has no header and no actual sequence and it is used simply to
annotate a chromosome in a genome assembly. I've always used GFF for that
purpose but have been given this file now.

Bio::SeqIO->new(-format=>"EMBL") complains about the missing header and, if
I stick in a fake ID line, it warns about the missing sequence and the fact
that the features don't fit on the sequence (of length 0).

Of course it's not difficult to write my own parser, but I'm sure there must
be a BioPerl way of doing this that I have just overlooked.

Thanks for your help.

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
a charity registered in England with number 1021457 and a company registered
in England with number 2742969, whose registered office is 215 Euston Road,
London, NW1 2BE.

From David.Messina at sbc.su.se Mon Dec 14 14:06:54 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Dec 2009 15:06:54 +0100
Subject: [Bioperl-l] parse EMBL Feature Table only
In-Reply-To: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se>

Hi Frank,

You will need to look at the feature table parsing code that
Bio::SeqIO::embl itself uses to read those lines, probably the
_read_FTHelper_EMBL method:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12

Since you're trying to parse what is effectively a part of an EMBL record,
and a somewhat complicated part at that, this could be a little hairy.

It might be easier to go the route you started down: add a fake header and a
(relatively long) fake sequence, and go through Bio::SeqIO in the normal
way.

Dave

PS:
I suspect you may already be familiar with it, but for an overview of how to
get at data in feature tables, look at the Feature Annotation HOWTO:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

From eric_donaldson at med.unc.edu Mon Dec 14 14:22:40 2009
From: eric_donaldson at med.unc.edu (eric_donaldson at med.unc.edu)
Date: Mon, 14 Dec 2009 09:22:40 -0500
Subject: [Bioperl-l] problem with install
In-Reply-To: <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
References: <119F436D-D36D-4D28-BAE7-6EB17D665FC2@bioperl.org> <404D2600-58D3-4491-834E-8C9F860D3ACC@bioperl.org> <7B2EBA9A-E9DF-49A5-ABC7-C42512BA9C9A@bioperl.org>
Message-ID:

Thank you Jason. I appreciate the help.

Eric

Eric F. Donaldson, Ph.D.
Research Assistant Professor, Ralph Baric Lab
University of North Carolina
Department of Epidemiology

From umjsm at leeds.ac.uk Mon Dec 14 16:58:03 2009
From: umjsm at leeds.ac.uk (Joan Segura Mora)
Date: Mon, 14 Dec 2009 16:58:03 +0000
Subject: [Bioperl-l] extract and write a pdb chain
In-Reply-To: <1260787172.7359.0.camel@limm-pc1254>
References: <1260549882.6484.11.camel@limm-pc1254> <1260787172.7359.0.camel@limm-pc1254>
Message-ID: <1260809883.7359.15.camel@limm-pc1254>

Hi again,

To extract a pdb chain to a file, I have had to add it atom by atom to a new
structure:

use strict;
use warnings;
use Bio::Structure::IO;
use Bio::Structure::Entry;

my $structio = Bio::Structure::IO->new(-file => "101m.pdb",
                                       '-format' => 'pdb');
my $struc = $structio->next_structure;
my $new_struct = Bio::Structure::Entry->new( -id => 'structure_id');

# Only the first model is copied.
for my $model ($struc->get_models){
    $new_struct->add_model($model);
    for my $chain ($struc->get_chains($model)) {
        # Skip every chain except the one we want.
        next unless $chain->id eq "A";
        $new_struct->add_chain($model,$chain);
        # Residues and atoms have to be attached to the new entry one by one.
        foreach my $res ($struc->get_residues($chain)){
            $new_struct->add_residue($chain,$res);
            foreach my $atom ($struc->get_atoms($res)){
                $new_struct->add_atom($res,$atom);
            }
        }
        last;
    }
    last;
}

my $out = Bio::Structure::IO->new(-file => ">out.pdb",
                                  '-format' => 'pdb');
$out->write_structure($new_struct);

I suppose that there should be a more elegant way to do it. If someone knows
it and can explain it I will be very grateful.
kind regards, Joan On Mon, 2009-12-14 at 10:39 +0000, Joan Segura Mora wrote: > Hi Brian, > > I am not calling the method add_chain, I am calling the method chain > > http://doc.bioperl.org/releases/bioperl-1.0.1/Bio/Structure/Entry.html#POD6 > > and if I don't use as an argument an object of type > > Bio::Structure::Chain > > I get an error like this (-->depends of the argument<--) > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Supplied a -->Bio::Structure::Residue=HASH(0x11be6a0)<-- to chain, > we want a Bio::Structure::Chain or a list of these > > STACK: Error::throw > STACK: > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:368 > STACK: > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/Structure/Entry.pm:314 > STACK: read_pdb.pl:11 > ----------------------------------------------------------- > > > And if I use a Chain object I get the error that I told you. > > I have try this code: > > use Bio::Structure::IO; > use strict; > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' => > 'pdb'); > my $struc = $structio->next_structure; > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > my $model = Bio::Structure::Model->new( -id => '0'); > > for my $chain ($struc->get_chains) { > if($chain->id eq "A"){ > $new_entry->add_chain($model,$chain); > > last; > } > } > $new_entry->add_model($model); > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > 'pdb'); > $out->write_structure($new_entry); > > > But I get an empty pdb > > HEADER DEFAULT CLASSIFICATION 24-JAN-70 > stru > REMARK > 1 > TER 1 A > 0 > MASTER > END > > I am trying a lot of combinations, but I can't write a single chain into > a file. I don't know what I am doing wrong. 
> > Thanks for helping > > regards, > Joan > > > On Fri, 2009-12-11 at 15:37 -0500, Brian Osborne wrote: > > Joan, > > > > It looks to me like the first argument to the add_chain() method has > > to be a Model object, the second is the Chain itself. See Structure/ > > Entry.pm, for example. However if you're seeing some documentation > > that says something else then tell us where, it needs to be corrected. > > > > In Bio::Structure an Entry consists of one or Models, each of which > > has one or more Chains. This allows you to build macromolecular > > complexes (an Entry), which could have more than one defined proteins > > or protein complexes (Models). > > > > Brian O. > > > > On Dec 11, 2009, at 11:44 AM, Joan Segura Mora wrote: > > > > > Hello, > > > > > > I am trying to do a very easy think but I don't get it. I want to > > > write > > > in a file a chain of a pdb. I have try a lot of thinks but what I > > > think > > > that it should work is the next script: > > > > > > use Bio::Structure::IO; > > > use strict; > > > > > > my $structio = Bio::Structure::IO->new(-file => "101m.pdb",'-format' > > > => > > > 'pdb'); > > > my $struc = $structio->next_structure; > > > > > > my $new_entry = Bio::Structure::Entry->new( -id => 'structure_id'); > > > > > > for my $chain ($struc->get_chains) { > > > if($chain->id eq "A"){ > > > $new_entry->chain($chain); > > > last; > > > } > > > } > > > > > > my $out = Bio::Structure::IO->new(-file => ">out.pdb",'-format' => > > > 'pdb');# > > > $out->write_structure($new_entry); > > > > > > it doesn't. 
I get the next error: > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > MSG: add_chain: first argument needs to be a Model object () > > > > > > STACK: Error::throw > > > STACK: > > > Bio::Root::Root::throw /usr/local/share/perl/5.8.8/Bio/Root/Root.pm: > > > 368 > > > STACK: > > > Bio::Structure::Entry::add_chain /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:335 > > > STACK: > > > Bio::Structure::Entry::get_chains /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:391 > > > STACK: > > > Bio::Structure::Entry::chain /usr/local/share/perl/5.8.8/Bio/ > > > Structure/Entry.pm:304 > > > STACK: read_pdb.pl:10 > > > ----------------------------------------------------------- > > > > > > As far as I understand the documentation, the method chain of the object > > > Bio::Structure::Entry requires as input an object of type Chain. > > > > > > Any solution will be very welcome. > > > > > > best regards, > > > Joan > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gowthaman.ramasamy at sbri.org Mon Dec 14 19:16:32 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Mon, 14 Dec 2009 11:16:32 -0800 Subject: [Bioperl-l] GO::Parser / GO::Model::Term In-Reply-To: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com> Message-ID: Hi All, I have a list of GO terms and would like to pull the GO accessions for them. I can easily do the reverse using get_term("GO::00000051"). But can someone tell me how to get the GO accessions from GO term names, e.g. retrieve the GO accession for "citrulline metabolic process"?
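
For the name-to-accession lookup asked about here, one route that does not depend on any particular parser API is to scan the OBO flat file directly. The following is only a minimal sketch, assuming a standard OBO file in which each [Term] stanza carries an "id:" line followed by a "name:" line. The helper obo_name_to_id is hypothetical (it is not part of go-perl or BioPerl), and the inline stanza stands in for a real gene_ontology.obo file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of go-perl or BioPerl): build a map from
# lower-cased term name to accession by scanning OBO-format text.
sub obo_name_to_id {
    my ($fh) = @_;
    my (%id_for, $in_term, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^\[(\w+)\]/) {              # a new stanza begins
            $in_term = ($1 eq 'Term');            # ignore [Typedef] etc.
            undef $id;
        }
        elsif ($in_term && $line =~ /^id:\s*(\S+)/) {
            $id = $1;
        }
        elsif ($in_term && defined $id && $line =~ /^name:\s*(.+?)\s*$/) {
            $id_for{lc $1} = $id;
        }
    }
    return \%id_for;
}

# Tiny inline example; a real run would open the OBO file instead.
my $obo = <<'END_OBO';
[Term]
id: GO:0000052
name: citrulline metabolic process

[Typedef]
id: part_of
name: part of
END_OBO

open my $fh, '<', \$obo or die "cannot open in-memory OBO text: $!";
my $id_for = obo_name_to_id($fh);
print $id_for->{lc 'citrulline metabolic process'}, "\n";
```

Lower-casing the names makes the lookup case-insensitive; synonyms are not handled in this sketch.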
Thanks very much, Gowtham From lsbrath at gmail.com Mon Dec 14 19:41:39 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Mon, 14 Dec 2009 14:41:39 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac Message-ID: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Hello, I have loaded BioPerl -1.6.0 onto my Mac. When I run my script I get the following error message: Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) at project_example.pl line 4. BEGIN failed--compilation aborted at project_example.pl line 4. I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message. Any ideas? MEB From scott at scottcain.net Mon Dec 14 19:47:05 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 14 Dec 2009 14:47:05 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Message-ID: <4536f7700912141147ld16d67av1a58bbf5c1fc5e9e@mail.gmail.com> Hi Mgavi, I think Jason may have already started helping, but the question is: is SeqIO.pm anywhere in those directories? If not, why not? If so, why can't the perl you are using find it? Do you have more than one instance of perl on your machine (fairly likely if you are using a fink-installed BioPerl)? When you execute your script, which perl are you using? Scott On Mon, Dec 14, 2009 at 2:41 PM, Mgavi Brathwaite wrote: > Hello, > > I have loaded BioPerl -1.6.0 onto my Mac. 
When I run my script I get the > following error message: > > Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 > /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level > /Library/Perl/5.8.8 /Library/Perl > /Network/Library/Perl/5.8.8/darwin-thread-multi-2level > /Network/Library/Perl/5.8.8 /Network/Library/Perl > /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .) > at project_example.pl line 4. > BEGIN failed--compilation aborted at project_example.pl line 4. > > I moved the BioPerl dir to /sw/lib/perl5 and I still get the error message. > Any ideas? > > MEB > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From bosborne11 at verizon.net Mon Dec 14 19:45:35 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 14 Dec 2009 14:45:35 -0500 Subject: [Bioperl-l] Issues with loading BioPerl-1.6.0 on to my Mac In-Reply-To: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> References: <69367b8f0912141141n5bf94978k61dc6e31e54a4a8a@mail.gmail.com> Message-ID: <38104B41-104B-42D7-94FA-30016E110BFD@verizon.net> Mgavi, So there's a directory called /sw/lib/perl5/Bio? Or is it called something else? Brian O. On Dec 14, 2009, at 2:41 PM, Mgavi Brathwaite wrote: > Hello, > > I have loaded BioPerl -1.6.0 onto my Mac. 
When I run my script I get > the > following error message: > > Can't locate Bio/SeqIO.pm in @INC (@INC contains: /sw/lib/perl5 > /sw/lib/perl5/darwin /System/Library/Perl/5.8.8/darwin-thread- > multi-2level > /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread- > multi-2level > /Library/Perl/5.8.8 /Library/Perl > /Network/Library/Perl/5.8.8/darwin-thread-multi-2level > /Network/Library/Perl/5.8.8 /Network/Library/Perl > /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level > /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/ > 5.8.1 .) > at project_example.pl line 4. > BEGIN failed--compilation aborted at project_example.pl line 4. > > I moved the BioPerl dir to /sw/lib/perl5 and I still get the error > message. > Any ideas? > > MEB > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Dec 14 21:42:09 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 14 Dec 2009 13:42:09 -0800 Subject: [Bioperl-l] fasta format In-Reply-To: References: <36E9C2F3282347918FD3B3ACA0EC8126@jonas> Message-ID: <614B8A2C-3B17-4E3B-AAC5-3210C7435BB5@bioperl.org> you can read the man page from sean Eddy or use it exactly as I showed you sreformat fasta filename > filename.new you can also use the 1st example which is a bioperl solution. -jason On Dec 13, 2009, at 7:02 AM, Jonas Schaer wrote: > Hi Jason, > thank you very much for your answer. > i am sorry to bother u again but i'm afraid i need some help with > that because i don't see how to use sreformat? > i dont get it managed to write a script that works. 
> > thank u again :) > jonas > > > ----- Original Message ----- From: "Jason Stajich" > To: "Jonas Schaer" > Cc: > Sent: Tuesday, December 08, 2009 6:44 PM > Subject: Re: [Bioperl-l] fasta format > > >> you can run >> sreformat (HMMER) or bp_sreformat.pl script in scripts/utilties (or >> that is installed when you install the Bioperl scripts) >> $ bp_sreformat.pl -if fasta -of fasta -i yourfile.fa -o >> yournewfile.fa >> # rename it back >> $ mv yournewfile.fa yourfile.fa >> >> or >> $ sreformat fasta yourfile.fa > yournewfile.fa >> $ mv yournewfile.fa yourfile.fa >> >> >> -jason >> On Dec 8, 2009, at 7:21 AM, Jonas Schaer wrote: >> >>> Hi there, >>> I have a little question concerning bioperl. I have >>> BioPerl-1.6.1.tar.gz installed and i use the fasta.pm module to read >>> in some fasta files. first it worked fine, but now i have some >>> fastafiles in slightly different format (not all lines have the same >>> length!). >>> >>> ------------- EXCEPTION ------------- >>> MSG: Each line of the fasta entry must be the same length except the >>> last. >>> Line above #49 ' >>> ..' is 28 != 101 chars. >>> STACK Bio::DB::Fasta::calculate_offsets C:/Perl/site/lib/Bio/DB/ >>> Fasta.pm:771 >>> STACK Bio::DB::Fasta::index_file C:/Perl/site/lib/Bio/DB/Fasta.pm: >>> 681 >>> STACK Bio::DB::Fasta::new C:/Perl/site/lib/Bio/DB/Fasta.pm:491 >>> STACK Bio::DB::Fasta::newFh C:/Perl/site/lib/Bio/DB/Fasta.pm:513 >>> STACK main::readfasta blast_eval.pm:174 >>> STACK toplevel blast_eval.pm:83 >>> ------------------------------------- >>> >>> indexing was interrupted, so unlinking test.fasta.index at C:/Perl/ >>> site/lib/Bio/ >>> DB/Fasta.pm line 1054. >>> >>> >>> Is there any way to use these fasta files with diffrent length of >>> lines with this fasta.pm module or will i have to change the format >>> of my fasta-files(big databases...) ? >>> >>> Thanks in advance for any help! 
>>> >>> Regards, Jonas >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> jason.stajich at gmail.com >> jason at bioperl.org > > > -------------------------------------------------------------------------------- > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.5.426 / Virus Database: 270.14.98/2552 - Release Date: > 12/08/09 07:34:00 > -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From cjfields at illinois.edu Tue Dec 15 01:23:05 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 14 Dec 2009 19:23:05 -0600 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes Message-ID: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> All, The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. I have seen two variations of NSE that incorporate strandedness: 1) Stockholm Rfam reverses start and end if the strand == -1 chrY/598-1 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end rice-3(+)/16598648-16600199 The former breaks fewer things within BioPerl, but the latter seems more explicit. Any preferences? Do we want a new method that creates this, and deprecate out simple non-stranded NSE? chris From bernd.web at gmail.com Tue Dec 15 08:37:44 2009 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 15 Dec 2009 09:37:44 +0100 Subject: [Bioperl-l] GO::Parser / GO::Model::Term In-Reply-To: References: <67E6A22C-6968-460D-B192-E129773A0BA5@vecna.com> Message-ID: <716af09c0912150037k513c6efah442a236cb323e14e@mail.gmail.com> Dear Gowthaman, A non-BioPerl solution: the Ontology Lookup service at EBI. It also provides a web service interface. http://www.ebi.ac.uk/ontology-lookup/ citrulline metabolic process has to be selected from the pull-down list in the interactive page. 
This will return the ID (GO:0000052) and addional info: definition The chemical reactions and pathways involving citrulline, N5-carbamoyl-L-ornithine, an alpha amino acid not found in proteins. preferred name citrulline metabolic process exact synonym citrulline metabolism subset Prokaryotic GO subset xref_definition ISBN:209853"Oxford Dictionary of Biochemistry and Molecular Biology" The webservice is described at http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do Regards, Bernd On Mon, Dec 14, 2009 at 8:16 PM, Gowthaman Ramasamy wrote: > > Hi All, > I have a list of GO terms. And would like to pull GO accessions for them. > I can easily do the revere of it using get_term("GO::00000051"). > > But can someone tell me how to get the GO accessions from GO Terms , for eg: retrive GO accession for "citrulline metabolic process". > > > Thanks very much, > Gowtham > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Tue Dec 15 10:38:40 2009 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 15 Dec 2009 10:38:40 +0000 Subject: [Bioperl-l] parse EMBL Feature Table only In-Reply-To: <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se> References: <1260793098.17180.184.camel@deskpro15336.dynamic.sanger.ac.uk> <0F8203F6-06D8-43EF-BB35-EB723F4B9DFA@sbc.su.se> Message-ID: <1260873520.17180.215.camel@deskpro15336.dynamic.sanger.ac.uk> Thanks Dave, good to know that I haven't overlooked something bleedingly obvious in Bioperl that already does this :-) No problem, I have already implemented a simple parser to do it, which works fine for my files. 
Thanks Frank On Mon, 2009-12-14 at 15:06 +0100, Dave Messina wrote: > Hi Frank, > > You will need to look at the feature table parsing code that Bio::SeqIO::embl itself uses to read those lines, probably the _read_FTHelper_EMBL method: > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/embl.html#POD12 > > Since you're trying to parse what is effectively a part of an EMBL record, and a somewhat complicated part at that, as you might imagine this could be a little hairy. > > It might be easier to go the route you started down: add a fake header and a (relatively long) fake sequence, and go through Bio::SeqIO in the normal way. > > > Dave > > > PS: I suspect you may already be familiar with it, but for an overview on how to get at data in feature tables, look at the Feature Annotation HOWTO: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > From rmb32 at cornell.edu Tue Dec 15 15:09:43 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 15 Dec 2009 07:09:43 -0800 Subject: [Bioperl-l] AGI's fpc stuff: Bio::Map::Physical, Bio::MapIO::fpc, etc Message-ID: <4B27A6B7.6090709@cornell.edu> Hi all, I recently noticed something interesting when making GFF files out of recently built FPC maps using Bio::MapIO::fpc. All of the coordinates in the resulting GFF3 and the sizes of the contigs and clones seem to be dilated by 4x from where they should be. This didn't happen with some earlier FPC datasets I ran through these modules.
I haven't gone through any of this very thoroughly, but I notice in Bio::Map::Physical::print_gffstyle() at line 765 there's a line like 'my $basepair = 4096', and the routine goes on to use $basepair as a sort of multiplier for converting the native physical map units into basepairs for GFF-style output. This makes me wonder if the newer FPC datasets coming out require a different $basepair value, maybe 1024 (4096/1024 = 4, which would match the 4x dilation)? Are the original authors of these modules still around on this list? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From tristan.lefebure at gmail.com Tue Dec 15 17:18:26 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Tue, 15 Dec 2009 12:18:26 -0500 Subject: [Bioperl-l] ncurses and bioperl? Message-ID: <200912151218.26357.tristan.lefebure@gmail.com> Hello, (Be careful: the following is a very naive question) Something that I find myself missing is a simple way to look at alignments and trees on remote machines where I don't have access to X. Since (1) one can make wonderful terminal programs like screen and emacs by using ncurses, (2) alignment and tree objects are already well handled in bioperl, and (3) there is a CPAN Curses module; doing 1+2+3, may I dream of a curses/bioperl perl program to render alignments and trees? I suppose a plain C program would be much better, but well I am a biologist... Thanks, --Tristan From jason at bioperl.org Tue Dec 15 17:50:52 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 15 Dec 2009 09:50:52 -0800 Subject: [Bioperl-l] ncurses and bioperl?
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com> References: <200912151218.26357.tristan.lefebure@gmail.com> Message-ID: not to say this isn't a good idea, but currently for curses I would use the treeviewing with retree from PHYLIP and for short read alignments the samtools tview or Gambit (MarthLab) works great or something like ralee for viewing MSA alignments (though targeted for RNA editing) http://personalpages.manchester.ac.uk/staff/sam.griffiths-jones/software/ralee/ http://dx.doi.org/10.1093/bioinformatics/bth489 Just that there are prior examples so would be able to learn from them if you still wanted to roll your own here. -jason On Dec 15, 2009, at 9:18 AM, Tristan Lefebure wrote: > Hello, > > (Be careful: the following is a very naive question) > > Something that I find myself missing is a simple way to look > at alignments and trees on remote machines where I don't > have access to X. Since, > (1) one can make wonderful terminal programs like screen > and emacs by using ncurses, > (2) that alignment and tree objects are already well > handled in bioperl, and > (3) that there is a CPAN Curses module; > > doing 1+2+3, may I dream of a curse/bioperl perl program to > render alignment and trees? I suppose a plain C program > would be much better, but well I am a biologist... > > Thanks, > > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From roy.chaudhuri at gmail.com Tue Dec 15 17:47:26 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 15 Dec 2009 17:47:26 +0000 Subject: [Bioperl-l] ncurses and bioperl? 
In-Reply-To: <200912151218.26357.tristan.lefebure@gmail.com> References: <200912151218.26357.tristan.lefebure@gmail.com> Message-ID: <4B27CBAE.5000303@gmail.com> Hi Tristan, Not a Bioperl solution, but retree from the Phylip package displays trees in a terminal. Roy. On 15/12/2009 17:18, Tristan Lefebure wrote: > Hello, > > (Be careful: the following is a very naive question) > > Something that I find myself missing is a simple way to look > at alignments and trees on remote machines where I don't > have access to X. Since, > (1) one can make wonderful terminal programs like screen > and emacs by using ncurses, > (2) that alignment and tree objects are already well > handled in bioperl, and > (3) that there is a CPAN Curses module; > > doing 1+2+3, may I dream of a curse/bioperl perl program to > render alignment and trees? I suppose a plain C program > would be much better, but well I am a biologist... > > Thanks, > > --Tristan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nml5566 at gmail.com Tue Dec 15 21:37:30 2009 From: nml5566 at gmail.com (Nathan Liles) Date: Tue, 15 Dec 2009 15:37:30 -0600 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? Message-ID: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Is the Bio::Ontology::OBOEngine module working and currently maintained? I tried following the documentation in the module: use Bio::Ontology::OBOEngine; my $parser = Bio::Ontology::OBOEngine->new ( -file => "gene_ontology.obo" ); my $engine = $parser->parse(); But it throws an error when I run the file, saying 'Can't locate object method "parse" '. Does anyone have any experience getting this module working, or is there an alternative bioperl module to extract terms and relationships out of sequence ontology files?
From hlapp at drycafe.net Tue Dec 15 22:05:10 2009 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 15 Dec 2009 17:05:10 -0500 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? In-Reply-To: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Message-ID: That shouldn't happen I suppose, but you're not supposed really to use the engine directly. Rather it will be used as a backing parser by the Bio::OntologyIO parser you choose. Have you tried that route and found it not to work? -hilmar On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote: > Is the Bio::Ontology::OBOEngine module working or being currently > maintained? I tried following the documentation in the module: > > * use Bio::Ontology::OBOEngine; > > my $parser = Bio::Ontology::OBOEngine->new > ( -file => "gene_ontology.obo" ); > > my $engine = $parser->parse(); > > *But, it throws an error when I run the file saying 'Can't locate > object > method "parse" '. Does anyone have any experience getting this module > working; or, is there any alternative bioperl module to extract > terms and > relationships out of sequence ontology files? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From David.Messina at sbc.su.se Wed Dec 16 09:58:16 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Dec 2009 10:58:16 +0100 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> Message-ID: I'd lean towards option 1 over option 2 because option 2 pollutes the name field. (Although that's not a huge problem if the '(strand)' is always just before the '/'.) It's a question of whether to optimize human-readability over machine-readability: option 2 favors the former over the latter, and option 1 the reverse. Whichever way you go, I think > a new method that creates this, and deprecate[s] out simple non-stranded NSE would be great. Dave From maj at fortinbras.us Wed Dec 16 12:51:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 16 Dec 2009 07:51:24 -0500 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> Message-ID: <6723123C0ABD447190639AE1F5D1A6A7@NewLife> I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way.
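
The two stranded NSE variants discussed in this thread can be compared side by side in plain Perl. This is only a sketch under stated assumptions: nse_swapped and nse_tagged are hypothetical helpers, not methods of Bio::LocatableSeq, and treating an undefined or zero strand as plus in option 2 is a choice made here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Bio::LocatableSeq.

# Option 1 (Stockholm/Rfam style): swap start and end on the minus strand.
sub nse_swapped {
    my ($name, $start, $end, $strand) = @_;
    ($start, $end) = ($end, $start) if defined $strand && $strand == -1;
    return "$name/$start-$end";
}

# Option 2 (Gbrowse_syn style): put the strand symbol right after the name.
# Assumption: anything other than -1 is written as '+'.
sub nse_tagged {
    my ($name, $start, $end, $strand) = @_;
    my $symbol = (defined $strand && $strand == -1) ? '-' : '+';
    return "$name($symbol)/$start-$end";
}

print nse_swapped('chrY', 1, 598, -1), "\n";               # chrY/598-1
print nse_tagged('rice-3', 16598648, 16600199, 1), "\n";   # rice-3(+)/16598648-16600199
```

Option 1 leaves the name untouched, so code that splits on '/' keeps working; option 2 needs the '(strand)' suffix stripped from the name before it can be used as an identifier.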
cheers MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Monday, December 14, 2009 8:23 PM Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes > All, > > The current output for NSE format (Name/Start-End) via > Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. I have > seen two variations of NSE that incorporate strandedness: > > 1) Stockholm Rfam reverses start and end if the strand == -1 > > chrY/598-1 > > 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end > > rice-3(+)/16598648-16600199 > > The former breaks fewer things within BioPerl, but the latter seems more > explicit. Any preferences? Do we want a new method that creates this, and > deprecate out simple non-stranded NSE? > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From tuco at pasteur.fr Wed Dec 16 14:14:28 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 16 Dec 2009 15:14:28 +0100 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) Message-ID: <4B28EB44.3080006@pasteur.fr> Hi, I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0. I tried to use my code once again but now the output of my parser is empty. It looks like Annotation from seqfeatures is not filled anymore. Here is the code I used previously: while(my $seq = $streamer->next_seq()){ #We only want to retrieve CDS features... foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){ print $ofh join("#", $feat->annotation()->get_Annotations('locus_tag'), # Acc num $feat->annotation()->get_Annotations('gene') ? 
$feat->annotation()->get_Annotations('gene') # Gene name : $feat->annotation()->get_Annotations('locus_tag'), $feat->annotation()->get_Annotations('product'), # Description ),"\n"; } } $feat is a Bio::SeqFeature::Generic object If I print Dumper($feat->annotation()) here is the output : $VAR1 = bless( { '_typemap' => bless( { '_type' => { 'comment' => 'Bio::Annotation::Comment', 'reference' => 'Bio::Annotation::Reference', 'dblink' => 'Bio::Annotation::DBLink' } }, 'Bio::Annotation::TypeManager' ), '_annotation' => {} }, 'Bio::Annotation::Collection' ); Have some changes been made into the way annotation object is populated? Thanks for any clue and sorry if my question look stupid Regards Emmanuel -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr ------------------------- From cjfields at illinois.edu Wed Dec 16 15:09:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 16 Dec 2009 09:09:56 -0600 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) In-Reply-To: <4B28EB44.3080006@pasteur.fr> References: <4B28EB44.3080006@pasteur.fr> Message-ID: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> Emmanuel, The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation. The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default. 
You can get at the data this way (from the Feature/Annotation HOWTO): for my $feat_object ($seq_object->get_SeqFeatures) { print "primary tag: ", $feat_object->primary_tag, "\n"; for my $tag ($feat_object->get_all_tags) { print " tag: ", $tag, "\n"; for my $value ($feat_object->get_tag_values($tag)) { print " value: ", $value, "\n"; } } } You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional. chris On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote: > Hi, > > I've wrote a small Genbank parser few months ago before BioPerl release 1.6.0. > I tried to use my code once again but now the output of my parser is empty. > It looks like Annotation from seqfeatures is not filled anymore. > > Here is the code I used previously: > > while(my $seq = $streamer->next_seq()){ > > #We only want to retrieve CDS features... > foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){ > print $ofh join("#", > $feat->annotation()->get_Annotations('locus_tag'), # Acc num > $feat->annotation()->get_Annotations('gene') > ? $feat->annotation()->get_Annotations('gene') # Gene name > : $feat->annotation()->get_Annotations('locus_tag'), > $feat->annotation()->get_Annotations('product'), # Description > ),"\n"; > } > } > > $feat is a Bio::SeqFeature::Generic object > > If I print Dumper($feat->annotation()) here is the output : > > $VAR1 = bless( { > '_typemap' => bless( { > '_type' => { > 'comment' => 'Bio::Annotation::Comment', > 'reference' => 'Bio::Annotation::Reference', > 'dblink' => 'Bio::Annotation::DBLink' > } > }, 'Bio::Annotation::TypeManager' ), > '_annotation' => {} > }, 'Bio::Annotation::Collection' ); > > Have some changes been made into the way annotation object is populated? 
> > Thanks for any clue and sorry if my question look stupid > > Regards > > Emmanuel > > -- > ------------------------- > Emmanuel Quevillon > Biological Software and Databases Group > Institut Pasteur > +33 1 44 38 95 98 > tuco at_ pasteur dot fr > ------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From tuco at pasteur.fr Wed Dec 16 15:37:45 2009 From: tuco at pasteur.fr (Emmanuel Quevillon) Date: Wed, 16 Dec 2009 16:37:45 +0100 Subject: [Bioperl-l] Data missing into Annotation object using Bio::SeqIO (Genbank) In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> References: <4B28EB44.3080006@pasteur.fr> <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu> Message-ID: <4B28FEC9.1080509@pasteur.fr> On 12/16/2009 04:09 PM, Chris Fields wrote: > Emmanuel, > > The previous behavior in the 1.5.x series was to store feature tags as Bio::Annotation. The problem had been the way this was implemented was considered unsatisfactory for various reasons, so we reverted back to using simple tag-value pairs as the default. You can get at the data this way (from the Feature/Annotation HOWTO): > > for my $feat_object ($seq_object->get_SeqFeatures) { > print "primary tag: ", $feat_object->primary_tag, "\n"; > for my $tag ($feat_object->get_all_tags) { > print " tag: ", $tag, "\n"; > for my $value ($feat_object->get_tag_values($tag)) { > print " value: ", $value, "\n"; > } > } > } > > You can also convert all the tag-value data into a Bio::Annotation::Collection using the Bio::SeqFeature::AnnotationAdaptor, but this is completely optional. > > chris > > Hi Chris Thanks for the infos. I indeed revert back to using $feat->get_tag_values() and it works as previously. For my small problem I can keep this solution which far adapted for my problem. 
Regards Emmanuel -- ------------------------- Emmanuel Quevillon Biological Software and Databases Group Institut Pasteur +33 1 44 38 95 98 tuco at_ pasteur dot fr -------------------------

From sung at bio.cc Wed Dec 16 17:55:16 2009
From: sung at bio.cc (Sungsam Gong)
Date: Wed, 16 Dec 2009 17:55:16 +0000
Subject: [Bioperl-l] pdb.pm and annotations
Message-ID: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>

Hi,

I wanted to get the PubMed identifier from a PDB file using Bio::Structure, so I dug into the code. I know that Bio::Structure::IO::pdb.pm gets the relevant info from either 'JRNL' or 'REMARK 1'. However, I could not see any code that actually parses 'PMID'.

From pdb.pm, what I see:

sub _read_PDB_jrnl {
    ...
    $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
    $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
    $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
    $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
    $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
    $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
    ...
}

sub _read_PDB_remark_1 {
    ...
    $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
    $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
    $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
    $ref = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
    $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
    $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
    ...
}

From my script, I did:

($struc->annotation->get_Annotations('reference'))[0]->authors
($struc->annotation->get_Annotations('reference'))[0]->title

or

my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree
for my $key (keys %{$hash_ref}) {
    print $key,": ",$hash_ref->{$key},"\n";
}

Is there any plan to include code that extracts 'PMID'? Or did I miss something?
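
Until the parser handles it, the identifier can be pulled straight from the file header with a few lines of plain Perl. This is a rough sketch, assuming the PDB file carries a "JRNL PMID" sub-record (older files may not have one). The helper pdb_pmid is hypothetical and not part of Bio::Structure::IO::pdb, and the header lines and the PMID value below are invented placeholders, not taken from a real entry.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Structure::IO::pdb: return the
# PubMed ID from the first "JRNL PMID" line, or undef if none is found.
sub pdb_pmid {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^JRNL\s+PMID\s+(\d+)/;
        last if $line =~ /^ATOM\b/;    # coordinate section reached; stop
    }
    return;
}

# Invented placeholder records; a real run would open the .pdb file.
my $pdb = <<'END_PDB';
HEADER    PLACEHOLDER ENTRY
JRNL        AUTH   A.N.OTHER
JRNL        TITL   PLACEHOLDER TITLE
JRNL        PMID   12345678
ATOM      1  N   MET A   0
END_PDB

open my $fh, '<', \$pdb or die "cannot open in-memory PDB text: $!";
my $pmid = pdb_pmid($fh);
print defined $pmid ? "PMID: $pmid\n" : "no PMID record found\n";
```

Stopping at the first ATOM line avoids reading the whole coordinate section when the header has no PMID.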
Cheers, Sung From nml5566 at gmail.com Wed Dec 16 19:42:57 2009 From: nml5566 at gmail.com (Nathan Liles) Date: Wed, 16 Dec 2009 13:42:57 -0600 Subject: [Bioperl-l] Bio::Ontology::OBOEngine for parsing obo files? In-Reply-To: References: <81a20b1e0912151337q786b6c35se18328173ec27abd@mail.gmail.com> Message-ID: <81a20b1e0912161142m77051529se59b4621a0add13b@mail.gmail.com> Actually, yes I did find that and it works very well. Now I'm wondering, is it possible to search for similar terms using a string instead of a Bio::Ontology term object? For example, I'd like to search for the synonym: "transcription start site" and have it return all similar terms. But, it throws an error if I pass in a simple query like that. -Nathan On Tue, Dec 15, 2009 at 4:05 PM, Hilmar Lapp wrote: > That shouldn't happen I suppose, but you're not supposed really to use the > engine directly. Rather it will be used as a backing parser by the > Bio::OntologyIO parser you choose. Have you tried that route and found it > not to work? > > -hilmar > > > On Dec 15, 2009, at 4:37 PM, Nathan Liles wrote: > > Is the Bio::Ontology::OBOEngine module working or being currently >> maintained? I tried following the documentation in the module: >> >> * use Bio::Ontology::OBOEngine; >> >> my $parser = Bio::Ontology::OBOEngine->new >> ( -file => "gene_ontology.obo" ); >> >> my $engine = $parser->parse(); >> >> *But, it throws an error when I run the file saying 'Can't locate object >> method "parse" '. Does anyone have any experience getting this module >> working; or, is there any alternative bioperl module to extract terms and >> relationships out of sequence ontology files?
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > From cjfields1 at gmail.com Thu Dec 17 00:53:50 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Wed, 16 Dec 2009 16:53:50 -0800 (PST) Subject: [Bioperl-l] Test post from Google Groups Message-ID: Howdy from Google Groups From cjfields1 at gmail.com Thu Dec 17 01:01:38 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Wed, 16 Dec 2009 17:01:38 -0800 (PST) Subject: [Bioperl-l] bioperl-l Google Groups mirror Message-ID: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> I would like to announce (with the tremendous help of Hilmar Lapp) the creation of a mirror for the BioPerl mail list, if the last post didn't already give it away. http://groups.google.com/group/bioperl-l One can join the group and submit posts via the Google Groups web interface or via email. Have fun! chris From ocarnorsk138 at gmail.com Thu Dec 17 01:12:21 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Wed, 16 Dec 2009 17:12:21 -0800 (PST) Subject: [Bioperl-l] Test post from Google Groups In-Reply-To: References: Message-ID: <03416808-ec4b-44b3-8269-6743a26b5368@k4g2000yqb.googlegroups.com> testing back from google group! On Dec 16, 9:53 pm, Chris Fields wrote: > Howdy from Google Groups > _______________________________________________ > Bioperl-l mailing list > Bioper...
at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Thu Dec 17 10:50:23 2009 From: p.j.a.cock at googlemail.com (Peter) Date: Thu, 17 Dec 2009 02:50:23 -0800 (PST) Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> Message-ID: On Dec 17, 1:01 am, Chris Fields wrote: > I would like to announce (with the tremendous help of Hilmar Lapp) the > creation of a mirror for the BioPerl mail list, if the last post > didn't already give it away. > > http://groups.google.com/group/bioperl-l > > One can join the group and submit posts via the Google Groups web > interface or via email. Have fun! > > chris Sounds particularly good in the long run (once there is enough of an archive on Google Groups to make searching there useful). Does this mean a Google Groups user doesn't have to be subscribed to the mailing list to post (since the mailing list normally only allows subscribers to post)? Peter From David.Messina at sbc.su.se Thu Dec 17 12:25:49 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 17 Dec 2009 13:25:49 +0100 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> Message-ID: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> Very nice, Chris and Hilmar! That'll be great. > Does this mean a Google Groups user doesn't have to be subscribed > to the mailing list to post (since the mailing list normally only > allows subscribers to post)? I think that's right. From the Google groups page: > You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively.
Dave From cjfields at illinois.edu Thu Dec 17 13:21:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 07:21:46 -0600 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> <1D13A126-0A51-4815-89D6-664AC062C2AD@sbc.su.se> Message-ID: <209F1321-37DD-4B6C-A153-8A5AA0EF3E0A@illinois.edu> On Dec 17, 2009, at 6:25 AM, Dave Messina wrote: > Very nice, Chris and Hilmar! That'll be great. > > > >> Does this mean a Google Groups user doesn't have to be subscribed >> to the mailing list to post (since the mailing list normally only >> allows subscribers to post)? > > > I think that's right. From the Google groups page: > >> You can join (and post to) the list either here through Google Groups, or at the BioPerl-l mailing list home, using the web-interface or email, respectively. > > > > > Dave It is moderated by user to deal with spam. Hilmar's already a manager/co-owner, and either of us can add more as needed. chris From hlapp at drycafe.net Thu Dec 17 14:52:33 2009 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 17 Dec 2009 09:52:33 -0500 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> Message-ID: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> On Dec 17, 2009, at 5:50 AM, Peter wrote: > Does this mean a Google Groups user doesn't have to be subscribed > to the mailing list to post Yes. They can post through the Google Groups web interface. The email address for mirrored groups is the one of the list being mirrored though, bioperl-l at bioperl.org in this case, and so in order to post by email you still have to be subscribed at the bioperl-l list. At least that's what the docs at Google say. 
I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From jay at jays.net Thu Dec 17 17:05:24 2009 From: jay at jays.net (Jay Hannah) Date: Thu, 17 Dec 2009 11:05:24 -0600 Subject: [Bioperl-l] bioperl-l Google Groups mirror In-Reply-To: <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> References: <87fe54c9-766e-458a-bc09-0a658d050453@m3g2000yqf.googlegroups.com> <56214506-9BE7-4761-9E87-3A43D3707A29@drycafe.net> Message-ID: <9BDF08A3-67E0-4F5E-8429-11AE586F6504@jays.net> On Dec 17, 2009, at 8:52 AM, Hilmar Lapp wrote: > I haven't tried yet posting to the group at the bioperl-l at googlegroups dot com email under an email address that isn't subscribed to bioperl-l at bioperl dot org. Maybe it actually would work, contrary to docs. In my experience (and ignoring a brief glitch this summer) moderation of new members works great. Almost zero spam gets through. Not as convenient for the admin as MailMan self-service email verification, but perhaps easier for some users and not too much admin work if you don't have too many new legitimate members every month. Here is the configuration set I recommend: http://clab.ist.unomaha.edu/~jhannah/tmp/google_groups.png Your membership roles will end up with quite a few junk accounts, but those bots can't post, so it's not that big a deal. I purge mine manually once a year or so. 
HTH, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From robert.bradbury at gmail.com Thu Dec 17 19:42:54 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 17 Dec 2009 14:42:54 -0500 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: Just to close out the issue of bioperl forking (in particular accesses to external databases through get_sequence) which involves individual database sub-modules and not collecting its children. As it turns out the code does do an explicit fork, it looks like so the child process can read from the database while the parent process manipulates the data as it becomes available. Now, one could argue that a threaded model might be better since now threads are fairly standard OS tools in current environments. But I couldn't find any functions which actually wait for the forked process (presumably because they are created for "future" use). But nor is there any indication in the pages I've found in most of the documentation (which is spread across the web) or Wiki that explain that "creating child processes" is how these functions work and one *needs* to collect those children after each use or else zombie processes will accumulate, which on "reasonable" systems with per-user process limits will create problems for proper program functioning. Nor (it would appear) does the parent process setup a SIGCHLD "catcher" which could collect the processes once they exit (which I expect in the case of "get_sequence" would be after closing of the socket which actually fetched the sequence from Genbank. 
It can be resolved easily enough by adding a call after each use of these functions: $kid = waitpid(-1, WNOHANG); But typically, as a programmer, I should not be responsible for having to clean up the leftovers of library calls (unless said cleanup requirements are clearly documented). But to a "newbie" using the functions, coming from a functional background (C), not an OO background (which at least I would tend to view as a wart on the otherwise robust Perl language), there are two problems 1. The lack of documentation and examples explaining how the functions work and how they must be handled at a higher level (by executing explicit wait system calls). 2. The lack of code in the BioPerl functions to deal with the forked processes which they create. Functional programmers have a perspective -- if you create it -- you have to clean it up. It would appear that in the transition to OO programming (or perhaps simply for expediency) that detail was left out of both (either/and) the documentation and the code. From this standpoint one could view garbage collectors as being fundamentally evil -- because they gloss over the fact that programmers should know what they are doing and when they are doing it. So, everywhere in the documentation where there is a get_sequence call (or anything which accesses an external database which causes a fork to occur) there should be a modification as I have outlined above -- or else the code should be corrected so orphaned children are always collected and not allowed to accumulate. 
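
The caller-side workaround described in this message can be sketched with core Perl only. This is an illustration under stated assumptions, not code from BioPerl: reap_children is a hypothetical helper name, the short-lived fork below merely stands in for a library call that leaves a child behind, and a Unix-like system is assumed.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: collect every child that has already exited,
# without blocking on children that are still running.
# Returns the number of children reaped.
sub reap_children {
    my $reaped = 0;
    # waitpid returns the PID of a reaped child, 0 if children exist but
    # none has exited yet, and -1 if there are no children at all.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Stand-in for a library call that forks and never waits for its child.
for my $i (1 .. 3) {
    defined(my $pid = fork) or die "Couldn't fork: $!";
    exit 0 if $pid == 0;    # child: finish immediately
}

sleep 1;                     # give the children time to exit
my $count = reap_children();
print "reaped $count child process(es)\n";
```

Calling such a helper after each database fetch (or before each new fork) keeps exited children from piling up against the per-user process limit; the loop matters because a single waitpid call collects at most one child.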
From robert.bradbury at gmail.com Thu Dec 17 20:23:38 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 17 Dec 2009 15:23:38 -0500 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: Oh, yes, in case it was not clear, the fork call that fails is in DB/WebDBSeqI.pm: line 722 defined(my $pid = fork) or $self->throw("'Couldn't fork: $!"); And of course that is because Linux has reached the process limits for the user (due to accumulated background processes which are uncollected). And that could be resolved by simply executing a simple waitpid call for prior orphaned children before forking [1]. But such a succinct solution would violate "functional" programming rules -- clean up what you create -- instead they would tend to fall into the OO camp -- "Oh don't worry the garbage collector will take care of it". Green programming is a little less cavalier. Robert 1. IMO, a very very real problem with programming today is that there is no connection between programmers and the cost of their programs. How many programmers know the instruction cycle time of their computers, what does an instruction cost in terms of W consumed, W wasted (heat generation), fruitless scanning over uncollected zombie processes, etc. It may be that only programmers who grew up in the era when CPU cycles were expensive (300 ns/cycle), and who know what each instruction required in terms of cycles, consider these perspectives. Now things (cpu use, processor use, etc) tend to be swept under the rug and it appears that that is the case with the standard implementation of BioPerl. The documentation does not clearly state that additional sub-processes may be created and need to be collected. You are providing a utility that only works "this much". And guess what -- I happen to have run into the "this".
From cjfields at illinois.edu Thu Dec 17 20:25:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 14:25:56 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: Robert, I have previously outlined specifically why you are seeing the fork issue, and a possible solution. IIRC it primarily has to do with you trying to do something more advanced using the (very basic) Bio::Perl procedural interface, something along the lines of pulling a sequence and using RemoteBlast. Retrieving a sequence from a remote database is a forked process on most OS's (I think Win is the sole exception) and occurs internally in Bio::Perl via Bio::DB::GenBank. Setting up your own pipeline, using Bio::DB::GenBank (set to use temp files), followed by Bio::Tools::Run::RemoteBlast or Bio::Perl, are options in the meantime. Trying to catch signals can be notoriously flaky cross-platform and cross perl versions; I recall running into problems with CygWin and OS X. We can modify Bio::Perl to use a temp file instead, which avoids the whole use of forks altogether, and is probably the best long-term solution. My last bit: I don't usually say this, primarily b/c it's misconstrued by some, but 'patches are always welcome'. What doesn't work is just telling us to arbitrarily change code w/o indicating exactly where to do so. The tone you use, which comes off a tad condescending, can be abrasive and may not garner any response (or at least will get you one you don't expect). Please keep that in mind. chris On Dec 17, 2009, at 1:42 PM, Robert Bradbury wrote: > Just to close out the issue of bioperl forking (in particular accesses to > external databases through get_sequence) which involves individual database > sub-modules and not collecting its children. 
> > As it turns out the code does do an explicit fork, it looks like so the > child process can read from the database while the parent process > manipulates the data as it becomes available. Now, one could argue that a > threaded model might be better since now threads are fairly standard OS > tools in current environments. > > But I couldn't find any functions which actually wait for the forked process > (presumably because they are created for "future" use). But nor is there > any indication in the pages I've found in most of the documentation (which > is spread across the web) or Wiki that explain that "creating child > processes" is how these functions work and one *needs* to collect those > children after each use or else zombie processes will accumulate, which on > "reasonable" systems with per-user process limits will create problems for > proper program functioning. Nor (it would appear) does the parent process > setup a SIGCHLD "catcher" which could collect the processes once they exit > (which I expect in the case of "get_sequence" would be after closing of the > socket which actually fetched the sequence from Genbank. > > It can be resolved easily enough by adding a call after each use of these > functions: > $kid = waitpid(-1, WNOHANG); > But typically, as a programmer, I should not be responsible for having to > clean up the leftovers of library calls (unless said cleanup requirements > are clearly documented). > > > But to a "newbie" using the functions, coming from a functional background > (C), not an OO background (which at least I would tend to view as a wart on > the otherwise robust Perl language), there are two problems > 1. The lack of documentation and examples explaining how the functions work > and how they must be handled at a higher level (by executing explicit wait > system calls). > 2. The lack of code in the BioPerl functions to deal with the forked > processes which they create. 
Functional programmers have a perspective -- > if you create it -- you have to clean it up. It would appear that in the > transition to OO programming (or perhaps simply for expediency) that detail > was left out of both (either/and) the documentation and the code. From this > standpoint one could view garbage collectors as being fundamentally evil -- > because they gloss over the fact that programmers should know what they are > doing and when they are doing it. > > So, everywhere in the documentation where there is a get_sequence call (or > anything which accesses an external database which causes a fork to occur) > there should be a modification as I have outlined above -- or else the code > should be corrected so orphaned children are always collected and not > allowed to accumulate. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Dec 17 20:29:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 17 Dec 2009 14:29:10 -0600 Subject: [Bioperl-l] Remote blast fork errors / Process limit restrictions In-Reply-To: References: <39420695-FC14-446F-A4C2-C187E2221E44@bioperl.org> Message-ID: On Dec 17, 2009, at 2:23 PM, Robert Bradbury wrote: > Oh, yes, in case it was not clear, the fork calls which fails is in > DB/WebDBSeqI.pm: line 722 > defined(my $pid = fork) > or $self->throw("'Couldn't fork: $!"); Okay, that's a bit more helpful. > And of course that is because Linux has reached the process limits for the > user (due to accumulated background processes which are uncollected). Right, but again, we need to check this in a cross-platform compatible way. 
> And they could be resolved by simply executing a simple waitpid call for > prior orphaned children before forking [1] But such a succinct solution > would violate "functional" programming rules -- clean up what you create -- > instead they would tend to fall into the OO camp -- "Oh don't worry the > garbage collector will take care of it". Green programming is a little less > cavalier. > > Robert > > 1. IMO, a very very real problem with programming today is that there is no > connection between programmers and the cost of their programs. How many > programmers know the instruction cycle time of their computers, what does an > instruction cost in terms of W consumed, W wasted (heat generation), > fruitless scanning over uncollected zombie processes, etc. It may be that > only that programmers who grew up in the era when CPU cycles were expensive > (300 ns/cycle) who know what each instruction required in terms of cycles > consider these perspectives. Now things (cpu use, processor use, etc) tend > to be swept under the rug and it appears that that is the case with the > standard implementation of bioper. The documentation does not clearly state > that additional sub-processes may be created and need to be collected. You > are providing a utility that only works "this much". And guess what -- I > happen to have run into the "this". Um, yeah. Okay. chris From robfsouza at gmail.com Fri Dec 18 18:07:34 2009 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Fri, 18 Dec 2009 13:07:34 -0500 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: Hi, I've been dealing with an apparent bug in the output of NCBI's BLAST programs (blastall, blastpgp) which sometimes produces output like the one below. I think I've managed to produce a work around for Bioperl blast.pm parser and would like to contribute it to Bioperl. The fix is based on blast.pm from the CVS tree (downloaded some months ago...) and is attached to this message. 
Best, Robson PS: what happened to the bioperl-bugs mailing list? It does not seem to be working... >gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved hypothetical protein [Nasonia vitripennis] Length = 1774 Score = 75.9 bits (185), Expect = 1e-11, Method: Compositional matrix adjust. Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%) Query: 0 - Sbjct: 328 P 328 Query: 0 Sbjct: 328 328 Query: 0 Sbjct: 328 328 Query: 0 Sbjct: 328 328 Query: 0 Sbjct: 328 328 Query: 0 Sbjct: 328 328 Query: 0 Sbjct: 328 328 Query: 0 Sbjct: 328 328 Query: 0 Sbjct: 328 328 Query: 0 Sbjct: 328 328 Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG 654 P PP + + P KTK+ K+P K + Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA 376 Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713 ++ N + W + +++ + N NN D +E PT ++ Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432 Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773 LD K S + + L + + +I + + D ++ + + L + PE D+ + ++SF Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491 Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833 DG +L +K F + +P K R + F ++ +EP I S+ A +++ Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548 Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893 + KSLQ ++ ++++ NFLN + G KL+ L KL +I++ N+ MN L Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602 Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951 ++ + ++ +LL + + + ++ + +L E L+ +K I+++++ E Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661 Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995 +Q+ +F Q A+ EM ++ + E+L+ + + +A+FF E Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701
-------------- next part -------------- A non-text attachment was scrubbed... Name: blast_patched.pm Type: application/octet-stream Size: 91820 bytes Desc: not available URL: From cjfields at illinois.edu Fri Dec 18 18:33:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 18 Dec 2009 12:33:44 -0600 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: Robson, Any chance you could check this against SVN? We haven't used the CVS tree for a few years (had a number of releases along the way as well). Not sure about bioperl-bugs, we have bugzilla still running though: http://bugzilla.open-bio.org/ chris On Dec 18, 2009, at 12:07 PM, Robson Francisco de Souza wrote: > Hi, > > I've been dealing with an apparent bug in the output of NCBI's BLAST > programs (blastall, blastpgp) which sometimes produces output like the > one below. > I think I've managed to produce a work around for Bioperl blast.pm > parser and would like to contribute it to Bioperl. > The fix is based on blast.pm from the CVS tree (downloaded some months > ago...) and is attached to this message. > Best, > Robson > > PS: what happened to the bioperl-bugs mailing list? It does not seem > to be working...
> >> gi|156552846|ref|XP_001600053.1| PREDICTED: similar to conserved > hypothetical protein [Nasonia vitripennis] > Length = 1774 > > Score = 75.9 bits (185), Expect = 1e-11, Method: Compositional matrix adjust. > Identities = 85/393 (21%), Positives = 175/393 (44%), Gaps = 28/393 (7%) > > Query: 0 - > > Sbjct: 328 P 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 0 > > Sbjct: 328 328 > > Query: 612 VPPPPGSGIPMPPGGGFFGMKTKLP-----KLPELKATKDTKKIHIAG 654 > P PP + + P KTK+ K+P K + > Sbjct: 329 TPEPPNNSAKLLPQQEIPTPKTKMKTINWNKIPNHKVIGKRNIWSLVA 376 > > Query: 655 DKINNKDIEGTGWMSILEENAEKMSKIFDKN-LFENNFQKKETRDAPSQEKENVPTLVSF 713 > ++ N + W + +++ + N NN D +E PT ++ > Sbjct: 377 NEHQNSPMADLDWAEMEGLFCQQVPPMIPANTTCSNNLGNGVDTDKRRRE----PTEIAL 432 > > Query: 714 LDSKTSYQLALLLGFLKKNEREIRKHVIDLNEKELQKQTIHSLKDLCPEEDKFKEIESFV 773 > LD K S + + L + + +I + + D ++ + + L + PE D+ + ++SF > Sbjct: 433 LDGKRSLNVNIFLKQFRSSNEDIIQLIKDGGHDDIGAEKLRGLLKILPEVDELEMLKSF- 491 > > Query: 774 QKGDGYLEQLEPGDKLFYAMKDIPRLKQRFTAWSSQIYFEGSVISVEPDIESLNRACKNI 833 > DG +L +K F + +P K R + F ++ +EP I S+ A +++ > Sbjct: 492 ---DGDKLKLGNAEKFFLQLIQVPNYKLRIECMLLKEEFAANMSYLEPSINSMILAGEDL 548 > > Query: 834 VQCKSLQRLMTLIVLLVNFLNKAKTDKDRVYGFKLNFLTKLGDIKSSSDPNRSMMNYLCE 893 > + KSLQ ++ ++++ NFLN + G KL+ L KL +I++ N+ MN L > Sbjct: 549 MTNKSLQEVLYMVLVAGNFLNSGGYAGN-AAGVKLSSLQKLTEIRA----NKPGMN-LIH 602 > > Query: 894 FLLAKDDKLIPELLKELK--DYAEVGSRIELPELKKEIGKLNESLKVIQTELEFYKKEQK 951 > ++ + ++ +LL + + + ++ + +L E L+ +K I+++++ E > Sbjct: 603 YVAMQAERKRKDLLNFARGMNALDSATKTTVEQLTNEFNALDTRIKKIRSQIQLPTTEA- 661 > > Query: 952 FINDKFPKQLDEFYQYAKSEMQKINKAQEKLEKILKEVAKFFGE 995 > +Q+ +F Q A+ EM ++ + E+L+ + + +A+FF E > Sbjct: 662 ----DIQEQMAQFLQMAEQEMSQLKRDMEELDGVRRTLAEFFCE 701 > 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Fri Dec 18 23:00:47 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 18 Dec 2009 23:00:47 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: Message-ID: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza wrote: > Hi, > > I've been dealing with an apparent bug in the output of NCBI's BLAST > programs (blastall, blastpgp) which sometimes produces output like the > one below. > I think I've managed to produce a work around for Bioperl blast.pm > parser and would like to contribute it to Bioperl. > The fix is based on blast.pm from the CVS tree (downloaded some months > ago...) and is attached to this message. > Best, > Robson Do you have a complete example of this kind of funny output? This problem has also been reported with blastpgp for the Biopython parser. I'd love an example for our unit tests (probably worth doing in BioPerl too). Could you upload a test case here?: http://bugzilla.open-bio.org/show_bug.cgi?id=2927 Thanks! Peter @ Biopython From biopython at maubp.freeserve.co.uk Sat Dec 19 11:19:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 19 Dec 2009 11:19:53 +0000 Subject: [Bioperl-l] Fwd: blast.pm patch In-Reply-To: References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com> Message-ID: <320fb6e00912190319s75a0eb75m94dfbd7946a310e5@mail.gmail.com> On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote: > > Hi Peter, > > I just upload my example. I also reported this bug to the NCBI > developers and I hope they can fix it, since it is easy to reproduce. 
> I just forgot to mention the blastpgp version: 2.2.18 > Best, > Robson Thank you, Peter From maj at fortinbras.us Sat Dec 19 19:52:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 19 Dec 2009 14:52:45 -0500 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment Message-ID: Hi All, Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus, is at beta in the bioperl-run trunk. It wraps all the programs of the NCBI's new blast+-2.2.22 suite ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and integrates them, allowing you to create, mask, and query databases from within a single factory object. See the HOWTO http://www.bioperl.org/wiki/HOWTO:BlastPlus for the usual usage and implementation details. Happy coding-- MAJ From David.Messina at sbc.su.se Sat Dec 19 20:34:10 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 19 Dec 2009 21:34:10 +0100 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment In-Reply-To: References: Message-ID: <8F67673F-E71E-46A1-BD7C-6465C4D13398@sbc.su.se> Sweet! Thanks, Mark. Dave From cjfields at illinois.edu Sat Dec 19 22:44:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 19 Dec 2009 16:44:46 -0600 Subject: [Bioperl-l] NCBI BlastPlus wrapper for your enjoyment In-Reply-To: References: Message-ID: <3DC558C9-DD64-45F9-8A6F-EA4238D22EA5@illinois.edu> Very nice! We'll definitely give it a try here (along with the requisite feedback, of course). chris On Dec 19, 2009, at 1:52 PM, Mark A. Jensen wrote: > Hi All, > > Your full-service BLAST wrapper, Bio::Tools::Run::StandAloneBlastPlus, > is at beta in the bioperl-run trunk. It wraps all the programs of the > NCBI's new blast+-2.2.22 suite > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ > and integrates them, allowing you to create, mask, and query > databases from within a single factory object. 
See the HOWTO > http://www.bioperl.org/wiki/HOWTO:BlastPlus > for the usual usage and implementation details. > > Happy coding-- > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Dec 20 04:59:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 19 Dec 2009 22:59:38 -0600 Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes In-Reply-To: <6723123C0ABD447190639AE1F5D1A6A7@NewLife> References: <579AEE4E-A79C-496B-881C-226616BF0A2B@illinois.edu> <6723123C0ABD447190639AE1F5D1A6A7@NewLife> Message-ID: <97DC7C2B-2433-4B8D-A16C-DF0507A29B22@illinois.edu> I think option 1 is cleaner as well; very easily added, so committed to main trunk as I consider this a bug, as one can potentially lose strand information when round-tripping data (original data with a -1 strand would be converted to +1). I'll work out the test fails on trunk along the way (ensure they're due to erroneous test data and not something else). chris On Dec 16, 2009, at 6:51 AM, Mark A. Jensen wrote: > I'm with Dave; option 1 is cleaner. The only problem might be the automatic interpretation of older output as always plus strand, but presumably these would have had to record the strandedness explicitly elsewhere, so they would be updatable. I'm definitely for making strandedness part of the spec in some way. cheers MAJ > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Monday, December 14, 2009 8:23 PM > Subject: [Bioperl-l] LocatableSeq NSE format - suggested changes > > >> All, >> >> The current output for NSE format (Name/Start-End) via Bio::LocatableSeq::get_nse() currently doesn't allow for strandedness. 
I have seen two variations of NSE that incorporate strandedness:
>>
>> 1) Stockholm Rfam reverses start and end if the strand == -1
>>
>> chrY/598-1
>>
>> 2) Sheldon McKay's Gbrowse_syn uses Name(strand)/start-end
>>
>> rice-3(+)/16598648-16600199
>>
>> The former breaks fewer things within BioPerl, but the latter seems more explicit. Any preferences? Do we want a new method that creates this, and deprecate out simple non-stranded NSE?
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From e.osimo at gmail.com Sun Dec 20 18:19:37 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Sun, 20 Dec 2009 19:19:37 +0100
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
Message-ID: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>

Hello everyone,
I have a very particular problem: I'd like to draw different SNPs in a single track with a glyph that lets me see their importance graphically.
For example, if I have 10 SNPs ranked 1 to 10 in importance, I'd like the first to be drawn small and the last one big, with the ones in between sized accordingly.
I'd also be satisfied with a color gradient.
What I cannot do is set an option such as -height in the Bio::SeqFeature::Generic->new call that I use for each of my objects, rather than in the add_track section.
If I set it in the add_track section, all the glyphs are then of the same size (or color).
If, otherwise, I add a different track for each object, my picture becomes too big.

Please, help!
Thanks
Emanuele

From ajmackey at gmail.com Sun Dec 20 18:41:14 2009
From: ajmackey at gmail.com (Aaron Mackey)
Date: Sun, 20 Dec 2009 13:41:14 -0500
Subject: [Bioperl-l] Bio::Graphics and different Glyph sizes
In-Reply-To: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
References: <2ac05d0f0912201019w278c1101q534749dd453fa1d1@mail.gmail.com>
Message-ID: <24c96eca0912201041i37c32845k9e261414588b9bf4@mail.gmail.com>

You can set the height as a callback sub, rather than a constant -- the callback will get passed the feature about to be drawn, from which you can calculate the "importance", and return the desired height, dynamically.

-Aaron

On Sun, Dec 20, 2009 at 1:19 PM, Emanuele Osimo wrote:

> Hello everyone,
> I have a very particular problem: I'd like to draw different SNPs in a
> single track with a glyph that lets me see their importance graphically.
> For example, if I have 10 SNPs ranked 1 to 10 in importance, I'd like the
> first to be drawn small and the last one big, with the ones in between
> sized accordingly.
> I'd also be satisfied with a color gradient.
> What I cannot do is set an option such as -height in the
> Bio::SeqFeature::Generic->new call that I use for each of my objects,
> rather than in the add_track section.
> If I set it in the add_track section, all the glyphs are then of the same
> size (or color).
> If, otherwise, I add a different track for each object, my picture becomes
> too big.
>
> Please, help!
> Thanks
> Emanuele
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From robfsouza at gmail.com Sat Dec 19 11:06:16 2009
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 19 Dec 2009 06:06:16 -0500
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
Message-ID:

Hi Peter,

I just uploaded my example. I also reported this bug to the NCBI
developers and I hope they can fix it, since it is easy to reproduce.
I just forgot to mention the blastpgp version: 2.2.18
Best,
Robson

On Fri, Dec 18, 2009 at 6:00 PM, Peter wrote:
> On Fri, Dec 18, 2009 at 6:07 PM, Robson Francisco de Souza
> wrote:
>> Hi,
>>
>> I've been dealing with an apparent bug in the output of NCBI's BLAST
>> programs (blastall, blastpgp) which sometimes produces output like the
>> one below.
>> I think I've managed to produce a workaround for the BioPerl blast.pm
>> parser and would like to contribute it to BioPerl.
>> The fix is based on blast.pm from the CVS tree (downloaded some months
>> ago...) and is attached to this message.
>> Best,
>> Robson
>
> Do you have a complete example of this kind of funny output?
> This problem has also been reported with blastpgp for the
> Biopython parser. I'd love an example for our unit tests
> (probably worth doing in BioPerl too). Could you upload a
> test case here?:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2927
>
> Thanks!
>
> Peter @ Biopython
>

From biopython at maubp.freeserve.co.uk Mon Dec 21 15:27:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 15:27:47 +0000
Subject: [Bioperl-l] Fwd: blast.pm patch
In-Reply-To:
References: <320fb6e00912181500r53c93284yc526ce654ca9050@mail.gmail.com>
Message-ID: <320fb6e00912210727m522d2039if78891ab32fe0983@mail.gmail.com>

On Sat, Dec 19, 2009 at 11:06 AM, Robson Francisco de Souza wrote:
>
> Hi Peter,
>
> I just uploaded my example. I also reported this bug to the NCBI
> developers and I hope they can fix it, since it is easy to reproduce.
> I just forgot to mention the blastpgp version: 2.2.18
> Best,
> Robson

Hi again Robson,

Having a reproducible example to investigate this issue is incredibly helpful - thank you!

I've been looking at the output, and while I can make sense of it "by hand", it would be very tricky to try and parse as a special case. It really does look like a bug in BLAST to me. The alignment includes an initial pair, a leading gap in the query (with a coordinate of zero), plus a residue from the match sequence (with a sensible coordinate). The alignment statistics include this (extra) pair in the alignment length.

You said you were using blastpgp version 2.2.18, so I tried this with the latest (final?) version of the "legacy" BLAST suite, blastpgp 2.2.22, which I already had installed. It looks like my copy of NR is more recent (bigger), but the same odd output was produced:

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000

I also tried what I think would be the equivalent command line on the new BLAST+ suite, using psiblast 2.2.22+ like this:

psiblast -db nr -query Ngru1000013938.fa -out Ngru1000013938.fa.blast -num_threads 8 -parse_deflines -num_alignments 10000

This was much faster, and seems to output sensible alignments. I might therefore expect the NCBI to say "yes, this is a bug in the old blastpgp tool, just use the new psiblast tool instead".
However, fingers crossed they will do another maintenance release of the "legacy" BLAST suite and fix this in blastpgp.

Have you had any reply from the NCBI? Admittedly it is almost Christmas/New Year so we may not expect an answer until Jan.

Peter

From maj at fortinbras.us Mon Dec 21 18:52:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 13:52:01 -0500
Subject: [Bioperl-l] test fail
Message-ID: <5614E9FF133A47A694EF892D38A1717A@NewLife>

fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)

t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275.
# got: '1..4'
# expected: 'complement(5..8)'

t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276.
# got: 'complement(5..8)'
# expected: '1..4'
# Looks like you failed 2 tests of 51.

MAJ

From cjfields at illinois.edu Mon Dec 21 19:20:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 13:20:32 -0600
Subject: [Bioperl-l] test fail
In-Reply-To: <5614E9FF133A47A694EF892D38A1717A@NewLife>
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
Message-ID:

Saw that from the other day (LocatableSeq commit). I'll check it out.

chris

On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:

> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
>
> t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275.
> # got: '1..4'
> # expected: 'complement(5..8)'
>
> t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276.
> # got: 'complement(5..8)'
> # expected: '1..4'
> # Looks like you failed 2 tests of 51.
>
> MAJ
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Mon Dec 21 20:02:20 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 21 Dec 2009 15:02:20 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
Message-ID: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>

Hi All,

Today it was pointed out to me that the Bio::Graphics documentation
links on the BioPerl wiki are broken, no doubt because Bio::Graphics
is no longer part of bioperl-core (is that how it should be referred
to?). Anyway, the question is: what is the right way to rectify this
problem? Since other things may get broken out in the future, I
suppose we should get some sort of standard established. Can a
release of Bio::Graphics be placed somewhere on the BioPerl wiki
server to be processed?

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Mon Dec 21 20:22:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 14:22:39 -0600
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com>
Message-ID: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>

We can come up with some standard wiki template for those modules no longer in svn, maybe with just CPAN links. Shouldn't be too hard to do.

chris

On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:

> Hi All,
>
> Today it was pointed out to me that the Bio::Graphics documentation
> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
> is no longer part of bioperl-core (is that how it should be referred
> to?).
Anyway, the question is: what is the right way to rectify this
> problem? Since other things may get broken out in the future, I
> suppose we should get some sort of standard established. Can a
> release of Bio::Graphics be placed somewhere on the BioPerl wiki
> server to be processed?
>
> Thanks,
> Scott
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Mon Dec 21 21:12:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 21 Dec 2009 15:12:45 -0600
Subject: [Bioperl-l] test fail
In-Reply-To:
References: <5614E9FF133A47A694EF892D38A1717A@NewLife>
Message-ID:

T'was a bad test call. I basically changed the test to pull each feature directly by the primary tag, check it against the original sf prior to revcom, then check that the location was revcomp'ed correctly.

chris

On Dec 21, 2009, at 1:20 PM, Chris Fields wrote:

> Saw that from the other day (LocatableSeq commit). I'll check it out.
>
> chris
>
> On Dec 21, 2009, at 12:52 PM, Mark A. Jensen wrote:
>
>> fyi, getting following failure (Perl 5.10, GNU/Linux x86_64)
>>
>> t/SeqTools/SeqUtils..........................NOK 46/51# Failed test at t/SeqTools/SeqUtils.t line 275.
>> # got: '1..4'
>> # expected: 'complement(5..8)'
>>
>> t/SeqTools/SeqUtils..........................NOK 47/51# Failed test at t/SeqTools/SeqUtils.t line 276.
>> # got: 'complement(5..8)'
>> # expected: '1..4'
>> # Looks like you failed 2 tests of 51.
>>
>> MAJ
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Mon Dec 21 21:27:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:27:25 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <1F54D94CE87E4238BC2C6128002FBC6A@NewLife>

I've modified Template:Doclink ; if you now do

{{Doclink|Bio::Graphics|cpan}}

you'll get a page with only the cpan link.

{{Doclink|Bio::SeqIO}}

etc. works as usual.
MAJ

----- Original Message -----
From: "Chris Fields"
To: "Scott Cain"
Cc: "BioPerl List"
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation

> We can come up with some standard wiki template for those modules no longer in
> svn, maybe with just CPAN links. Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?). Anyway, the question is: what is the right way to rectify this
>> problem? Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established. Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.
scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>> Ontario Institute for Cancer Research
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From maj at fortinbras.us Mon Dec 21 21:34:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 16:34:40 -0500
Subject: [Bioperl-l] Bio::Graphics documentation
In-Reply-To: <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
References: <4536f7700912211202j4de81bb4k1e9039ed19b4ef97@mail.gmail.com> <6FC2F08B-E902-449A-9E67-D1417A0BE20C@illinois.edu>
Message-ID: <5081DC24D9AE46FF95075559898B2574@NewLife>

Also, applied the new Doclink to Bio::Graphics on wiki.

----- Original Message -----
From: "Chris Fields"
To: "Scott Cain"
Cc: "BioPerl List"
Sent: Monday, December 21, 2009 3:22 PM
Subject: Re: [Bioperl-l] Bio::Graphics documentation

> We can come up with some standard wiki template for those modules no longer in
> svn, maybe with just CPAN links. Shouldn't be too hard to do.
>
> chris
>
> On Dec 21, 2009, at 2:02 PM, Scott Cain wrote:
>
>> Hi All,
>>
>> Today it was pointed out to me that the Bio::Graphics documentation
>> links on the BioPerl wiki are broken, no doubt because Bio::Graphics
>> is no longer part of bioperl-core (is that how it should be referred
>> to?). Anyway, the question is: what is the right way to rectify this
>> problem? Since other things may get broken out in the future, I
>> suppose we should get some sort of standard established. Can a
>> release of Bio::Graphics be placed somewhere on the BioPerl wiki
>> server to be processed?
>>
>> Thanks,
>> Scott
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.
scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>> Ontario Institute for Cancer Research
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From maj at fortinbras.us Tue Dec 22 02:51:32 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 21 Dec 2009 21:51:32 -0500
Subject: [Bioperl-l] pdb.pm and annotations
In-Reply-To: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
References: <2dade3480912160955h4f77277dv8e6b47b7b0fda23a@mail.gmail.com>
Message-ID: <6292EDA0F05B48578AF7B7E5864C8707@NewLife>

Hi Sung--
We didn't plan it, but we added it anyway: see revision 16559 of bioperl-live/trunk. You can then do

  $pmid = ($struct->annotation->get_Annotations('reference'))[0]->pubmed;

and even

  $doi = ($struct->annotation->get_Annotations('reference'))[0]->doi;

Thanks for the heads-up!
cheers,
MAJ

----- Original Message -----
From: "Sungsam Gong"
To:
Sent: Wednesday, December 16, 2009 12:55 PM
Subject: [Bioperl-l] pdb.pm and annotations

> Hi,
>
> Wanted to get pubmed identifier from a PDB file using Bio::Structure,
> so hacked the code.
> Knew that Bio::Structure::IO::pdb.pm get relevant info from either
> 'JRNL' or 'REMARK 1'.
> However could not see any actual code parsing 'PMID'.
>
> From pdb.pm, what I see:
>
> sub _read_PDB_jrnl {
> ...
>     $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>     $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>     $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>     $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>     $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>     $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> sub _read_PDB_remark_1 {
> ...
>     $auth = $self->_concatenate_lines($auth,$rol) if ($subr eq "AUTH");
>     $titl = $self->_concatenate_lines($titl,$rol) if ($subr eq "TITL");
>     $edit = $self->_concatenate_lines($edit,$rol) if ($subr eq "EDIT");
>     $ref  = $self->_concatenate_lines($ref ,$rol) if ($subr eq "REF");
>     $publ = $self->_concatenate_lines($publ,$rol) if ($subr eq "PUBL");
>     $refn = $self->_concatenate_lines($refn,$rol) if ($subr eq "REFN");
> ...
> }
>
> From my script, I did:
>
> ($struc->annotation->get_Annotations('reference'))[0]->authors
> ($struc->annotation->get_Annotations('reference'))[0]->title
>
> or
>
> my $hash_ref=($struc->annotation->get_Annotations('reference'))[0]->hash_tree;
> for my $key (keys %{$hash_ref}) {
>     print $key,": ",$hash_ref->{$key},"\n";
> }
>
> Any plan to include a code chopping 'PMID' out?
> Or did I miss something?
>
> Cheers,
> Sung
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From dan.kortschak at adelaide.edu.au Tue Dec 22 03:24:04 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 13:54:04 +1030
Subject: [Bioperl-l] call for help and comments on module
Message-ID: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>

Hi,

I've been working on a Bio::Tools::Run module to handle the bowtie rapid alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in bioperl-run tree).
I have 90% of what I want included in the module and would like some advice from more experienced bioperlers. Feedback on approach is also welcomed (this is my first significant wrapper, and after a long gap from writing modules, so I am rusty). The module has ended up being significantly more complicated than I had hoped.

There are a few issues I'm having, so I apologise for the list:

   1. Informal tests run correctly (outside the t/ tree and Test
      harness), but formal Test harness tests fail for reasons I
      cannot understand. (The module is still lacking a lot of tests,
      but since things were failing in the harness I have placed them
      as a lower priority and have been working to my micro-script
      tests - yes, bad form.)
   2. I am having a big problem with IPC::Run for one of the
      executables (the module can call 5 different executables for 7
      commands), bowtie-maptool (module command 'map'). All the other
      commands tested (this excludes bowtie-maqconvert [convert
      command]) work fine, but maptool fails with an illegal seek -
      presumably due to the redirection handling? I have no idea how
      to resolve this, so help would be greatly appreciated (a small
      script that demonstrates the use that results in the failure is
      below).

There will be provision for returning a Bio::Assembly::IO object through samtools in the finished module, but currently the Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.

Thanks for any help.
Dan


#!/usr/bin/perl

use strict;
use warnings;

use Bio::Tools::Run::Bowtie;

# These files are in the bioperl-run t/data/ tree
my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';

my $bowtiefac = Bio::Tools::Run::Bowtie->new(
    -command             => 'single',
    -max_seed_mismatches => 2,
    -seed_length         => 28,
    -max_qual_mismatch   => 70,
    -sam_format          => 0
);

my $align = $bowtiefac->run($rdq,$refseq); # this runs fine

my $bowtiemap = Bio::Tools::Run::Bowtie->new(
    -command => 'map'
);

my $map = $bowtiemap->run($align); # throws Illegal seek

print "$map\n";

open (IN,$map);
my $lines = (my @lines) = <IN>;
print @lines;
print "\n\n$lines\n";
close IN;


From maj at fortinbras.us Tue Dec 22 05:19:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 22 Dec 2009 00:19:35 -0500
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID:

Hey Dan,
It looks like if the outfile isn't specified on the commandline for
maptool, then the align is written to stdout. So, you could
try this workaround in Bowtie/Config.pm:

our %command_files = (
    'single'   => [qw( ind seq #out )],
    'paired'   => [qw( ind seq seq2 #out )],
    'crossbow' => [qw( ind seq #out )],
    'build'    => [qw( ref out )],
    'inspect'  => [qw( ind >#out )],
    'convert'  => [qw( bwt out bfa )],
-   'map'      => [qw( bwt #out )]
+   'map'      => [qw( bwt >#out )]
    );

which should be transparent to the user. If this works, then
there is probably something funky going on with IPC::Run
+ maptool; if it doesn't, then the funkiness is prob. in my code.

I notice, however, that both bowtie-maptool and bowtie-maqconvert
have been removed from the 0.12.0-beta release
(http://bowtie-bio.sourceforge.net/index.shtml)...
cheers MAJ

----- Original Message -----
From: "Dan Kortschak"
To:
Sent: Monday, December 21, 2009 10:24 PM
Subject: [Bioperl-l] call for help and comments on module

> Hi,
>
> I've been working on a Bio::Tools::Run module to handle the bowtie rapid
> alignment tool (and associated tools): Bio::Tools::Run::Bowtie (in
> bioperl-run tree).
>
> I have 90% of what I want included in the module and would like some
> advice from more experienced bioperlers. Feedback on approach is also
> welcomed (this is my first significant wrapper, and after a long gap
> from writing modules, so I am rusty). The module has ended up being
> significantly more complicated than I had hoped.
>
> There are a few issues I'm having, so I apologise for the list:
>
>    1. Informal tests run correctly (outside the t/ tree and Test
>       harness), but formal Test harness tests fail for reasons I
>       cannot understand. (The module is still lacking a lot of tests,
>       but since things were failing in the harness I have placed them
>       as a lower priority and have been working to my micro-script
>       tests - yes, bad form.)
>    2. I am having a big problem with IPC::Run for one of the
>       executables (the module can call 5 different executables for 7
>       commands), bowtie-maptool (module command 'map'). All the other
>       commands tested (this excludes bowtie-maqconvert [convert
>       command]) work fine, but maptool fails with an illegal seek -
>       presumably due to the redirection handling? I have no idea how
>       to resolve this, so help would be greatly appreciated (a small
>       script that demonstrates the use that results in the failure is
>       below).
>
> There will be provision for returning a Bio::Assembly::IO object through
> samtools in the finished module, but currently the
> Bio::Assembly::IO::sam builder doesn't like what bowtie can provide.
>
> Thanks for any help.
> Dan
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::Tools::Run::Bowtie;
>
> # These files are in the bioperl-run t/data/ tree
> my $rdq = '/usr/local/src/bioperl-run/t/data/bowtie/reads/e_coli_1000.fq';
> my $refseq = '/usr/local/src/bioperl-run/t/data/bowtie/indexes/e_coli';
>
> my $bowtiefac = Bio::Tools::Run::Bowtie->new(
>     -command             => 'single',
>     -max_seed_mismatches => 2,
>     -seed_length         => 28,
>     -max_qual_mismatch   => 70,
>     -sam_format          => 0
> );
>
> my $align = $bowtiefac->run($rdq,$refseq); # this runs fine
>
> my $bowtiemap = Bio::Tools::Run::Bowtie->new(
>     -command => 'map'
> );
>
> my $map = $bowtiemap->run($align); # throws Illegal seek
>
> print "$map\n";
>
> open (IN,$map);
> my $lines = (my @lines) = <IN>;
> print @lines;
> print "\n\n$lines\n";
> close IN;
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From dan.kortschak at adelaide.edu.au Tue Dec 22 05:51:30 2009
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 22 Dec 2009 16:21:30 +1030
Subject: [Bioperl-l] call for help and comments on module
In-Reply-To:
References: <1261452244.9520.86.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <1261461090.4411.13.camel@epistle>

Hi Mark,

maptool either outputs to stdout or a specified file - I chose to use a specified file and run it that way, but I've tried the redirect as you suggest, with the same failure result. I think it's a strangeness of maptool (which may well be a reason for it being dropped - also note the maptool output doesn't seem reasonable for the test data provided even when run from the command line). It's probably a result of difficult interaction between IPC::Run and maptool.
Any funkiness in your code is not likely to be a cause as I've deeply analysed what is being passed to IPC::Run, and I've quite extensively modified the IPC run handling method from your code to take into account the differences between a single executable with many commands, as the base code managed, and a cluster of executables each taking a small subset of different filespecs, as bowtie needs. My funkiness will undoubtedly swamp yours.

Resolution: Will drop bowtie-maptool from module. (Should test maqconvert - if it fails, this will be dropped also unless someone asks otherwise).

When the module copes with 0.11.* properly I'll start thinking about 0.12.* which has colourspace handling to deal with.

cheers
Dan

On Tue, 2009-12-22 at 00:19 -0500, Mark A. Jensen wrote:
> Hey Dan,
> It looks like if the outfile isn't specified on the commandline for
> maptool, then the align is written to stdout. So, you could
> try this workaround in Bowtie/Config.pm:
>
> our %command_files = (
>     'single'   => [qw( ind seq #out )],
>     'paired'   => [qw( ind seq seq2 #out )],
>     'crossbow' => [qw( ind seq #out )],
>     'build'    => [qw( ref out )],
>     'inspect'  => [qw( ind >#out )],
>     'convert'  => [qw( bwt out bfa )],
> -   'map'      => [qw( bwt #out )]
> +   'map'      => [qw( bwt >#out )]
>     );
>
> which should be transparent to the user. If this works, then
> there is probably something funky going on with IPC::Run
> + maptool; if it doesn't, then the funkiness is prob. in my code.
>
> I notice, however, that both bowtie-maptool and bowtie-maqconvert
> have been removed from the 0.12.0-beta release
> (http://bowtie-bio.sourceforge.net/index.shtml)...
>
> cheers MAJ

From lovebaby39 at gmail.com Wed Dec 23 10:48:55 2009
From: lovebaby39 at gmail.com (Hsueh)
Date: Wed, 23 Dec 2009 18:48:55 +0800
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
References: <5F281DC3E4514B3AAA8881169B240227@SHAPC> <107080B6-BC05-470C-B426-5DB69BD574C1@sbc.su.se> <9DEC7152C11A4F00B2F919B653E6D572@SHAPC> <15F92119-7625-4491-899A-0D49CE1BC861@sbc.su.se>
Message-ID: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>

Dear all

I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how to get "P.pastoris DNA for pPIC9K expression vector".

while (my $result_u = $blast_report_u->next_result ) {
    while (my $hit_u = $result_u->next_hit()){
        while (my $hsp_u = $hit_u->next_hsp()){
            $hit_u->name;
            $hsp_u->evalue;
            $hsp_u->score;
        }
    }
}

I would appreciate it if you could tell me how to do it.

P.S. How can I download the BioPerl manual? (Is there a download link?)

The following is the BLAST result:
----------------------------------------------------------------------
BLASTN 2.2.16 [Mar-25-2007]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query=
         (458 letters)

Database: UniVec (build 4.0)
           2416 sequences; 597,480 total letters

Searching..................................................done

                                                                  Score    E
Sequences producing significant alignments:                       (bits) Value

gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve...    26   3.1
gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo                      26   3.1
gnl|uv|U13843.1:1887-9923 pBPV cloning vector                          26   3.1

>gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector
          Length = 2781

 Score = 26.3 bits (13), Expect = 3.1
 Identities = 13/13 (100%)
 Strand = Plus / Plus

Query: 352  tactaccgccatt 364
            |||||||||||||
Sbjct: 2209 tactaccgccatt 2221
----------------------------------------------------------------------

Reginald Hsueh

From hrh at fmi.ch Wed Dec 23 15:14:06 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Wed, 23 Dec 2009 16:14:06 +0100
Subject: [Bioperl-l] About bioperl issue: get string
In-Reply-To: <52CDD8F61DDC48B9BBADD020EF18E9E0@SHAPC>
Message-ID:

Hi

Assuming you are using "SearchIO", try:

$hit_u->description

for more details see: http://www.bioperl.org/wiki/HOWTO:SearchIO

Regards, Hans

On 12/23/09 11:48 AM, "Hsueh" wrote:

> Dear all
>
> I use "$hit_u->name" to get "gnl|uv|Z46234.1:664-3444", but I don't know how
> to get "P.pastoris DNA for pPIC9K expression vector".
>
> while (my $result_u = $blast_report_u->next_result ) {
>     while (my $hit_u = $result_u->next_hit()){
>         while (my $hsp_u = $hit_u->next_hsp()){
>             $hit_u->name;
>             $hsp_u->evalue;
>             $hsp_u->score;
>         }
>     }
> }
>
> I would appreciate it if you could tell me how to do it.
>
> P.S. How can I download the BioPerl manual? (Is there a download link?)
>
>
>
> The following is the BLAST result:
> ----------------------------------------------------------------------
> BLASTN 2.2.16 [Mar-25-2007]
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
> Query= > (458 letters) > > Database: UniVec (build 4.0) > 2416 sequences; 597,480 total letters > Searching..................................................done > > Score E > Sequences producing significant alignments: > (bits) Value > > gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression ve... > 26 3.1 > gnl|uv|U89673.1:863-1946 Cloning vector pIRES1neo > 26 3.1 > gnl|uv|U13843.1:1887-9923 pBPV cloning vector > 26 3.1 > >> gnl|uv|Z46234.1:664-3444 P.pastoris DNA for pPIC9K expression vector > Length = 2781 > > Score = 26.3 bits (13), Expect = 3.1 > Identities = 13/13 (100%) > Strand = Plus / Plus > > Query: 352 tactaccgccatt 364 > ||||||||||||| > Sbjct: 2209 tactaccgccatt 2221 > ------------------------------------------------------------------------------ > ------------------------------------------------------- > > Reginald Hsueh > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pkuonline at gmail.com Wed Dec 23 18:36:49 2009 From: pkuonline at gmail.com (pkuonline) Date: Wed, 23 Dec 2009 12:36:49 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 Message-ID: <200912231236490784820@gmail.com> Hi Everyone, I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. I attached my CODEML outputs here to see whether you guys have some idea. Many thanks ahead! 
Best regards, ------------------------------------------------------------- Yong Zhang Ph.D, Research Scholar Manyuan Long's Lab University of Chicago -------------- next part -------------- A non-text attachment was scrubbed... Name: rst4.1 Type: application/octet-stream Size: 60616 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mlc4.1 Type: application/octet-stream Size: 11635 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mlc4.3b Type: application/octet-stream Size: 11330 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rst4.3b Type: application/octet-stream Size: 60616 bytes Desc: not available URL: From cjfields at illinois.edu Wed Dec 23 21:19:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 23 Dec 2009 15:19:48 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231236490784820@gmail.com> References: <200912231236490784820@gmail.com> Message-ID: Well, not completely unexpected, but very frustrating nonetheless. Changes to PAML output have broken in just about every PAML parser revision. Not sure when this will be addressed unfortunately, my hope is sooner than later. Can you file a bioperl bug report for this? It's the best place to keep track. http://bugzilla.open-bio.org/ chris On Dec 23, 2009, at 12:36 PM, pkuonline wrote: > Hi Everyone, > > I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. More strangely, I tested it on the old PAML 4.1 result and also failed. 
> > I attached my CODEML outputs here to see whether you guys have some idea. > > Many thanks ahead! > > Best regards, > ------------------------------------------------------------- > Yong Zhang > Ph.D, Research Scholar > Manyuan Long's Lab > University of Chicago_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pkuonline at gmail.com Wed Dec 23 22:45:54 2009 From: pkuonline at gmail.com (pkuonline) Date: Wed, 23 Dec 2009 16:45:54 -0600 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 References: <200912231236490784820@gmail.com>, Message-ID: <200912231645536094087@gmail.com> Hi Chris, Thanks for your reply and I just submitted this bug to bugzilla. Have a nice holiday! ------------------------------------------------------------- Yong Zhang Ph.D, Research Scholar Manyuan Long's Lab University of Chicago >------------------------------------------------------------- >From: Chris Fields >Time: 2009-12-23 15:19:50 >To: pkuonline bioperl-l >Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 >Well, not completely unexpected, but very frustrating nonetheless. Changes to PAML output have broken in just about every PAML parser revision. Not sure when this will be addressed unfortunately, my hope is sooner than later. > >Can you file a bioperl bug report for this? It's the best place to keep track. > >http://bugzilla.open-bio.org/ > >chris > >On Dec 23, 2009, at 12:36 PM, pkuonline wrote: > >> Hi Everyone, >> >> I used the latest Bioperl build, http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to parse CODEML result. I searched the mail list and found current PAML parser is compatible with PAML 4.3a, http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser does not work. 
More strangely, I tested it on the old PAML 4.1 result and also failed. >> >> I attached my CODEML outputs here to see whether you guys have some idea. >> >> Many thanks ahead! >> >> Best regards, >> ------------------------------------------------------------- >> Yong Zhang >> Ph.D, Research Scholar >> Manyuan Long's Lab >> University of Chicago_______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Wed Dec 23 23:23:44 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 24 Dec 2009 00:23:44 +0100 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231645536094087@gmail.com> References: <200912231236490784820@gmail.com>, <200912231645536094087@gmail.com> Message-ID: <08E748F4-1398-4543-AB77-0640441BC323@sbc.su.se> Hi Yong, Could you attach your codeml output to the bug report, too? I'll take a look at this as soon as I can. Dave From maj at fortinbras.us Thu Dec 24 05:47:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 24 Dec 2009 00:47:10 -0500 Subject: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 In-Reply-To: <200912231645536094087@gmail.com> References: <200912231236490784820@gmail.com>, <200912231645536094087@gmail.com> Message-ID: <2DF45CDC2BE44A85ADCD865A98CD13D6@NewLife> Yong-- say 'ni hao' to Manyuan for me --- cheers MAJ ----- Original Message ----- From: "pkuonline" To: "Chris Fields" Cc: "bioperl-l" Sent: Wednesday, December 23, 2009 5:45 PM Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 > Hi Chris, > > Thanks for your reply and I just submitted this bug to bugzilla. > > Have a nice holiday! 
> ------------------------------------------------------------- > Yong Zhang > Ph.D, Research Scholar > Manyuan Long's Lab > University of Chicago > >>------------------------------------------------------------- >>From: Chris Fields >>Time: 2009-12-23 15:19:50 >>To: pkuonline bioperl-l >>Subject: Re: [Bioperl-l] PAML parser failed for PAML 4.3b and 4.1 > >>Well, not completely unexpected, but very frustrating nonetheless. Changes to >>PAML output have broken in just about every PAML parser revision. Not sure >>when this will be addressed unfortunately, my hope is sooner than later. >> >>Can you file a bioperl bug report for this? It's the best place to keep >>track. >> >>http://bugzilla.open-bio.org/ >> >>chris >> >>On Dec 23, 2009, at 12:36 PM, pkuonline wrote: >> >>> Hi Everyone, >>> >>> I used the latest Bioperl build, >>> http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz and tried to >>> parse CODEML result. I searched the mail list and found current PAML parser >>> is compatible with PAML 4.3a, >>> http://lists.open-bio.org/pipermail/bioperl-l/2009-November/031602.html. >>> However, recently, Ziheng Yang updates his PAML to 4.3b. I found the parser >>> does not work. More strangely, I tested it on the old PAML 4.1 result and >>> also failed. >>> >>> I attached my CODEML outputs here to see whether you guys have some idea. >>> >>> Many thanks ahead! 
>>> >>> Best regards, >>> ------------------------------------------------------------- >>> Yong Zhang >>> Ph.D, Research Scholar >>> Manyuan Long's Lab >>> University of >>> Chicago_______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bhakti.dwivedi at gmail.com Sat Dec 26 02:46:51 2009 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Fri, 25 Dec 2009 21:46:51 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? Message-ID: Hi, Does anyone know how to retrieve the "Source" or the "Species name" given the accession number using Bioperl. I have these 30,000 accession numbers for which I need to get the source organisms. Any kind of help will be appreciated. Thanks BD From maj at fortinbras.us Sat Dec 26 03:52:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 25 Dec 2009 22:52:10 -0500 Subject: [Bioperl-l] how to retrieve organism name from accession number? 
In-Reply-To: References: Message-ID: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>

Bhakti,
The following example (using EUtilities) may serve your purpose:

use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)
my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db => 'taxonomy',
                                    -id => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;

You probably will need to break up your 30000 into chunks (say, 1000-3000 each), and do the above on each chunk with a

sleep 3;

or so separating the queries.

MAJ

----- Original Message ----- From: "Bhakti Dwivedi" To: Sent: Friday, December 25, 2009 9:46 PM Subject: [Bioperl-l] how to retrieve organism name from accession number? > Hi, > > Does anyone know how to retrieve the "Source" or the "Species name" given > the accession number using Bioperl. I have these 30,000 accession numbers > for which I need to get the source organisms. Any kind of help will be > appreciated.
> > Thanks > > BD > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat Dec 26 11:47:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 26 Dec 2009 05:47:29 -0600 Subject: [Bioperl-l] how to retrieve organism name from accession number? In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> References: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife> Message-ID: On Dec 25, 2009, at 9:52 PM, Mark A. Jensen wrote: > Bhakti, > The following example (using EUtilities) may serve your purpose: > > use Bio::DB::EUtilities; > > ... > You probably will need to break up your 30000 into chunks > (say, 1000-3000 each), and do the above on each chunk with a > > sleep 3; > > or so separating the queries. > MAJ The 'sleep 3' is built-in and now (on main trunk) matches NCBI's current spec of 3 queries/sec. chris From arpm9 at charter.net Sun Dec 27 21:42:09 2009 From: arpm9 at charter.net (arpm9) Date: Sun, 27 Dec 2009 16:42:09 -0500 Subject: [Bioperl-l] Should Bio::Tools::BPlite be deprecated? In-Reply-To: 4533A8D3.90709@sendu.me.uk Message-ID: <867A36FEE0244EF2950108C42BD2BE58@paulb0d5af35b9> hi chris, I was trying to make sense of this backpacking lite and just simply wanted to view the light...and got nowhere and very frustrated...please help if you can...or whoever can...thanks Pm From pengyu.ut at gmail.com Tue Dec 29 16:08:09 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 10:08:09 -0600 Subject: [Bioperl-l] Comparison between bioperl and biopython? Message-ID: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> May I ask somebody who is versatile in both bioperl and biopython to comment on the pros and cons of bioperl and biopython? I'm sending this email to both bioperl and biopython mailing lists. But I hope that it will not result in any contention.
I assume that the functionality of bioperl and biopython is the same, i.e., tasks that can be done in bioperl can be done in biopython and vice versa, as both libraries have been out there for over 10 years. Please correct me if my understanding is not true. Given a task that can be done with either bioperl or biopython, I, in particular, want to know how long it will take to write the code for the task in bioperl and biopython, with the same readability requirement (see below) and the assumption that users have the same fluency in perl and python. python is claimed to be good for maintainability. But perl is criticized for there-are-many-ways-for-a-given-task. Since there are multiple ways in perl, let us assume that we always use perl in a readable way. From jason at bioperl.org Tue Dec 29 16:49:20 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 08:49:20 -0800 Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> Message-ID: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> Are you asking for the purposes of choosing a toolkit for your work or just curious about the advantages/disadvantages of language choice? -jason On Dec 29, 2009, at 8:08 AM, Peng Yu wrote: > May I ask somebody who is versatile in both bioperl and biopython > to comment on the pros and cons of bioperl and biopython? I'm sending > this email to both bioperl and biopython mailing lists. But I hope > that it will not result in any contention. > > I assume that the functionality of bioperl and biopython is the > same, i.e., tasks that can be done in bioperl can be done in biopython and > vice versa, as both libraries have been out there for over 10 years. > Please correct me if my understanding is not true.
> > Given a task that can be done with either bioperl or biopython, > I, in particular, want to know how long it will take to write the > code for the task in bioperl and biopython, with the same readability > requirement (see below) and the assumption that users have the same > fluency in perl and python. > > python is claimed to be good for maintainability. But perl is > criticized for there-are-many-ways-for-a-given-task. Since there are > multiple ways in perl, let us assume that we always use perl in a > readable way. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From ak at ebi.ac.uk Tue Dec 29 16:57:18 2009 From: ak at ebi.ac.uk (Andreas Kähäri) Date: Tue, 29 Dec 2009 16:57:18 +0000 Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> Message-ID: <20091229165718.GB30356@quux.windows.ebi.ac.uk> On Tue, Dec 29, 2009 at 10:08:09AM -0600, Peng Yu wrote: > May I ask somebody who is versatile in both bioperl and biopython > to comment on the pros and cons of bioperl and biopython? I'm sending > this email to both bioperl and biopython mailing lists. But I hope > that it will not result in any contention. > > I assume that the functionality of bioperl and biopython is the > same, i.e., tasks that can be done in bioperl can be done in biopython and > vice versa, as both libraries have been out there for over 10 years. > Please correct me if my understanding is not true.
> > Given a task that can be done with either bioperl or biopython, > I, in particular, want to know how long it will take to write the > code for the task in bioperl and biopython, with the same readability > requirement (see below) and the assumption that users have the same > fluency in perl and python. > > python is claimed to be good for maintainability. But perl is > criticized for there-are-many-ways-for-a-given-task. Since there are > multiple ways in perl, let us assume that we always use perl in a > readable way. Assuming, as you do, that the functionality of BioPerl and BioPython is the same: Which of the two programming languages are you (or your team) most proficient in? Use that language. Regards, Andreas -- Andreas Kähäri, Ensembl Software Developer European Bioinformatics Institute (EMBL-EBI) Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, United Kingdom From sdavis2 at mail.nih.gov Tue Dec 29 17:03:40 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 29 Dec 2009 12:03:40 -0500 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> Message-ID: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: > May I ask somebody who is versatile in both bioperl and biopython > to comment on the pros and cons of bioperl and biopython? I'm sending > this email to both bioperl and biopython mailing lists. But I hope > that it will not result in any contention. > > I assume that the functionality of bioperl and biopython is the > same, i.e., tasks that can be done in bioperl can be done in biopython and > vice versa, as both libraries have been out there for over 10 years. > Please correct me if my understanding is not true.
The two projects have similar goals, but saying that the functionality is the same would be an extreme oversimplification. You will need to define what you want to do and then check to see what the two projects have to offer. This will, in general, require perusing the websites for both projects as well as the relevant documentation. > Given that a task that can be done with either bioperl or biopython, > I, in particularly, want to know how long it will take to write the > code for the task in bioperl and biopython, with the same readability > requirement (see below) and the assumption that users have the same > fluency in perl and python. Again, you will want to define the task(s) to be accomplished and then weigh the pros and cons of each project combined with local expertise. If you don't know what you want to do, then you can certainly read some examples on the websites and see which project strikes you as a "winner" for you. > python is claimed to be good for maintainability. But perl is > criticized for there-are-many-ways-for-a-given-task. Since there are > multiple ways in perl, let us assume that we always use perl in a > readable way. These two statements are generalizations that provide little insight into the strengths or weaknesses of the languages. In other words, one can write good or bad code in both languages. Hope that helps. Sean From wenzhiwang1983 at yahoo.com.cn Tue Dec 29 18:30:02 2009 From: wenzhiwang1983 at yahoo.com.cn (WangWenzhi) Date: Wed, 30 Dec 2009 02:30:02 +0800 (CST) Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> Message-ID: <658770.25534.qm@web15204.mail.cnb.yahoo.com> Dear Jason, Plink is a very useful program in the population genetics, especially in the Genome-Wide SNP scan era. Is there any plan to add the Plink (ped or tped) format to Bio::PopGen::IO? Thanks. 
Wenzhi Wang State Key Laboratory of Genetic Resources and Evolution Kunming Institute of Zoology, Chinese Academy of Sciences Kunming, Yunnan 650223 P. R. China Tel: 86 871 5198 993 Fax: 86 871 5195 430 E-mail: wenzhiwang1983 at yahoo.com.cn From pengyu.ut at gmail.com Tue Dec 29 18:58:59 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 12:58:59 -0600 Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <2B85EF86-8A84-491B-8C33-7EC16CCB8CBC@bioperl.org> Message-ID: <366c6f340912291058t6c601e57re0c35e69fe81e09d@mail.gmail.com> To choose a toolkit for my work. On Tue, Dec 29, 2009 at 10:49 AM, Jason Stajich wrote: > Are you asking for the purposes of choosing a toolkit for your work or just > curious about the advantages/disadvantages of language choice? > > -jason > On Dec 29, 2009, at 8:08 AM, Peng Yu wrote: > >> May I ask somebody who is versatile in both bioperl and biopython >> to comment on the pros and cons of bioperl and biopython? I'm sending >> this email to both bioperl and biopython mailing lists. But I hope >> that it will not result in any contention. >> >> I assume that the functionality of bioperl and biopython is the >> same, i.e., tasks that can be done in bioperl can be done in biopython and >> vice versa, as both libraries have been out there for over 10 years. >> Please correct me if my understanding is not true. >> >> Given a task that can be done with either bioperl or biopython, >> I, in particular, want to know how long it will take to write the >> code for the task in bioperl and biopython, with the same readability >> requirement (see below) and the assumption that users have the same >> fluency in perl and python. >> >> python is claimed to be good for maintainability.
But perl is >> criticized for there-are-many-ways-for-a-given-task. Since there are >> multiple ways in perl, let us assume that we always use perl in a >> readable way. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From pengyu.ut at gmail.com Tue Dec 29 19:15:14 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 13:15:14 -0600 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> Message-ID: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote: > On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: >> May I ask somebody who is versatile in both bioperl and biopython >> to comment on the pros and cons of bioperl and biopython? I'm sending >> this email to both bioperl and biopython mailing lists. But I hope >> that it will not result in any contention. >> >> I assume that the functionality of bioperl and biopython is the >> same, i.e., tasks that can be done in bioperl can be done in biopython and >> vice versa, as both libraries have been out there for over 10 years. >> Please correct me if my understanding is not true. > > The two projects have similar goals, but saying that the functionality > is the same would be an extreme oversimplification. You will need to > define what you want to do and then check to see what the two projects > have to offer. This will, in general, require perusing the websites > for both projects as well as the relevant documentation.
According to your experience, are there some tasks that are easier with one than with another? >> Given a task that can be done with either bioperl or biopython, >> I, in particular, want to know how long it will take to write the >> code for the task in bioperl and biopython, with the same readability >> requirement (see below) and the assumption that users have the same >> fluency in perl and python. > > Again, you will want to define the task(s) to be accomplished and then > weigh the pros and cons of each project combined with local expertise. > If you don't know what you want to do, then you can certainly read > some examples on the websites and see which project strikes you as a > "winner" for you. > >> python is claimed to be good for maintainability. But perl is >> criticized for there-are-many-ways-for-a-given-task. Since there are >> multiple ways in perl, let us assume that we always use perl in a >> readable way. > > These two statements are generalizations that provide little insight > into the strengths or weaknesses of the languages. In other words, > one can write good or bad code in both languages. > > Hope that helps. > > Sean > From alperyilmaz at gmail.com Tue Dec 29 19:36:03 2009 From: alperyilmaz at gmail.com (Alper Yilmaz) Date: Tue, 29 Dec 2009 14:36:03 -0500 Subject: [Bioperl-l] Bio::TreeIO, Bio::Tree::Draw::Cladogram and phyloxml issues.. Message-ID: Hello, I have a tree in phyloxml format, and am trying to draw a subtree by using a specific node as the root. I used Bio::Tree::Draw::Cladogram for drawing and encountered some problems. When I use the whole tree and draw it, everything is fine; but, when I pick a particular node and construct the subtree from that node's ancestor by using "my $subtree = Bio::Tree::Tree->new(-root => $new_root, -nodelete => 1);", Bio::Tree::Draw::Cladogram creates a faulty EPS file, which contains extra lines added in the middle of the file. For instance: . . .
72.0820393261372 126 moveto (OsIBCD006509) show 30 81.25 moveto 81.25 lineto lineto 48.5410196630686 120 moveto 30 120 lineto . . . Should read: 72.0820393261372 126 moveto (OsIBCD006509) show 48.5410196630686 120 moveto 30 120 lineto Also, I tried to write the subtree into a new phyloxml file first, then draw it. The code is shown as follows: my $savefile = "save.phyloxml"; my $treeout = Bio::TreeIO->new(-format =>'phyloxml', -file => ">$savefile"); $treeout->write_tree($subtree); my $tree2 = Bio::TreeIO->new(-format =>'phyloxml', -file => "save.phyloxml"); my $t1 = $tree2->next_tree; my $image_output = "test.eps"; my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $t1, -top => 10, -bottom => 10,); $obj1->print(-file => $image_output); The generated phyloxml file, which is named save.phyloxml, has an additional new line between "" and "" at the end of the file. And this additional new line lead an error when doing the parsing(open file and draw eps). I removed the new line, manually, then Bio::Tree::Draw::Cladogram gave me the eps file successfully. Anyone knows how to fix these problems: 1- faulty eps file generation 2- additional newline character in phyloxml output Is it the problem about the way I create the subtree? 
The phyloxml file I used can be downloaded from: http://grassius.org/download/HSF.phyloxml Run this code with the phyloxml file to see newline character problem: http://pastebin.com/f87ee1ee Run this code with the phyloxml file to see faulty eps file problem: http://pastebin.com/fc4715a1 Alper Yilmaz Post-doctoral Researcher Plant Biotechnology Center The Ohio State University 1060 Carmack Rd Columbus, OH 43210 (614)688-4954 From pengyu.ut at gmail.com Tue Dec 29 21:32:17 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Tue, 29 Dec 2009 15:32:17 -0600 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html Message-ID: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> http://bioperl.org/Core/Latest/modules.html Many links if not all are broken on the above pages. Could somebody fix it? For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, I see the following error. There is currently no text in this page. You can search for this page title in other pages, search the related logs, or edit this page. From jason at bioperl.org Tue Dec 29 21:49:00 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 13:49:00 -0800 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: That is an outdated URL I am not sure where you are linking it from. We can probably now disable all old '/Core' URLs. All documentation links are in the /wiki/ The beginner's howto is here for example http://bioperl.org/wiki/HOWTO:Beginners > http://www.bioperl.org/wiki/HOWTOs On Dec 29, 2009, at 1:32 PM, Peng Yu wrote: > http://bioperl.org/Core/Latest/modules.html > > Many links if not all are broken on the above pages. Could somebody > fix it? > > For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, > I see the following error. 
> > There is currently no text in this page. You can search for this page > title in other pages, search the related logs, or edit this page. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From jason at bioperl.org Tue Dec 29 21:50:26 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 13:50:26 -0800 Subject: [Bioperl-l] Comparison between bioperl and biopython? In-Reply-To: <658770.25534.qm@web15204.mail.cnb.yahoo.com> References: <658770.25534.qm@web15204.mail.cnb.yahoo.com> Message-ID: yep - be great if someone were to write it. This being a volunteer project we welcome your contribution. No I don't specifically have plans to do it, but maybe you can give it a try or another population genetics interested bioperl user/developer? -jason On Dec 29, 2009, at 10:30 AM, WangWenzhi wrote: > Dear Jason, > > Plink is a very useful program in the population genetics, > especially in the Genome-Wide SNP scan era. Is there any plan to add > the Plink (ped or tped) format to Bio::PopGen::IO? > > Thanks. > > Wenzhi Wang > State Key Laboratory of Genetic Resources and Evolution > Kunming Institute of Zoology, Chinese Academy of Sciences > Kunming, Yunnan 650223 P. R. China > Tel: 86 871 5198 993 > Fax: 86 871 5195 430 > E-mail: wenzhiwang1983 at yahoo.com.cn -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From jason at bioperl.org Tue Dec 29 21:57:49 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 29 Dec 2009 13:57:49 -0800 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython?
In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> Message-ID: <02851B8A-E74E-453E-9725-6FA8F3995F82@bioperl.org> On Dec 29, 2009, at 11:15 AM, Peng Yu wrote: > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis > wrote: >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu >> wrote: >>> May I ask somebody who is versatile in both bioperl and biopython >>> to comment on the pros and cons of bioperl and biopython? I'm sending >>> this email to both bioperl and biopython mailing lists. But I hope >>> that it will not result in any contention. >>> >>> I assume that the functionality of bioperl and biopython is the >>> same, i.e., tasks that can be done in bioperl can be done in biopython and >>> vice versa, as both libraries have been out there for over 10 years. >>> Please correct me if my understanding is not true. >> >> The two projects have similar goals, but saying that the >> functionality >> is the same would be an extreme oversimplification. You will need to >> define what you want to do and then check to see what the two >> projects >> have to offer. This will, in general, require perusing the websites >> for both projects as well as the relevant documentation. > > According to your experience, are there some tasks that are easier > with one than with another? As you have still failed to give much insight into the 'tasks', it is hard to give you a better answer. If there is a module or set of routines already written, then yes, one might be easier than the other. Otherwise it just depends on your strengths in the programming language. We discussed the strengths of the different toolkits briefly on the podcast last month. http://twit.tv/floss96 I echo Sean. Use whichever language you are a better programmer in.
BioPerl is more mature in some facets than is BioPython, but BioPython has some components that are more heavily developed and supported than BioPerl (structures being one of those and interfacing that to pyMol would be a strength). I personally think the Gbrowse, Bio-Graphics, and Bio::DB::GFF/Bio::DB::SeqFeature::Store interface to Sequence databases and Features is a critical aspect of mining genomic data and features, and I use these heavily in my work, making BioPerl easy and powerful for my tasks. That and sequence and alignment parsing and reformatting. But there are comparable tools written in python with and without BioPython that you can also use, so mainly it is about building up an expertise in a toolkit and going forward. The BioPerl faithful will probably say it is a more useful toolkit to us, but we are of course a biased sample. Both projects can benefit from more users and developers contributing code and documentation, so I would just jump in and give it a try if you are unsure which will be easier for you. > >>> Given a task that can be done with either bioperl or biopython, >>> I, in particular, want to know how long it will take to write the >>> code for the task in bioperl and biopython, with the same >>> readability >>> requirement (see below) and the assumption that users have the same >>> fluency in perl and python. >> >> Again, you will want to define the task(s) to be accomplished and >> then >> weigh the pros and cons of each project combined with local >> expertise. >> If you don't know what you want to do, then you can certainly read >> some examples on the websites and see which project strikes you as a >> "winner" for you. >> >>> python is claimed to be good for maintainability. But perl is >>> criticized for there-are-many-ways-for-a-given-task. Since there are >>> multiple ways in perl, let us assume that we always use perl in a >>> readable way.
>> >> These two statements are generalizations that provide little insight >> into the strengths or weaknesses of the languages. In other words, >> one can write good or bad code in both languages. >> >> Hope that helps. >> >> Sean >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From pengyu.ut at gmail.com Tue Dec 29 22:01:05 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Wed, 30 Dec 2009 16:01:05 +1800 Subject: [Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID? Message-ID: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> I see the following example. But it is not clear to me how to get the exon sequences. I also want to get the exon boundaries and associated CDS boundaries. Although, I can get the boundary information from ucsc table browser, but it would be convenient if I can get it in bioperl along with the sequence. Could somebody let me know how do it? http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html From sdavis2 at mail.nih.gov Tue Dec 29 22:13:30 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 29 Dec 2009 17:13:30 -0500 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: <264855a00912291413r7ce37e2h673dec7c2624db6@mail.gmail.com> On Tue, Dec 29, 2009 at 4:32 PM, Peng Yu wrote: > http://bioperl.org/Core/Latest/modules.html > > Many links if not all are broken on the above pages. Could somebody fix it? > > For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, > I see the following error. > > There is currently no text in this page. 
You can search for this page > title in other pages, search the related logs, or edit this page. It is unfortunate that the links are broken on that page. However, I believe that page is somewhat outdated, anyway. Here are the HOWTO pages: http://www.bioperl.org/wiki/HOWTOs Sean From pengyu.ut at gmail.com Tue Dec 29 22:21:16 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Wed, 30 Dec 2009 16:21:16 +1800 Subject: [Bioperl-l] Document missing on Core/Latest/modules.html In-Reply-To: References: <366c6f340912291332i257f8061r6fa277324137033b@mail.gmail.com> Message-ID: <366c6f340912291421m38bb8348oe6b224f29208f9f4@mail.gmail.com> On Wed, Dec 30, 2009 at 3:49 PM, Jason Stajich wrote: > That is an outdated URL I am not sure where you are linking it from. We can > probably now disable all old '/Core' URLs. I followed the link from this page: http://www.bioperl.org/wiki/BioPerl_Tutorial Since those URLs are outdated, could you please fix the links on the above page? > All documentation links are in the /wiki/ > > The beginner's howto is here for example > http://bioperl.org/wiki/HOWTO:Beginners > >> http://www.bioperl.org/wiki/HOWTOs > > > On Dec 29, 2009, at 1:32 PM, Peng Yu wrote: > >> http://bioperl.org/Core/Latest/modules.html >> >> Many links if not all are broken on the above pages. Could somebody fix >> it? >> >> For example, on http://www.bioperl.org/wiki/HOWTOs/txt/Beginners.txt, >> I see the following error. >> >> There is currently no text in this page. You can search for this page >> title in other pages, search the related logs, or edit this page.
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > http://fungalgenomes.org/ > > From sdavis2 at mail.nih.gov Tue Dec 29 23:06:17 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 29 Dec 2009 18:06:17 -0500 Subject: [Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID? In-Reply-To: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> References: <366c6f340912291401t3ff173fbrc44fe0d4078be148@mail.gmail.com> Message-ID: <264855a00912291506s13c32d5dg7b46f0cc34c20f94@mail.gmail.com> On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu wrote: > I see the following example. But it is not clear to me how to get the > exon sequences. I also want to get the exon boundaries and associated > CDS boundaries. Although, I can get the boundary information from ucsc > table browser, but it would be convenient if I can get it in bioperl > along with the sequence. > > Could somebody let me know how do it? > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html Hi, Peng. There may be some confusion, as the UCSC database aligns RefSeq sequence to a genome to generate exon start and end coordinates. However, the RefSeq records retrieved by Bio::DB::RefSeq are not in genomic context and so do not have start and end locations on the genome. That is, if you want the starts and ends along the genome, that information is not available from the RefSeq record itself, I don't think. If that is what you need (genomic coordinates), you can download the information directly from UCSC, download flat files from NCBI mapview, or even from ensembl (using biomart, for instance). If you are looking for a bioperl-compliant way of doing this, look at the Ensembl Perl API. 
Sean From jkhilmer at gmail.com Tue Dec 29 19:55:18 2009 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Tue, 29 Dec 2009 12:55:18 -0700 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> Message-ID: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> Personally, I think that the differences between Python and Perl (although substantial) are not large enough to make the language itself the deciding factor. Instead, consider the larger community of software. I haven't yet found a situation in which Python cannot be applied: it can be used with R (statistics); lower-level code in C or Fortran; visualization software such as PyMol, Chimera, Blender, VTK; plotting with matplotlib; and scipy/numpy or sage, which provide innumerable benefits for computation, data-processing, etc. Although I don't claim to have a great deal of experience with Perl, I haven't seen the same integration with that language: I'm assuming it can be used with R and VTK (not sure about C or Fortran?). For this reason, unless your work is highly targeted and you have no use for programming language integration with other software, I would recommend Python. For perl experts, I would truly appreciate any corrections you could offer to these observations of mine, since I wouldn't mind using perl if it offers benefits either in general or for specific applications. Jonathan On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu wrote: > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis wrote: >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: >>> May I ask somebody who is versatile in both bioperl and biopython >>> to comment on the pros and cons of bioperl and biopython?
I'm sending >>> this email to both bioperl and biopython mailing lists. But I hope >>> that it will not result in any contention. >>> >>> I assume that the functionality of bioperl and biopython is the >>> same, i.e., tasks that can be done in bioperl can be done in biopython and >>> vice versa, as both libraries have been out there for over 10 years. >>> Please correct me if my understanding is not true. >> >> The two projects have similar goals, but saying that the functionality >> is the same would be an extreme oversimplification. You will need to >> define what you want to do and then check to see what the two projects >> have to offer. This will, in general, require perusing the websites >> for both projects as well as the relevant documentation. > > According to your experience, are there some tasks that are easier > with one than with another? > >>> Given a task that can be done with either bioperl or biopython, >>> I, in particular, want to know how long it will take to write the >>> code for the task in bioperl and biopython, with the same readability >>> requirement (see below) and the assumption that users have the same >>> fluency in perl and python. >> >> Again, you will want to define the task(s) to be accomplished and then >> weigh the pros and cons of each project combined with local expertise. >> If you don't know what you want to do, then you can certainly read >> some examples on the websites and see which project strikes you as a >> "winner" for you. >> >>> python is claimed to be good for maintainability. But perl is >>> criticized for there-are-many-ways-for-a-given-task. Since there are >>> multiple ways in perl, let us assume that we always use perl in a >>> readable way. >> >> These two statements are generalizations that provide little insight >> into the strengths or weaknesses of the languages. In other words, >> one can write good or bad code in both languages. >> >> Hope that helps.
>> >> Sean >> > > _______________________________________________ > Biopython mailing list ?- ?Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From wgheath at gmail.com Tue Dec 29 20:16:39 2009 From: wgheath at gmail.com (William Heath) Date: Tue, 29 Dec 2009 12:16:39 -0800 Subject: [Bioperl-l] [Biopython] Comparison between bioperl and biopython? In-Reply-To: <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> References: <366c6f340912290808q6edea4d8ncb59a270f9d11f1a@mail.gmail.com> <264855a00912290903m213d7cc4l607e8fa0bad55571@mail.gmail.com> <366c6f340912291115o58ba0b82kce74e18fecd833c8@mail.gmail.com> <81277ce10912291155x6dde10ewe2055b9692d077c1@mail.gmail.com> Message-ID: The biggest reason to go with python is the ease of use. Biologists are not programmers and the learning curve for python is much smaller than that of perl. I like perl but choose python because of this issue. Perl 6 does address some of these issues however but this has not been fully implemented as of yet. -Tim P.S. I love, love, love cpan though which is only for perl right now :( On Tue, Dec 29, 2009 at 11:55 AM, Jonathan Hilmer wrote: > Personally, I think that the differences between Python and Perl > (although substantial) are not large enough to make the language > itself the deciding factor. > > Instead, consider the larger community of software. I haven't yet > found a situation in which Python cannot be applied: it can be used > with R (statistics); lower-level code C or fortran; visualization > software such as PyMol, Chimera, Blender, VTK; plotting with > matplotlib; and scipy/numpy or sage, which provide innumerable > benefits for computation, data-processing, etc. > > Although I don't claim to have a great deal of experience with Perl, I > haven't seen the same integration with that language: I'm assuming it > can be used with R and VTK (not sure about C or fortran?). 
For this > reason, unless your work is highly targeted and you have no use for > programming language integration with other software, I would > recommend Python. > > For perl experts, I would truly appreciate any corrections you could > offer to these observations of mine, since I wouldn't mind using perl > if it offers benefits either in general or for specific applications. > > > Jonathan > > On Tue, Dec 29, 2009 at 12:15 PM, Peng Yu wrote: > > On Tue, Dec 29, 2009 at 11:03 AM, Sean Davis > wrote: > >> On Tue, Dec 29, 2009 at 11:08 AM, Peng Yu wrote: > >>> May I ask somebody who is versatile in both bioperl and biopython > >>> to comment on the pros and cons of bioperl and biopython? I'm sending > >>> this email to both bioperl and biopython mailing lists. But I hope > >>> that it will not result in any contention. > >>> > >>> I assume that the functionality of bioperl and biopython is the > >>> same, i.e., tasks that can be done in bioperl can be done in biopython and > >>> vice versa, as both libraries have been out there for over 10 years. > >>> Please correct me if my understanding is not true. > >> > >> The two projects have similar goals, but saying that the functionality > >> is the same would be an extreme oversimplification. You will need to > >> define what you want to do and then check to see what the two projects > >> have to offer. This will, in general, require perusing the websites > >> for both projects as well as the relevant documentation. > > > > According to your experience, are there some tasks that are easier > > with one than with another? > > > >>> Given a task that can be done with either bioperl or biopython, > >>> I, in particular, want to know how long it will take to write the > >>> code for the task in bioperl and biopython, with the same readability > >>> requirement (see below) and the assumption that users have the same > >>> fluency in perl and python.
> >> > >> Again, you will want to define the task(s) to be accomplished and then > >> weigh the pros and cons of each project combined with local expertise. > >> If you don't know what you want to do, then you can certainly read > >> some examples on the websites and see which project strikes you as a > >> "winner" for you. > >> > >>> python is claimed to be good for maintainability. But perl is > >>> criticized for there-are-many-ways-for-a-given-task. Since there are > >>> multiple ways in perl, let us assume that we always use perl in a > >>> readable way. > >> > >> These two statements are generalizations that provide little insight > >> into the strengths or weaknesses of the languages. In other words, > >> one can write good or bad code in both languages. > >> > >> Hope that helps. > >> > >> Sean > >> > > > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From pengyu.ut at gmail.com Wed Dec 30 17:26:45 2009 From: pengyu.ut at gmail.com (Peng Yu) Date: Thu, 31 Dec 2009 11:26:45 +1800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? Message-ID: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> With Bio::SeqIO, I can only read in the records in a fasta file one by one. This is preferable if there are many records in a file. But I also want to read all the records in. I could use a while loop to read all records in. But could somebody let me know if there is a function in bioperl that can read in all the record at once and return me an object? 
http://www.bioperl.org/wiki/HOWTO:SeqIO From sdavis2 at mail.nih.gov Wed Dec 30 18:04:53 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 30 Dec 2009 13:04:53 -0500 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> Message-ID: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object? In perl, you can use an array to store the records. You could also use a hash if you have reasonable keys for the entries. Sean > http://www.bioperl.org/wiki/HOWTO:SeqIO > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Wed Dec 30 19:58:54 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 30 Dec 2009 11:58:54 -0800 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> <264855a00912301004t396e0d4fwf9d291c5d82c3fb9@mail.gmail.com> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B@bioperl.org> or use a database object so you can retrieve sequences that have a particular id. See Bio::DB::Fasta On Dec 30, 2009, at 10:04 AM, Sean Davis wrote: > On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu wrote: >> With Bio::SeqIO, I can only read in the records in a fasta file one >> by >> one. 
This is preferable if there are many records in a file. >> >> But I also want to read all the records in. I could use a while loop >> to read all records in. But could somebody let me know if there is a >> function in bioperl that can read in all the record at once and >> return >> me an object? > > In perl, you can use an array to store the records. You could also > use a hash if you have reasonable keys for the entries. > > Sean > > >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org http://fungalgenomes.org/ From maj at fortinbras.us Wed Dec 30 21:20:31 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 30 Dec 2009 16:20:31 -0500 Subject: [Bioperl-l] How to read in the whole fasta file in the memory? In-Reply-To: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> References: <366c6f340912300926k5af5cc88nc3c3babda541fd1@mail.gmail.com> Message-ID: <2646F627E6D14AADB412A6E6B51E24DA@NewLife> I think you might want Bio::AlignIO: use Bio::AlignIO; my $alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta'); my $aln = $alnio->next_aln; my @seqs = $aln->each_seq; MAJ ----- Original Message ----- From: "Peng Yu" To: Sent: Wednesday, December 30, 2009 12:26 PM Subject: [Bioperl-l] How to read in the whole fasta file in the memory? > With Bio::SeqIO, I can only read in the records in a fasta file one by > one. This is preferable if there are many records in a file. > > But I also want to read all the records in. I could use a while loop > to read all records in. But could somebody let me know if there is a > function in bioperl that can read in all the record at once and return > me an object?
> > http://www.bioperl.org/wiki/HOWTO:SeqIO > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Thu Dec 31 10:55:32 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 31 Dec 2009 11:55:32 +0100 Subject: [Bioperl-l] question about a PAML module In-Reply-To: <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu> <31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu> Message-ID: Hi Rui and Sandra, Could you file this as a bug report at http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl ? Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report: - sample input files (a sequence file and a tree file, probably) - a script which reproduces the problem - the output (error messages) like you show below When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this. There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon. Dave