From maj at fortinbras.us Sun Nov 1 23:47:15 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 1 Nov 2009 23:47:15 -0500
Subject: [Bioperl-l] annotations
Message-ID: <5150801225E0484D95DC51B2D00AE519@NewLife>

I'm cogitating on features and annotations. For a RichSeq, one gets
the set of annotations by

 $seq->annotation->get_Annotations

while getting features by

 $seq->get_Features

Is there a reason not to have a method in SeqI

 sub get_Annotations { shift->annotation->get_Annotations }

to allow a user to do what seems natural from a user's perspective,
viz. $seq->get_Annotations? I imagine this might save hundreds of
hours of frustration, integrated over all newbies.
MAJ

From cjfields at illinois.edu Mon Nov 2 08:08:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 2 Nov 2009 07:08:54 -0600
Subject: [Bioperl-l] annotations
In-Reply-To: <5150801225E0484D95DC51B2D00AE519@NewLife>
References: <5150801225E0484D95DC51B2D00AE519@NewLife>
Message-ID: <6920A9E1-D221-4CF8-9866-0ADBDB254C19@illinois.edu>

On Nov 1, 2009, at 10:47 PM, Mark A. Jensen wrote:

> I'm cogitating on features and annotations. For a RichSeq, one gets
> the set of annotations by
>
> $seq->annotation->get_Annotations
>
> while getting features by
>
> $seq->get_Features
>
> Is there a reason not to have a method in SeqI
>
> sub get_Annotations { shift->annotation->get_Annotations }
>
> to allow a user to do what seems natural from a user's perspective,
> viz. $seq->get_Annotations? I imagine this might save hundreds of
> hours of frustration, integrated over all newbies.
> MAJ

One could add the methods to delegate to annotation() (that's
essentially what I'm planning on doing for Biome).
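A minimal sketch of the delegation discussed above, assuming a hypothetical subclass named My::RichSeq (not a BioPerl class). As Mark's question implies, sequence objects at the time had no get_Annotations of their own, so the method simply forwards to the Bio::Annotation::Collection returned by annotation(). Unlike the one-liner in the proposal, it passes its arguments through, so a tag such as 'comment' reaches the collection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subclass, for illustration only: it adds the proposed
# convenience method by delegating to the annotation collection.
package My::RichSeq;
use base 'Bio::Seq::RichSeq';

sub get_Annotations {
    my $self = shift;
    # Forward any tag names (e.g. 'comment'); with no arguments the
    # collection returns every annotation it holds.
    return $self->annotation->get_Annotations(@_);
}

package main;
use Bio::Annotation::Comment;

my $seq = My::RichSeq->new(-seq => 'ACGTACGT', -id => 'demo');
$seq->annotation->add_Annotation(
    'comment', Bio::Annotation::Comment->new(-text => 'hello'));

# Same call shape as $seq->get_Features, with no intermediate ->annotation
my @comments = $seq->get_Annotations('comment');
print $comments[0]->text, "\n";
```

Passing @_ through is a small design choice: it keeps the wrapper's behaviour identical to calling get_Annotations on the collection directly, with or without tag names.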
chris

From kiekyon.huang at gmail.com Tue Nov 3 10:14:39 2009
From: kiekyon.huang at gmail.com (Kie Kyon Huang)
Date: Tue, 3 Nov 2009 23:14:39 +0800
Subject: [Bioperl-l] render_blast problem
Message-ID:

Hi,

I was trying to follow the HOWTO:Graphics at
http://www.bioperl.org/wiki/HOWTO:Graphics

When running the command line in cygwin

$ perl render_blast1.pl data1.txt | display -

I get the following error line,

bash: display: command not found

I also tried

$ perl render_blast1.pl data1.txt > data1.png

however, I was unable to open the data1.png file using Microsoft Office
Picture Manager or windows Photo Gallery

Thanks
Huang

From biopython at maubp.freeserve.co.uk Tue Nov 3 10:45:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Nov 2009 15:45:37 +0000
Subject: [Bioperl-l] render_blast problem
In-Reply-To:
References:
Message-ID: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>

On Tue, Nov 3, 2009 at 3:14 PM, Kie Kyon Huang wrote:
> Hi,
>
> I was trying to follow the HOWTO:Graphics at
> http://www.bioperl.org/wiki/HOWTO:Graphics
>
> When running the command line in cygwin
>
> $ perl render_blast1.pl data1.txt | display -
>
> I get the following error line,
>
> bash: display: command not found

That makes sense on Windows, since display is a Unix command line tool.

> I also tried
>
> $ perl render_blast1.pl data1.txt > data1.png

Based on the wiki, I think that ought to have worked.

> however, I was unable to open the data1.png file using Microsoft
> Office Picture Manager or windows Photo Gallery

Did you do this step?:

>> Important! If you are on a Windows platform, you need to put
>> STDOUT into binary mode so that the PNG file does not go
>> through Window's carriage return/linefeed transformations.
>> Before the final print statement, put the statement
>> binmode(STDOUT). This advice also applies to certain older
>> versions of RedHat, which ship with a patched (and possibly
>> broken) version of Perl.
(BioPerl devs - couldn't that be added to the default render_blast1.pl
script with an if statement checking for Windows?)

Peter

From biopython at maubp.freeserve.co.uk Tue Nov 3 11:04:59 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Nov 2009 16:04:59 +0000
Subject: [Bioperl-l] render_blast problem
In-Reply-To:
References: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
Message-ID: <320fb6e00911030804r62e50da6w373bbb61e9823f28@mail.gmail.com>

Mailing list CC'd - solved :)

On Tue, Nov 3, 2009 at 3:55 PM, Kie Kyon Huang wrote:
>
> ok, that fix it
> i forget sometimes what platform am i on.
> thanks

Great.

Peter

From amackey at virginia.edu Tue Nov 3 12:09:00 2009
From: amackey at virginia.edu (Aaron Mackey)
Date: Tue, 3 Nov 2009 12:09:00 -0500
Subject: [Bioperl-l] svn errors?
Message-ID: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>

[ajm6q at lc4 bioperl-live]$ svn update
svn: Decompression of svndiff data failed


I'll admit to not having svn updated in awhile; A clean, anonymous svn co
failed with the same message:

[...]
A bioperl-live/Bio/Structure/StructureI.pm
A bioperl-live/Bio/Structure/IO
svn: Decompression of svndiff data failed

-Aaron

P.S. I used this command: svn co
svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

From cjfields at illinois.edu Tue Nov 3 12:17:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 11:17:10 -0600
Subject: [Bioperl-l] svn errors?
In-Reply-To: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
References: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
Message-ID: <8C5FC42D-F957-45AC-9AAC-876ACC9D77E0@illinois.edu>

Aaron,

Yep, this was reported to support (a couple of users on #bioperl reported
the same problem). Chris D. is looking into it. I'm wondering if it's
worth setting up a second mirror to github for this purpose.
chris

On Nov 3, 2009, at 11:09 AM, Aaron Mackey wrote:

> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous
> svn co
> failed with the same message:
>
> [...]
> A bioperl-live/Bio/Structure/StructureI.pm
> A bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command: svn co
> svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue Nov 3 12:19:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 11:19:56 -0600
Subject: [Bioperl-l] render_blast problem
In-Reply-To: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
References: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
Message-ID: <8336341C-C7B4-4740-A7C3-E2DE5FDAF651@illinois.edu>

On Nov 3, 2009, at 9:45 AM, Peter wrote:

> ...
> Did you do this step?:
>>> Important! If you are on a Windows platform, you need to put
>>> STDOUT into binary mode so that the PNG file does not go
>>> through Window's carriage return/linefeed transformations.
>>> Before the final print statement, put the statement
>>> binmode(STDOUT). This advice also applies to certain older
>>> versions of RedHat, which ship with a patched (and possibly
>>> broken) version of Perl.
>
> (BioPerl devs - couldn't that be added to the default
> render_blast1.pl script with an if statement checking for
> Windows?)
>
> Peter

Yes, that should be added. I'll work on it.

chris

From mauricio at open-bio.org Tue Nov 3 12:20:52 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Tue, 03 Nov 2009 11:20:52 -0600
Subject: [Bioperl-l] svn errors?
In-Reply-To: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
References: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
Message-ID: <4AF06674.30506@open-bio.org>

Hi Aaron,

This was reported a few days ago. Chris Dagdigian is working today on a
fix for it.

Mauricio.

Aaron Mackey wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous svn co
> failed with the same message:
>
> [...]
> A bioperl-live/Bio/Structure/StructureI.pm
> A bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command: svn co
> svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

From rachitasharma at gmail.com Tue Nov 3 17:12:11 2009
From: rachitasharma at gmail.com (Rachita Sharma)
Date: Tue, 3 Nov 2009 14:12:11 -0800
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
Message-ID: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>

I am having trouble parsing PSI-BLAST results. Please help.

The code is:
my $in = new Bio::SearchIO( -format => 'blast',
         -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");


while( my $result = $in->next_result ) {
  while( my $hit = $result->next_hit ) {

    $sth->execute($result->query_name, $hit->name, $hit->significance);
    print "Query executed!\n";

  }
}

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline ***** No hits found ******
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
STACK: BSubVCpsiRblast.pl:92
-----------------------------------------------------------

From cjfields at illinois.edu Tue Nov 3 22:42:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 21:42:55 -0600
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
In-Reply-To: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>
References: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>
Message-ID:

Rachita,

You'll have to give us more to go on than this. The best thing to do is
file a bug report and attach an example PSI-BLAST report and code that
causes the problem. The $sth->execute(...) is a bit odd, but that
shouldn't cause the error in question. Also, make sure to stipulate the
OS, version of BioPerl, and perl version.

chris

On Nov 3, 2009, at 4:12 PM, Rachita Sharma wrote:

> I am having trouble parsing PSI-BLAST results. Please help.
>
> The code is:
> my $in = new Bio::SearchIO( -format => 'blast',
>          -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");
>
>
> while( my $result = $in->next_result ) {
>   while( my $hit = $result->next_hit ) {
>
>     $sth->execute($result->query_name, $hit->name, $hit->significance);
>     print "Query executed!\n";
>
>   }
> }
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
> STACK: BSubVCpsiRblast.pl:92
> -----------------------------------------------------------

From alexl at users.sourceforge.net Wed Nov 4 02:30:21 2009
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 04 Nov 2009 02:30:21 -0500
Subject: [Bioperl-l] version of ExtUtils::Manifest too strict?
Message-ID:

Does the version of ExtUtils::Manifest really need to be strictly
greater than or equal to 1.52?

Currently this blocks me updating the Fedora package of BioPerl to
1.6.1, because the version of perl that Fedora ships is on 1.51 and
hence the build fails with:

Checking prerequisites...
 - ERROR: ExtUtils::Manifest (1.51_01) is installed, but we need version >= 1.52

Full logs are here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1787483
http://koji.fedoraproject.org/koji/getfile?taskID=1787483&name=build.log

This is true even with the version of Perl in rawhide/F-12 etc.
(ExtUtils::Manifest is in the base perl package).
If it really is necessary, I would like to be armed with a good
argument why it needs to be updated, since the Perl package maintainer
would have to update the entire Perl package simply to get a more
recent version of one small subpackage.

Regards,
Alex

From jluis.lavin at unavarra.es Wed Nov 4 03:43:35 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 09:43:35 +0100 (CET)
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
Message-ID: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>


Hello all,

I'm a newbie who is having terrible troubles trying to retrieve a list of
multiple sequences from the NCBI and write them to a single file in Fasta
format.
The code I've written seems to read my list and retrieve the sequences, but
it kinda overwrites them so that I only get the last sequence on the list.
I've been told to ask the people on this mailing list for help, since you
may have come across this problem also or at least will know how to solve
it...

Here is my code, which basically consists of an STDIN for the list to be
read into an array and a loop to read each sequence (stopping when the
list ends) and retrieve a sequence each time the loop is launched,
writing that sequence to a fasta file. I only get a sequence back
although it seems to perform the retrieving process with each of the
sequences of the list...

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::SeqIO;
print "Enter your list name:";
my $archivo=<STDIN>;
chomp $archivo;
die ("Can't open input\n") unless (open(INFILE, $archivo));
my @lista = <INFILE>;
foreach my $seq (@lista) {
    if ($seq eq '') {
        die ("empty list")
    }
    else {
        my $db = new Bio::DB::GenPept("-format" => "Fasta");
        my $seqobj = $db->get_Seq_by_acc($seq);
        my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
                                  -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;


An example list of sequences can be this one:

YP_003107578.1
YP_003106103.1
YP_003106552.1
YP_003106560.1
YP_003107053.1
YP_003107450.1
YP_003108000.1
YP_003105023.1
YP_003105264.1

Thanks in advance for your help ;)

--
José Luis Lavín Trueba, PhD
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra SPAIN

From e.osimo at gmail.com Wed Nov 4 04:54:52 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Wed, 4 Nov 2009 10:54:52 +0100
Subject: [Bioperl-l] Bio::Graphics and picture format
Message-ID: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com>

Hello everyone,
do you know if it is possible to generate an image with Bio::Graphics in a
vector format? Is there a list of available formats?
Thanks
Emanuele

From David.Messina at sbc.su.se Wed Nov 4 04:52:53 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 4 Nov 2009 10:52:53 +0100
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID: <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>

>
> The code I've written seems to read my list and retrieve the sequences, but
> it kinda overwrites them so that I only get the last sequence on the list.
>

With this line

my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format =>
'fasta');

you are opening the filehandle for the output file inside your loop, so each
time it is writing over the previous file with an empty file. Then, you
write a single sequence to that file with this line

$out->write_seq($seqobj);

So when you are done, you just have the last sequence in the output file.

If you move the opening of the output filehandle outside the loop (it needs
to be done only once), then it should work as you expect.

Also, I notice the newline characters are not being removed from your
sequence IDs (actually I'm a little surprised that the sequences are being
retrieved). Just to be safe, you may want to add the line

chomp @lista;

after

my @lista = <INFILE>;


Dave

From jluis.lavin at unavarra.es Wed Nov 4 05:14:40 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 11:14:40 +0100 (CET)
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>
Message-ID: <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>

Thank you very very much Dave, I've had a really frustrating time trying to
find out what I was doing wrong, it has been so frustrating that I was
about to quit Bioperl.
Now I can try to focus on BLAST parsing for my comparative genomic analysis.
You're great in this mailing list, because you give a fast and neat advice
to all the questions asked here by newbies like me ;)

On Wed, 4 November 2009, 10:52, Dave Messina wrote:
>>
>> The code I've written seems to read my list and retrieve the sequences,
>> but it kinda overwrites them so that I only get the last sequence on the
>> list.
>>
>
> With this line
>
> my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format =>
> 'fasta');
>
> you are opening the filehandle for the output file inside your loop, so
> each time it is writing over the previous file with an empty file. Then,
> you write a single sequence to that file with this line
>
> $out->write_seq($seqobj);
>
> So when you are done, you just have the last sequence in the output file.
>
> If you move the opening of the output filehandle outside the loop (it
> needs to be done only once), then it should work as you expect.
>
> Also, I notice the newline characters are not being removed from your
> sequence IDs (actually I'm a little surprised that the sequences are
> being retrieved). Just to be safe, you may want to add the line
>
> chomp @lista;
>
> after
>
> my @lista = <INFILE>;
>
>
> Dave
>

--
Dr. José Luis Lavín Trueba
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra SPAIN

From hrh at fmi.ch Wed Nov 4 05:05:17 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Wed, 04 Nov 2009 11:05:17 +0100
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID:

Hi

try

my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
                                    ^

this way you no longer overwrite your existing file, but append the next
sequence.

Regards, Hans



On 11/4/09 9:43 AM, "jluis.lavin at unavarra.es" wrote:

>
> Hello all,
>
> I'm a newbie who is having terrible troubles trying to retrieve a list of
> multiple sequences from the NCBI and write them to a single file in Fasta
> format.
> The code I've written seems to read my list and retrieve the sequences, but
> it kinda overwrites them so that I only get the last sequence on the list.
> I've been told to ask the people on this mailing list for help, since you
> may have come across this problem also or at least will know how to solve
> it...
>
> Here is my code, which basically consists of an STDIN for the list to be
> read into an array and a loop to read each sequence (stopping when the
> list ends) and retrieve a sequence each time the loop is launched,
> writing that sequence to a fasta file. I only get a sequence back
> although it seems to perform the retrieving process with each of the
> sequences of the list...
>
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenPept;
> use Bio::DB::GenBank;
> use Bio::SeqIO;
> print "Enter your list name:";
> my $archivo=<STDIN>;
> chomp $archivo;
> die ("Can't open input\n") unless (open(INFILE, $archivo));
> my @lista = <INFILE>;
> foreach my $seq (@lista) {
>     if ($seq eq '') {
>         die ("empty list")
>     }
>     else {
>         my $db = new Bio::DB::GenPept("-format" => "Fasta");
>         my $seqobj = $db->get_Seq_by_acc($seq);
>         my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
>                                   -format => 'fasta');
>         $out->write_seq($seqobj);
>     }
> }
> exit;
>
>
> An example list of sequences can be this one:
>
> YP_003107578.1
> YP_003106103.1
> YP_003106552.1
> YP_003106560.1
> YP_003107053.1
> YP_003107450.1
> YP_003108000.1
> YP_003105023.1
> YP_003105264.1
>
> Thanks in advance for your help ;)

From jluis.lavin at unavarra.es Wed Nov 4 05:25:38 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 11:25:38 +0100 (CET)
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To:
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID: <1834.130.206.164.153.1257330338.squirrel@webmail.unavarra.es>

Thank you very much for your answer Hans!!!
It works perfectly, also a neat and fast solution, like Dave's.
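Putting Dave's and Hans's replies together, a minimal sketch of a corrected script might look like this. It assumes the list file holds one accession per line; the helper names read_accessions and fetch_to_fasta, and taking the list file from the command line instead of the STDIN prompt, are illustrative choices rather than part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenPept;
use Bio::SeqIO;

# Read accessions, one per line. A slightly stricter version of Dave's
# chomp: also drops a trailing \r, surrounding spaces and blank lines.
sub read_accessions {
    my ($file) = @_;
    open my $in, '<', $file or die "Can't open $file: $!\n";
    my @ids;
    while (my $line = <$in>) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        push @ids, $line if length $line;
    }
    close $in;
    return @ids;
}

sub fetch_to_fasta {
    my ($out_file, @ids) = @_;
    # Create the database handle and the output stream ONCE, outside the
    # loop, so every write_seq goes to the same open file instead of
    # re-creating (and truncating) it on each pass.
    my $db  = Bio::DB::GenPept->new(-format => 'Fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fasta');
    my $written = 0;
    for my $id (@ids) {
        # If a lookup fails, warn and carry on with the next ID
        my $seq = eval { $db->get_Seq_by_acc($id) };
        unless ($seq) {
            warn "Could not retrieve $id\n";
            next;
        }
        $out->write_seq($seq);
        $written++;
    }
    $out->close;
    return $written;
}

if (@ARGV) {
    my @ids = read_accessions($ARGV[0]);
    die "empty list\n" unless @ids;
    my $n = fetch_to_fasta('extracted_seqs.fasta', @ids);
    print "Wrote $n of ", scalar(@ids), " sequences to extracted_seqs.fasta\n";
}
else {
    print STDERR "usage: $0 accession_list.txt\n";
}
```

Hans's ">>" suggestion also stops the overwriting, but it keeps appending to whatever extracted_seqs.fasta held from earlier runs; opening the file once with ">" starts each run from an empty file.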
Blessings to you all ;)

On Wed, 4 November 2009, 11:05, Hotz, Hans-Rudolf wrote:
> Hi
>
> try
>
> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
>                                     ^
>
> this way you no longer overwrite your existing file, but append the next
> sequence.
>
> Regards, Hans
>
>
>
> On 11/4/09 9:43 AM, "jluis.lavin at unavarra.es"
> wrote:
>
>>
>> Hello all,
>>
>> I'm a newbie who is having terrible troubles trying to retrieve a list of
>> multiple sequences from the NCBI and write them to a single file in
>> Fasta format.
>> The code I've written seems to read my list and retrieve the sequences,
>> but it kinda overwrites them so that I only get the last sequence on the
>> list.
>> >> >> #!/usr/bin/perl -w >> use strict; >> use Bio::DB::GenPept; >> use Bio::DB::GenBank; >> use Bio::SeqIO; >> print "Enter your list name:"; >> my $archivo=; >> chomp $archivo; >> die ("Can?t open input\n") unless (open(INFILE, $archivo)); >> my @lista = ; >> foreach my $seq (@lista) { >> if ($seq eq '') { >> die ("empty list") >> } >> else { >> my $db = new Bio::DB::GenPept("-format" => "Fasta"); >> my $seqobj = $db->get_Seq_by_acc($seq); >> my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", >> -format => 'fasta'); >> $out->write_seq($seqobj); >> } >> } >> exit; >> >> >> An example list of sequences can be this one: >> >> YP_003107578.1 >> YP_003106103.1 >> YP_003106552.1 >> YP_003106560.1 >> YP_003107053.1 >> YP_003107450.1 >> YP_003108000.1 >> YP_003105023.1 >> YP_003105264.1 >> >> Thanks in advance for your help ;) > > -- Dr. Jos? Luis Lav?n Trueba Dpto. de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From scott at scottcain.net Wed Nov 4 08:26:02 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 4 Nov 2009 08:26:02 -0500 Subject: [Bioperl-l] Bio::Graphics and picture format In-Reply-To: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com> References: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com> Message-ID: <0FB17FBC-16BE-4A9F-AC75-983D3B4ECE7D@scottcain.net> Hi Emanuele, It is possible to use GD::SVG instead of GD to generate SVG graphics. To use it, you provide an argument of "-image_class GD::SVG" to the constructor of Bio::Graphics::Panel. See the perldoc of Bio::Graphics::Panel for more info. Scott On Nov 4, 2009, at 4:54 AM, Emanuele Osimo wrote: > Hello everyone, > do you know if it is possible to generate an image with > Bio::Graphics in a > vector format? Is there a list of available formats? 
> Thanks
> Emanuele

-----------------------------------------------------------------------
Scott Cain, Ph. D.                        scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From b3sn7 at UNB.ca Tue Nov 3 12:30:24 2009
From: b3sn7 at UNB.ca (Sharma, Rachita)
Date: Tue, 3 Nov 2009 13:30:24 -0400
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
Message-ID: <1257269424.4af068b045434@webmail.unb.ca>


I am having trouble parsing PSI-BLAST results. Please help.

The code is:
my $in = new Bio::SearchIO( -format => 'blast',
         -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");


while( my $result = $in->next_result ) {
  while( my $hit = $result->next_hit ) {

    $sth->execute($result->query_name, $hit->name, $hit->significance);
    print "Query executed!\n";

  }
}

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline ***** No hits found ******
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
STACK: BSubVCpsiRblast.pl:92
-----------------------------------------------------------


*******************************
Rachita Sharma
Research Assistant (PhD Student)
University of New Brunswick, NB, CANADA
email: Rachita.Sharma at unb.ca
Phone no: 503-895-3619
*******************************

From cjfields at illinois.edu Wed Nov 4 08:53:35 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 4 Nov 2009 07:53:35 -0600
Subject: [Bioperl-l] version of ExtUtils::Manifest too strict?
In-Reply-To:
References:
Message-ID: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu>

Alex,

Not sure why ExtUtils::Manifest can't be bundled as a separate perl
package alone.
It is part of perl core but it's also available on CPAN separately from
perl itself:

http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm

This is the commit message for that BTW. This allows spaces in file
names for the MANIFEST. v1.52 is a bug fix and is required.

http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673

chris

On Nov 4, 2009, at 1:30 AM, Alex Lancaster wrote:

> Does the version of ExtUtils::Manifest really need to be strictly
> greater than or equal to 1.52?
>
> Currently this blocks me updating the Fedora package of BioPerl to
> 1.6.1, because the version of perl that Fedora ships is on 1.51 and
> hence the build fails with:
>
> Checking prerequisites...
> - ERROR: ExtUtils::Manifest (1.51_01) is installed, but we need
> version >= 1.52
>
> Full logs are here:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=1787483
> http://koji.fedoraproject.org/koji/getfile?taskID=1787483&name=build.log
>
> This is true even with the version of Perl in rawhide/F-12 etc.
> (ExtUtils::Manifest is in the base perl package).
>
> If it really is necessary, I would like to be armed with a good
> argument why this ca
> why it needs to be updated, since the Perl package maintainer would
> have
> to update the entire Perl package simply to get a more recent
> version of
> one small subpackage.
>
> Regards,
> Alex

From cjfields at illinois.edu Wed Nov 4 08:55:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 4 Nov 2009 07:55:34 -0600
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
In-Reply-To: <1257269424.4af068b045434@webmail.unb.ca>
References: <1257269424.4af068b045434@webmail.unb.ca>
Message-ID: <70E34111-4E70-463D-86EE-06926EA57073@illinois.edu>

Rachita,

Asked and answered yesterday. Please submit as a bug.
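Chris's earlier reply asks bug reporters to state the OS, the BioPerl version and the perl version. A small sketch like the following can collect all three for pasting into the report. It relies on $Bio::Root::Version::VERSION, the package variable BioPerl uses for its release number; the hash name and output layout are arbitrary choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Root::Version;

# Details worth including in a BioPerl bug report.
my %info = (
    os      => $^O,                          # e.g. linux, MSWin32, cygwin
    perl    => sprintf('%vd', $^V),          # e.g. 5.8.8
    bioperl => $Bio::Root::Version::VERSION, # BioPerl's release number
);
printf "%-8s %s\n", $_, $info{$_} for qw(os perl bioperl);
```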
chris

On Nov 3, 2009, at 11:30 AM, Sharma, Rachita wrote:

>
> I am having trouble parsing PSI-BLAST results. Please help.
>
> The code is:
> my $in = new Bio::SearchIO( -format => 'blast',
>          -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");
>
>
> while( my $result = $in->next_result ) {
>   while( my $hit = $result->next_hit ) {
>
>     $sth->execute($result->query_name, $hit->name, $hit->significance);
>     print "Query executed!\n";
>
>   }
> }
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
> STACK: BSubVCpsiRblast.pl:92
> -----------------------------------------------------------
>
>
>
>
> *******************************
> Rachita Sharma
> Research Assistant (PhD Student)
> University of New Brunswick, NB, CANADA
> email: Rachita.Sharma at unb.ca
> Phone no: 503-895-3619
> *******************************
>

From David.Messina at sbc.su.se Wed Nov 4 09:11:43 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 4 Nov 2009 15:11:43 +0100
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>
Message-ID: <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com>

Aw shucks, José, glad I could be of help.

There are plenty of people who answer questions around here, but my
timezone sometimes gives me an advantage for the European ones. :)

Dave

From daniel.gaston at gmail.com Wed Nov 4 09:45:04 2009
From: daniel.gaston at gmail.com (Daniel Gaston)
Date: Wed, 4 Nov 2009 10:45:04 -0400
Subject: [Bioperl-l] SwissProt and Subcellular localization information
Message-ID: <50c615ba0911040645j1b28e727p5d7bf47a04db160b@mail.gmail.com>

Hi Everyone,

I have recently been playing around with SwissProt format flatfiles and want
to extract sequences based on subcellular localization. I notice in going
through the code for swiss.pm and swissdriver.pm that in both (more so in
swissdriver.pm) there are several steps where organelle information based on
the OG line could be extracted and added to data structure but isn't. It
seems that in both cases the OG line is being added in to the generic
lumping of data from the OC, OS, and OX lines in order to extract species
names and taxonomy information but getting rid of everything else. Is there
a particular reason for this or just a simple oversight? On the surface at
least it looks like a relatively simple modification to make although I
admit that I am not terribly adept at manipulating these SeqIO
datastructures.

Thanks for your time,

Dan

From daniel.gaston at gmail.com Wed Nov 4 12:12:10 2009
From: daniel.gaston at gmail.com (Daniel Gaston)
Date: Wed, 4 Nov 2009 13:12:10 -0400
Subject: [Bioperl-l] SwissProt and Subcellular localization information
Message-ID: <50c615ba0911040912pfd2483fwe44cd098beed73c7@mail.gmail.com>

Sorry folks, it appears I was just being a bonehead and didn't look closely
enough into the Bio::Annotation and Bio::Species objects that store all of
this data.
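As a hedged illustration of where that data lives, here is a minimal sketch that pulls SUBCELLULAR LOCATION text out of a sequence's comment annotations. It assumes the swiss parser stores each CC topic as a separate Bio::Annotation::Comment under the 'comment' key, and that an OG-line organelle, if any, ends up on the Bio::Species object's organelle(). Both are assumptions to check against your own files; if several topics share one comment object, the captured text will run on past the location. The helper name subcellular_locations is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return the SUBCELLULAR LOCATION text(s) found in a sequence's comment
# annotations, with internal line breaks collapsed to single spaces.
sub subcellular_locations {
    my ($seq) = @_;
    my @locs;
    for my $comment ($seq->annotation->get_Annotations('comment')) {
        next unless $comment->text =~ /SUBCELLULAR LOCATION:\s*(.*)/s;
        (my $loc = $1) =~ s/\s+/ /g;
        $loc =~ s/\s+$//;
        push @locs, $loc;
    }
    return @locs;
}

# Usage (file name is up to you): perl subcell.pl uniprot_records.dat
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'swiss');
    while (my $seq = $in->next_seq) {
        # Organelle from the OG line, if the parser stored one
        my $organelle = ($seq->species && $seq->species->organelle) || '-';
        for my $loc (subcellular_locations($seq)) {
            print join("\t", $seq->display_id, $organelle, $loc), "\n";
        }
    }
}
```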
Dan

On Wed, Nov 4, 2009 at 1:00 PM, wrote:

> Send Bioperl-l mailing list submissions to
> bioperl-l at lists.open-bio.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> or, via email, send a message with subject or body 'help' to
> bioperl-l-request at lists.open-bio.org
>
> You can reach the person managing the list at
> bioperl-l-owner at lists.open-bio.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bioperl-l digest..."
>
> Today's Topics:
>
> 1. SwissProt and Subcellular localization information
> (Daniel Gaston)
>
>
> ---------- Forwarded message ----------
> From: Daniel Gaston
> To: bioperl-l at lists.open-bio.org
> Date: Wed, 4 Nov 2009 10:45:04 -0400
> Subject: [Bioperl-l] SwissProt and Subcellular localization information
> Hi Everyone,
>
> I have recently been playing around with SwissProt format flatfiles and
> want
> to extract sequences based on subcellular localization. I notice in going
> through the code for swiss.pm and swissdriver.pm that in both (more so in
> swissdriver.pm) there are several steps where organelle information based
> on
> the OG line could be extracted and added to data structure but isn't. It
> seems that in both cases the OG line is being added in to the generic
> lumping of data from the OC, OS, and OX lines in order to extract species
> names and taxonomy information but getting rid of everything else. Is there
> a particular reason for this or just a simple oversight? On the surface at
> least it looks like a relatively simple modification to make although I
> admit that I am not terribly adept at manipulating these SeqIO
> datastructures.
> > Thanks for your time, > > Dan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jluis.lavin at unavarra.es Thu Nov 5 10:28:23 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 16:28:23 +0100 (CET) Subject: [Bioperl-l] A question about iBio::Index: and its correct use Message-ID: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Hello to all, I?m trying to write a script to retrieve a list of sequences from a local FASTA file (for example a fasta archive where all the protein models of an organism are stored). This file would be used by me as some kind "local database" (sorry if I mistake a few concepts...) I?ve been reading the BioPerl HOWTOs and I came across the Bio::Index::Fasta tool. If I didn?t misunderstood what I read (which can be easy because my low level on programming) this Indexing tool should do the job. I wrote a couple of scripts based on the documentation i read about this tool, but I don?t seem to be able to create the index file to be used later (to retrieve the sequences from). -First of all, I want to ask the people in this forum if the Bio::Index::Fasta is the right one to chose for this tasks. -Then I?ll beg you to take a look at my scripts, because I don?t seem to catch the bug... Best wishes to you all and thanks in advance ;) -- Jos? Luis Lav?n Trueba, PhD Dpto. de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Thu Nov 5 10:39:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 5 Nov 2009 10:39:05 -0500 Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Message-ID: Jos? 
-- It looks like this is a good solution to your problem. Please send your script so we can look at it- cheers Mark _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From jluis.lavin at unavarra.es Thu Nov 5 10:46:36 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 16:46:36 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] Message-ID: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> ---------------------------- Original message ---------------------------- Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use From: jluis.lavin at unavarra.es Date: Thu, 5 November 2009, 16:46 To: "Mark A. Jensen" -------------------------------------------------------------------------- Hi Mark, I've actually got two scripts: the first one creates the index and the second one retrieves the sequence list from the indexed file. 1) Here is the index creation script: #!/c:/Perl -w use strict; use Bio::Index::Fasta; use strict; print "Enter file for indexing: \n"; my $Index_File_Name = <STDIN>; my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx", -write_flag => 1); $inx->make_index(my $File_Name); 2) And here is the sequence retrieval script: #!/c:/Perl -w use Bio::Index::Fasta; use strict; #PC9.fasta is my genomic file my $Index_File_Name ="PC9.fasta"; my $inx = Bio::Index::Fasta->new($Index_File_Name); #LCS.txt is my sequences list @ARGV = ; foreach my $id (@ARGV) { if ($id eq ''){ die ("empty list") } else { my $seqobj = $inx->fetch($id); my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", -format => 'fasta'); $out->write_seq($seqobj); } } exit; } I hope this code is not a total scum... Thanks in advance ;) From maj at fortinbras.us Thu Nov 5 10:37:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 5 Nov 2009 10:37:53 -0500 Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query In-Reply-To: <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com> References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es> <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com> Message-ID: <49075FDFF6764EE48E932D95EB994221@NewLife> True, Dave, you compete only with crazed east coast core developers who're doing "just one more thing" at 2am.... ----- Original Message ----- From: "Dave Messina" To: Cc: Sent: Wednesday, November 04, 2009 9:11 AM Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query > Aw shucks, José, glad I could be of help. There are plenty of people who > answer questions around here, but my timezone sometimes gives me an > advantage for the European ones. :) > > > Dave From hrh at fmi.ch Thu Nov 5 11:02:48 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Thu, 05 Nov 2009 17:02:48 +0100 Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Message-ID: Jluis > -Then I'll beg you to take a look at my scripts, because I don't seem to > catch the bug... you haven't attached/included any scripts, have you?
Anyway, have you considered using BLAST indices (created by formatdb with the additional flag "-o") together with the tool 'fastacmd' (which is also included in the NCBI BLAST binaries) as a simple (and very fast) alternative for fetching sequences? Regards, Hans From maj at fortinbras.us Thu Nov 5 11:02:09 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 5 Nov 2009 11:02:09 -0500 Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] In-Reply-To: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> Message-ID: <1984ED07F36C446284B25F617964B6C6@NewLife> Hey José, The first thing that jumps out is the index file name. Looks like you create it as PC9.fasta.idx But you read it as PC9.fasta Not an unusual mistake. Do my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); and see if it works. MAJ From jluis.lavin at unavarra.es Thu Nov 5 11:21:57 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 17:21:57 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] In-Reply-To: <1984ED07F36C446284B25F617964B6C6@NewLife> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> <1984ED07F36C446284B25F617964B6C6@NewLife> Message-ID: <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> Thank you very much Mark, that's a good point :$ I guess your correction refers to the second script, doesn't it?
If so, there is still a problem with the first script: it doesn't create the PC9.fasta.idx file; instead it creates two files named: -PC9.fasta.idx.pag -PC9.fasta.idx.dir which clearly seem to be related to some kind of indexing process... but, unless the PC9.fasta.idx file is only virtual or remains hidden, I can't find it anywhere... Forgive me if I'm talking nonsense... Thank you very much again for your help ;) From maj at fortinbras.us Thu Nov 5 11:39:09 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Thu, 5 Nov 2009 11:39:09 -0500 Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] In-Reply-To: <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> <1984ED07F36C446284B25F617964B6C6@NewLife> <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> Message-ID: Yes, these are files created by SDBM, Perl's internal DBM manager. You should be able to open the index simply with $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); and the DBM will know what to do-- cheers MAJ
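Combining the fixes discussed in this thread (open the index under the same base name it was created with, and give make_index() the FASTA file to index rather than an undefined variable), a minimal sketch of both scripts could look like this. It also trims whitespace from each ID, on the assumption that IDs read from a list file would otherwise keep their trailing newline and never match. build_index and fetch_ids are illustrative names, not BioPerl methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::Fasta;
use Bio::SeqIO;

# Create (or overwrite) an index called $index_name for $fasta_file.
# The same base name is used later to open it, whatever files the DBM
# backend actually writes to disk (e.g. .pag/.dir for SDBM).
sub build_index {
    my ($fasta_file, $index_name) = @_;
    my $inx = Bio::Index::Fasta->new(-filename   => $index_name,
                                     -write_flag => 1);
    $inx->make_index($fasta_file);   # the FASTA file to index, not the index name
    return $index_name;
}

# Open an existing index and write every ID that is found to $out_file.
# Returns the number of sequences written.
sub fetch_ids {
    my ($index_name, $out_file, @ids) = @_;
    my $inx = Bio::Index::Fasta->new(-filename => $index_name);
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fasta');
    my $written = 0;
    for my $id (@ids) {
        $id =~ s/^\s+|\s+$//g;        # drop newlines/spaces left over from a list file
        next unless length $id;
        my $seq = $inx->fetch($id);   # undef if the ID is not in the index
        unless ($seq) {
            warn "ID '$id' not found in $index_name\n";
            next;
        }
        $out->write_seq($seq);
        $written++;
    }
    return $written;
}

# Example: perl index_fetch.pl PC9.fasta id1 id2 ...
if (@ARGV >= 2) {
    my ($fasta, @ids) = @ARGV;
    my $index = build_index($fasta, "$fasta.idx");
    my $n = fetch_ids($index, 'extracted_seqs.fasta', @ids);
    print "wrote $n sequence(s)\n";
}
```

In this sketch an unknown ID is reported and skipped instead of being passed to write_seq(), and the output file is opened once rather than once per ID.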
From jluis.lavin at unavarra.es Thu Nov 5 12:48:12 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 18:48:12 +0100 (CET) Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Message-ID: <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> Thanks a lot for your help Hans, It's a little bit too hard for me to understand this awesome information you've just given me and turn it into a script... I hope I can use it in the near future anyway ;) The issue here is that the sequences I'm indexing were not generated by the NCBI, nor are they stored there... although I believe you're just referring to the tool itself and not to a retrieval from the NCBI. Thanks again, you're all great at giving advice to newbies like me ;) Best wishes to you all -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From florent.angly at gmail.com Thu Nov 5 13:00:19 2009 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 05 Nov 2009 10:00:19 -0800 Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> Message-ID: <4AF312B3.9060009@gmail.com> Hans-Rudolf was talking about a way to retrieve sequences from a BLAST database. If you use BLAST locally, then your database is local too. More info here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html Florent
From valiente at lsi.upc.edu Fri Nov 6 03:06:48 2009 From: valiente at lsi.upc.edu (valiente at lsi.upc.edu) Date: Fri, 6 Nov 2009 09:06:48 +0100 (CET) Subject: [Bioperl-l] Bio::SeqIO::genbank.pm Message-ID: <45737.147.83.59.225.1257494808.squirrel@webmail.lsi.upc.edu> There is a line in Bio::SeqIO::genbank.pm to convert data in classification lines into a classification array by splitting only on ';' or '.', so that a classification that is 2 or more words will still get matched: my @class = map { s/^\s+//; s/\s+$//; s/\s{2,}/ /g; $_; } split /(? References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> <C718B5B8.5561%hrh@fmi.ch> <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> <4AF312B3.9060009@gmail.com> Message-ID: <1222.130.206.164.153.1257497085.squirrel@webmail.unavarra.es> Thank you for the info Florent! I'll try to read all the information at the link you provided and try to figure out how to make it work and whether it is worthwhile for me. I mean, I work with several sequence files that come from multiple databases (JGI, BROAD, Genolevures or NCBI), and the protein IDs from each of those databases differ from NCBI's. Maybe it would be easier to write a script that lets me enter a FASTA file with all the protein models of a single organism, parse it and then extract the sequences of a given list (using the "ID style" of the particular database), rather than creating a BLAST index for each organism I need to work with... Did I explain the issue correctly? Anyway, since I don't know anything about this tool Hans and you showed me, I can easily be wrong... Thank you for showing me the local BLAST index tool; I'll read the documentation carefully and study all its possibilities. Best wishes JL
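The simpler route José Luis describes, streaming through one organism's FASTA file and keeping only the listed IDs, needs no index at all. A minimal sketch, assuming each ID in the list matches the first whitespace-delimited word of a FASTA header exactly (that word is what Bio::SeqIO's FASTA parser reports as display_id, pipes included); read_id_list and extract_by_ids are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Read one ID per line, ignoring blank lines and surrounding whitespace.
sub read_id_list {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;   # also removes the trailing newline
        push @ids, $line if length $line;
    }
    close $fh;
    return @ids;
}

# Copy the records whose display_id is in @ids from $fasta_in to $fasta_out.
# Returns the number of records written.
sub extract_by_ids {
    my ($fasta_in, $fasta_out, @ids) = @_;
    my %wanted = map { $_ => 1 } @ids;
    my $in  = Bio::SeqIO->new(-file => $fasta_in,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$fasta_out", -format => 'fasta');
    my $written = 0;
    while (my $seq = $in->next_seq) {
        next unless $wanted{ $seq->display_id };
        $out->write_seq($seq);
        $written++;
    }
    return $written;
}

# Example: perl extract.pl proteins.fasta LCS.txt extracted_seqs.fasta
if (@ARGV == 3) {
    my ($fasta, $list, $outfile) = @ARGV;
    my $n = extract_by_ids($fasta, $outfile, read_id_list($list));
    print "wrote $n sequence(s) to $outfile\n";
}
```

This reads the whole file once per run, which is fine for occasional lookups; an index such as Bio::Index::Fasta (discussed earlier in this thread) pays off when the same large file is queried repeatedly.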
From maj at fortinbras.us Fri Nov 6 07:45:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 6 Nov 2009 07:45:01 -0500 Subject: [Bioperl-l] Bioperl In-Reply-To: <16842715.26316.1257510446095.JavaMail.root@durga.amrita.ac.in> References: <16842715.26316.1257510446095.JavaMail.root@durga.amrita.ac.in> Message-ID: Hi Resmi- You should look at http://bioperl.org/ under "Installation" for information on getting and installing BioPerl.
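Once BioPerl is installed, trees are read and written through Bio::TreeIO. A minimal sketch, assuming a Newick-format tree file; the helper leaf_ids_per_tree is illustrative, and building a tree from sequence data in the first place is a separate step not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

# Return one array ref of sorted leaf IDs per tree in a Newick file.
sub leaf_ids_per_tree {
    my ($file) = @_;
    my $in = Bio::TreeIO->new(-file => $file, -format => 'newick');
    my @trees;
    while (my $tree = $in->next_tree) {
        my @ids = map { $_->id } $tree->get_leaf_nodes;
        push @trees, [ sort @ids ];
    }
    return @trees;
}

# Example: perl leaves.pl my_trees.nwk
if (@ARGV) {
    print join(',', @$_), "\n" for leaf_ids_per_tree($ARGV[0]);
}
```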
An introduction to working with trees in BioPerl is at this link: http://www.bioperl.org/wiki/HOWTO:Trees cheers, Mark ----- Original Message ----- From: Resmi S. To: maj at fortinbras.us Sent: Friday, November 06, 2009 7:27 AM Subject: Bioperl Respected Sir, I am Resmi S, studying II MSc Bioinformatics. I am now doing my project on Phylogenetic Tree Construction using BioPerl. I am not very familiar with BioPerl modules, so could you please send me the names of the BioPerl modules needed for my project? I also need to know where I can get these modules. If they are on CPAN, then please send me the location or link. I kindly request you to send me the details soon. Yours Sincerely, Resmi S, II MSc Bioinformatics, School of Biotechnology, Amrita Vishwa Vidyapeetham, Email : amm08bi019 at students.amrita.ac.in From robert.bradbury at gmail.com Fri Nov 6 12:35:22 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 6 Nov 2009 12:35:22 -0500 Subject: [Bioperl-l] Function that determines serious mutations Message-ID: Is there a function in the library (or has someone written one) that can take a GenBank entry and determine which mutations are harmful? It would be used to produce a summary table of: GENE # SNP # BadSNP One can sort of get this from NCBI: if you look up a gene name in the "GENE" db and then go to the "GeneView" in dbSNP page, it has the information I want, but largely in a graphical format, while I simply want numbers I can dump into a spreadsheet. I don't think it would be hard: fetch the gene, run through the features for the SNP database, figure out whether they are good or bad SNPs, accumulate the statistics and dump it.
I think the functions available are flexible enough to do it, but I can't believe nobody has already done it. It could be a bit more complex, in that one could do an analysis to see if the mutations are in a conserved domain, or are mutations that code for Cysteine or Methionine (or other potentially "critical" amino acids), but since "critical" is in the eye of the beholder there would have to be some kind of callback to a scoring function. Thanks, Robert From nevoband at igb.uiuc.edu Fri Nov 6 15:58:05 2009 From: nevoband at igb.uiuc.edu (kleenix) Date: Fri, 6 Nov 2009 12:58:05 -0800 (PST) Subject: [Bioperl-l] StandAloneBlast Unallowed parameter Message-ID: <26230896.post@talk.nabble.com> I'm not sure if I'm doing this wrong. I am trying to use the -m parameter in blastall using the StandAloneBlast BioPerl class. When I add 'm'=>0 to @params I get an "Unallowed parameter:" error. Am I adding the parameter wrong? I'm using StandAloneBlast version 1.51. Thanks -Nevo From veronica.xiaoyu at gmail.com Fri Nov 6 17:25:04 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Fri, 6 Nov 2009 17:25:04 -0500 Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change the description's name of each hit? Message-ID: Hi, I'm using Bio::SearchIO::Writer::HTMLResultWriter to help me parse a BLAST output file into HTML. Does anybody know how to parse and change the description name of each hit? Using $hit->description I can get each hit's description, but it does not let me modify it. Thank you very much, Xiaoyu From maj at fortinbras.us Fri Nov 6 19:40:17 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 6 Nov 2009 19:40:17 -0500 Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change the description's name of each hit?
In-Reply-To: References: Message-ID: <11592B31D9924FA7A8638D90AE4A3F4A@NewLife> Xiaoyu- That method should work to change the description; are you doing this? $hit->description('This is my new description'); This method returns the old description when you change the value:
$hit->description('old');
$str = $hit->description('new');   # $str eq 'old'
$str = $hit->description;          # $str eq 'new'
MAJ ----- Original Message ----- From: "Xiaoyu Liang" To: Sent: Friday, November 06, 2009 5:25 PM Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change the description's name of each hit? > Hi, > > I'm using Bio::SearchIO::Writer HTMLResultWriter help me parse BLAST out > file into HTML. > > Anybody knows how to parse and change the description name of each hit? > > By using hit->description can call hits' description, but it is not allowed > to be modified. > > Thank you very much, > Xiaoyu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Daniel.Lang at biologie.uni-freiburg.de Sun Nov 8 09:50:48 2009 From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang) Date: Sun, 08 Nov 2009 15:50:48 +0100 Subject: [Bioperl-l] arguments to call back functions in GBrowse2 Message-ID: <4AF6DAC8.8070204@biologie.uni-freiburg.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Lincoln, a while back (May 29, 2009; 09:08pm) you replied to an even older thread ("Re: Access the parent of a Bio::DB::SeqFeature within a gbrowse config callback function"). I missed your reply and didn't follow it up back then, sorry! I'm currently facing the same issue again with gbrowse2. I have a callback function for "balloon click". Following your last reply I expected 5 arguments, but I am getting only three: $feature, $panel, $track. In principle, I am using the latest releases/checkouts... Which modules do I need to look at/update for this functionality?
Furthermore, is there a possibility to share global variables between gbrowse2 and slaves? Should this work via init_code? Should modules initialized in a conf be in the scope of a slave? If not, can I introduce modules via the slave config files, or do I need to alter the slave scripts? Thanks, again! Cheers, Daniel PS: gbrowse2 rocks! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkr22sUACgkQmJnbCpJAG3A2MgCdG61bNRGMFVWExagzMFejKMjO FiUAn16nQNemDGSy8nJBS5dUHQMnDgrP =ODxn -----END PGP SIGNATURE----- From maj at fortinbras.us Sun Nov 8 11:09:43 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 8 Nov 2009 11:09:43 -0500 Subject: [Bioperl-l] GuessSeqFormat: fastq? Message-ID: Hi All- Any plans in the works for a _possibly_fastq sequence guesser? MAJ From maj at fortinbras.us Sun Nov 8 11:20:55 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 8 Nov 2009 11:20:55 -0500 Subject: [Bioperl-l] GuessSeqFormat: fastq? In-Reply-To: References: Message-ID: Never mind; got it covered-- MAJ ----- Original Message ----- From: "Mark A. Jensen" To: "bioperl-l" Sent: Sunday, November 08, 2009 11:09 AM Subject: [Bioperl-l] GuessSeqFormat: fastq? > Hi All- > Any plans in the works for a _possibly_fastq sequence guesser? > MAJ From saikari78 at gmail.com Mon Nov 9 10:47:10 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 15:47:10 +0000 Subject: [Bioperl-l] Retrieving link to protein from PubChem Message-ID: Hi, I'm using Bioperl to retrieve records from PubChem. I'm trying to find a way (but have been unsuccessful) to retrieve, from a compound record, the reference to the protein(s) that can synthesize the compound. Thanks very much.
saikari From saikari78 at gmail.com Mon Nov 9 11:05:57 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 16:05:57 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem Message-ID: Hi, I'm using Bioperl to retrieve records from PubChem. I'm trying to find a way (but have been unsuccessful) to retrieve, from a compound record, the reference to the protein(s) that can synthesize the compound. Thanks very much. saikari From cjfields at illinois.edu Mon Nov 9 11:27:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 10:27:10 -0600 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: Message-ID: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: > Hi, > > I'm using Bioperl to retrieve records from PubChem. > I'm trying to find a way-but have been unsuccessful- to retrieve > from a > compound record, the reference to the protein(s) that can synthesize > the > compound. > Thanks very much. > > saikari The BioPerl script below returns the GI for proteins that correspond to the substance passed on the command line; invoke it using 'perl pc_substance.pl substance_requested'. It probably needs more fiddling to catch everything, but it should get you started.
For other bits and pieces (such as how to retrieve the raw sequence files), please see the EUtilities HOWTO: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook chris ----------------------------------------
#!/usr/bin/perl -w
use 5.010;
use strict;
use warnings;
use Bio::DB::EUtilities;

my $substance = shift;

my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'pcsubstance',
                                     -term       => $substance,
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die;

$eutil->reset_parameters(-eutil   => 'elink',
                         -history => $hist,
                         -db      => 'protein',
                         -dbfrom  => 'pcsubstance',
                         -retmax  => 1000);

say join(',', $eutil->get_ids);
From saikari78 at gmail.com Mon Nov 9 11:41:20 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 16:41:20 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: Fabulous! Huge help. saikari On Mon, Nov 9, 2009 at 4:27 PM, Chris Fields wrote: > On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: > > Hi, >> >> I'm using Bioperl to retrieve records from PubChem. >> I'm trying to find a way-but have been unsuccessful- to retrieve from a >> compound record, the reference to the protein(s) that can synthesize the >> compound. >> Thanks very much. >> >> saikari >> > > The below bioperl script returns the GI for proteins that correspond to the > substance passed on the command line; invoke using 'perl pc_substance.pl substance_requested'. It probably needs more fiddling to catch everything > but it should get you started.
> > For other bits and pieces (such as how to retrieve the raw sequence files), > please see the EUtilities HOWTO: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > chris > > ---------------------------------------- > > #!/usr/bin/perl -w > > use 5.010; > use strict; > use warnings; > use Bio::DB::EUtilities; > > my $substance = shift; > > my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', > -db => 'pcsubstance', > -term => $substance, > -usehistory => 'y'); > > my $hist = $eutil->next_History || die; > > $eutil->reset_parameters(-eutil => 'elink', > -history => $hist, > -db => 'protein', > -dbfrom => 'pcsubstance', > -retmax => 1000); > > say join(',',$eutil->get_ids); > From gc11song at gmail.com Mon Nov 9 13:08:48 2009 From: gc11song at gmail.com (Guangchun Song) Date: Mon, 9 Nov 2009 12:08:48 -0600 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? Message-ID: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Hello, I'm a new BioPerl user. I'm working on a project to determine the status of all putative SNPs, such as non-synonymous vs.
synonymous, and > predict the tranlational effect of non-synonymous mutations as benign > or malicious. I'm trying to use bioperl to get the DNA sequence and > translate to protein sequence for the SNPs that are in gene's coding > region. Could someone tell me how to do it? > > I too would like to know if this information is available. I've recently been working with the dbSNP results from NCBI but they display the results in a graphical format rather than data that one can play with and ask questions of like "What is the most disease causing gene in the Human Genome?" or "What are the critical proteins damaged by gene defects in the Human Genome?" ... "In terms of premature deaths, extended health care requirements, loss of quality of life, etc.?" The same types of questions can be applied to the dog and cat genomes where there is emotional value or the cow, horse, pig, etc. genomes where there is economic value? The value of BioPerl would increase significantly if there were functionality that would allow easy access to "these mutations may have negative/positive impact" (which means you need a function that qualifies mutations by degree) and allow for impact to be subjectively determined (implying there must be some callback function to provide a user quality/impact rating). For example: $/@differences = protein_compare($mygene, $refseq_gene, @critical_aa, @critical_domain, $callback) Where $callback could "rate" differences about the protein and position and the "type of interest" (e.g. metal binding amino acids, structural changing amino acids, critical catalysis amino acids, etc.). A default callback would be based on some evolving definition of "critical" changes which result in human disease for example. This is a "required" capability to be able to determine things like the "adaptability" of a species -- those with fewest critical mutation points may have better adaptability to mutation increasing circumstances. 
Please pardon any errors in perl syntax/usage; it's been a while since I've written perl and I'd really rather be coding in C. Robert From maj at fortinbras.us Mon Nov 9 16:56:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 9 Nov 2009 16:56:24 -0500 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: <3ED3D387B5DE4248A218D42882369925@NewLife> I agree that BioPerl would significantly increase in value with such a module; in fact, the BioTeam would probably buy us out. My opinion is that the entire GWAS enterprise is the search for such a callback function, for humans anyway. For those engaged in this quest, if BioPerl doesn't provide a Maserati, it at least provides good Italian-made (among others) parts. MAJ ----- Original Message ----- From: "Robert Bradbury" To: "Guangchun Song" Cc: Sent: Monday, November 09, 2009 4:15 PM Subject: Re: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? > On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song wrote: >> >> I'm new bioperl user. I' working on a project: To determine the >> status of all tutative SNPs such as non-synonymous vs. synonymous, and >> predict the tranlational effect of non-synonymous mutations as benign >> or malicious. I'm trying to use bioperl to get the DNA sequence and >> translate to protein sequence for the SNPs that are in gene's coding >> region. Could someone tell me how to do it? >> >> > I too would like to know if this information is available. I've recently > been working with the dbSNP results from NCBI but they display the results > in a graphical format rather than data that one can play with and ask > questions of like "What is the most disease causing gene in the Human > Genome?" or "What are the critical proteins damaged by gene defects in the > Human Genome?" ...
"In terms of premature deaths, extended health care > requirements, loss of quality of life, etc.?" > > The same types of questions can be applied to the dog and cat genomes where > there is emotional value or the cow, horse, pig, etc. genomes where there is > economic value? > > The value of BioPerl would increase significantly if there were > functionality that would allow easy access to "these mutations may have > negative/positive impact" (which means you need a function that qualifies > mutations by degree) and allow for impact to be subjectively determined > (implying there must be some callback function to provide a user > quality/impact rating). > > For example: > $/@differences = protein_compare($mygene, $refseq_gene, @critical_aa, > @critical_domain, $callback) > Where $callback could "rate" differences about the protein and position and > the "type of interest" (e.g. metal binding amino acids, structural changing > amino acids, critical catalysis amino acids, etc.). > > A default callback would be based on some evolving definition of "critical" > changes which result in human disease for example. > > This is a "required" capability to be able to determine things like the > "adaptability" of a species -- those with fewest critical mutation points > may have better adaptability to mutation increasing circumstances. > > Please pardon any errors in perl syntax/usage its been a while since I've > written perl and I'd really rather be coding in C. > > Robert From alexl at users.sourceforge.net Mon Nov 9 18:44:07 2009 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Mon, 09 Nov 2009 18:44:07 -0500 Subject: Re: [Bioperl-l] version of ExtUtils::Manifest too strict?
In-Reply-To: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> (Chris Fields's message of "Wed, 4 Nov 2009 07:53:35 -0600") References: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Message-ID: >>>>> Chris Fields writes: > Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate > perl package alone. It is part of perl core but it's also available > on CPAN separately from perl itself: > http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm Hi Chris, Yes, in principle it would be possible to have this split out as a separate package (currently it's a "subpackage" under the main perl package), unfortunately that's just not the way it's currently done in Fedora (probably because it's part of the core set and they like to update all relevant packages in one step) and I have little control over that. As I suspected, the perl maintainer is not at all enthusiastic for updating the whole of perl just for that package (except for rawhide which would mean that bioperl 1.6.1 would not be available until F-13, about 6 months from now). See: http://bugzilla.redhat.com/show_bug.cgi?id=533562#c1 Obviously I am not happy with this situation either, because it will freeze bioperl on Fedora at 1.6.0 for about 6 months, so can you recommend any temporary workarounds in the meantime? > This is the commit message for that BTW. This allows spaces in file > names for the MANIFEST. v1.52 is a bug fix and is required. > http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 Perhaps I could create a patch that renamed files with spaces in them to ones with no spaces and then rename them again upon installation. Can you point me to which files are the problematic ones that triggered the dependency for 1.52? Perhaps I can figure a workaround. Meanwhile I will press the maintainer of perl in Fedora to perhaps reconsider his position (e.g. 
if another update for perl is going out for another reason, like a security update, perhaps he could roll in the 1.52 update at the same time). Cheers, Alex From cjfields at illinois.edu Mon Nov 9 19:50:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 18:50:00 -0600 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict? In-Reply-To: References: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Message-ID: <29EA2398-F60B-48F2-AFE7-39A44011C451@illinois.edu> On Nov 9, 2009, at 5:44 PM, Alex Lancaster wrote: >>>>>> Chris Fields writes: > >> Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate >> perl package alone. It is part of perl core but it's also available >> on CPAN separately from perl itself: > >> http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm > > Hi Chris, > > Yes, in principle it would be possible to have this split out as a > separate package (currently it's a "subpackage" under the main perl > package), unfortunately that's just not the way it's currently done in > Fedora (probably because it's part of the core set and they like to > update all relevant packages in one step) and I have little control > over > that. > > As I suspected, the perl maintainer is not at all enthusiastic for > updating the whole of perl just for that package (except for rawhide > which would mean that bioperl 1.6.1 would not be available until F-13, > about 6 months from now). See: > > http://bugzilla.redhat.com/show_bug.cgi?id=533562#c1 > > Obviously I am not happy with this situation either, because it will > freeze bioperl on Fedora at 1.6.0 for about 6 months, so can you > recommend any temporary workarounds in the meantime? Well, if you don't absolutely require the MANIFEST for the final package you can forego the requirement. The file in question that triggered the requirement is a data file used only for testing: t/data/test 2.txt >> This is the commit message for that BTW. 
This allows spaces in file >> names for the MANIFEST. v1.52 is a bug fix and is required. > >> http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 > > Perhaps I could create a patch that renamed files with spaces in > them to > ones with no spaces and then rename them again upon installation. > > Can you point me to which files are the problematic ones that > triggered > the dependency for 1.52? Perhaps I can figure a workaround. > > Meanwhile I will press the maintainer of perl in Fedora to perhaps > reconsider his position (e.g. if another update for perl is going out > for another reason, like a security update, perhaps he could roll in > the > 1.52 update at the same time). > > Cheers, > Alex I would point out that this is a fairly significant bug fix for ExtUtils::Manifest. A newer point release of perl is now available (5.10.1) that contains the fix and has a fix for a performance regression that popped up in 5.10.0. chris From jay at jays.net Mon Nov 9 19:05:51 2009 From: jay at jays.net (Jay Hannah) Date: Mon, 9 Nov 2009 18:05:51 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? Message-ID: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> Many thanks to Ewan Birney et al.
for Bio::Index::* I can throw away my awful grep-based index-by-accession stuff. :) Any chance someone has also written an organism-based index mechanism? Something like... while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { print $seq->display_id . "\n"; } Thanks, j From cjfields at illinois.edu Mon Nov 9 22:55:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 21:55:01 -0600 Subject: Re: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> Message-ID: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote: > Many thanks to Ewan Birney et al. for Bio::Index::* > > I can throw away my awful grep based index-by-accession stuff. :) > > Any chance someone has also written an organism based index > mechanism? Something like... > > while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { > print $seq->display_id . "\n"; > } > > Thanks, > > j It should work via id_parser(); from Bio::Index::GenBank:
$inx->id_parser(\&get_id);
# make the index
$inx->make_index($file_name);

# here is where the retrieval key is specified
sub get_id {
    my $line = shift;
    # return nothing (rather than a stale $1) when the line has no clone= tag
    return $line =~ /clone="(\S+)"/ ? $1 : ();
}
Change the code ref to deal with the line you want and parse the name out. Caveat: this may not be absolutely perfect (it only passes in a line at a time, and some species lines will wrap). Also not sure how this would work in cases where multiple sequences from the same species are present. The other option is to preparse everything and tie a hash to store a species->UID map, then use that along with your Bio::Index index to grab what you need. chris From cjfields at illinois.edu Mon Nov 9 23:58:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 22:58:32 -0600 Subject: Re: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: <435BA1A8-2CCB-4D7A-8909-84F8135C439F@illinois.edu> On Nov 9, 2009, at 3:15 PM, Robert Bradbury wrote: > On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song > wrote: >> >> I'm new bioperl user. I' working on a project: To determine the >> status of all tutative SNPs such as non-synonymous vs. synonymous, >> and >> predict the tranlational effect of non-synonymous mutations as benign >> or malicious. I'm trying to use bioperl to get the DNA sequence and >> translate to protein sequence for the SNPs that are in gene's coding >> region. Could someone tell me how to do it? >> >> > I too would like to know if this information is available.
I've > recently > been working with the dbSNP results from NCBI but they display the > results > in a graphical format rather than data that one can play with and ask > questions of like "What is the most disease causing gene in the Human > Genome?" or "What are the critical proteins damaged by gene defects > in the > Human Genome?" ... "In terms of premature deaths, extended health care > requirements, loss of quality of life, etc.?" > > The same types of questions can be applied to the dog and cat > genomes where > there is emotional value or the cow, horse, pig, etc. genomes where > there is > economic value? > > The value of BioPerl would increase significantly if there were > functionality that would allow easy access to "these mutations may > have > negative/positive impact" (which means you need a function that > qualifies > mutations by degree) and allow for impact to be subjectively > determined > (implying there must be some callback function to provide a user > quality/impact rating). > > For example: > $/@differences = protein_compare($mygene, $refseq_gene, > @critical_aa, > @critical_domain, $callback) > Where $callback could "rate" differences about the protein and > position and > the "type of interest" (e.g. metal binding amino acids, structural > changing > amino acids, critical catalysis amino acids, etc.). > > A default callback would be based on some evolving definition of > "critical" > changes which result in human disease for example. > > This is a "required" capability to be able to determine things like > the > "adaptability" of a species -- those with fewest critical mutation > points > may have better adaptability to mutation increasing circumstances. > > Please pardon any errors in perl syntax/usage its been a while since > I've > written perl and I'd really rather be coding in C. 
> > Robert I will say that most of the information from the SNP database is available in various formats (see the following link under 'Retrieval Types'): http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html You can access this information, as well as the full XML, using something like the following script. chris ------------------------------------------------
#!/usr/bin/perl -w
use 5.010;
use strict;
use warnings;
use Bio::DB::EUtilities;

my $term = shift;

my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'snp',
                                     -term       => $term,
                                     -usehistory => 'y',
                                     -retmax     => 100);

my $hist = $eutil->next_History || die "No history returned";

# for SNP XML, change retmode to 'xml'
$eutil->set_parameters(-eutil   => 'efetch',
                       -history => $hist,
                       -retmode => 'text',
                       -rettype => 'flt');

# dumps to STDOUT
say $eutil->get_Response->content;
From jluis.lavin at unavarra.es Tue Nov 10 05:43:40 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Tue, 10 Nov 2009 11:43:40 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] In-Reply-To: References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> Message-ID: <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Hello again, I modified the code line as Mark told me, but there's still a problem that I believe must be due to the sequence names.
My sequence headers in the FASTA file have this format: >PleosPC9_1_103820|fgenesh1_pg.3_#_1 The part on the right of the pipe changes depending on the program used to create the gene model, for example: >PleosPC9_1_103820|fgenesh1_pg.3_#_1 >PleosPC9_1_123413|genemark.2731_g >PleosPC9_1_52065|e_gw1.3.64.1 So I guess I need to parse my IDs somehow so that the program detects only the first part of the FASTA header (the "protein name") and does not get confused by the other side of the pipe... This is the corrected code I wrote following Mark's indications, but I still don't have any idea about the parsing issue...
#!/c:/Perl -w
use Bio::Index::Fasta;
use strict;
#PC9.fasta is my genomic file
my $Index_File_Name ="PC9.fasta";
my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
#LCS.txt is my sequences list
@ARGV = ;
foreach my $id (@ARGV) {
    if ($id eq ''){
        die ("empty list")
    }
    else {
        my $seqobj = $inx->fetch($id);
        my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta",
                                  -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;
}
Thanks in advance PS: Might there be a faster way of extracting those sequences using plain Perl? On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote: > Yes, these are files created by the SDBM, Perl's internal db manager. You > should > be able to > open the index by simply > $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > and the dbm will know what to do-- > cheers MAJ > ----- Original Message ----- > From: > To: "Mark A. Jensen" > Cc: ; > Sent: Thursday, November 05, 2009 11:21 AM > Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its > correct > use] > > >> Thank you very much Mark, that's a good point :$ >> I guess your correction is referred to the second script, isn't it?
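On José's question about a faster plain-Perl route: with headers like >PleosPC9_1_103820|fgenesh1_pg.3_#_1, the lookup key is the text before the first '|'. The sketch below uses hypothetical helper names (header_id, read_ids, extract_fasta); it loads the wanted IDs into a hash and streams the FASTA file once. It also chomps each ID, because lines read from a list file otherwise keep their trailing newline and no exact-match lookup would succeed. As an untested assumption, header_id() could also be handed to Bio::Index::Fasta's id_parser(), whose documented callback receives the FASTA description line and returns the key(s) to index.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lookup key for a FASTA header: text after '>' up to the first '|' or
# whitespace, e.g. ">PleosPC9_1_103820|fgenesh1_pg.3_#_1" gives "PleosPC9_1_103820".
# Returns an empty list (undef in scalar context) for non-header lines.
sub header_id {
    my ($header) = @_;
    return $header =~ /^>\s*([^|\s]+)/ ? $1 : ();
}

# Read wanted IDs, one per line, into a hash set.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $id = <$fh>) {
        chomp $id;
        $id =~ s/^\s+|\s+$//g;    # trim, which also drops a stray "\r" from Windows files
        $ids{$id} = 1 if length $id;
    }
    return \%ids;
}

# Stream a FASTA file once, copying only the records whose key is wanted.
sub extract_fasta {
    my ($in, $out, $wanted) = @_;
    my $keep = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            my $id = header_id($line);
            $keep = defined($id) && $wanted->{$id};
        }
        print {$out} $line if $keep;
    }
    return;
}

# Demo on in-memory data; with real files, open PC9.fasta and the ID list instead.
my $fasta = ">PleosPC9_1_103820|fgenesh1_pg.3_#_1\nMKL\nAAG\n"
          . ">PleosPC9_1_52065|e_gw1.3.64.1\nMSTV\n";
my $id_list = "PleosPC9_1_52065\n";
open my $ids_fh,   '<', \$id_list or die $!;
open my $fasta_fh, '<', \$fasta   or die $!;
extract_fasta($fasta_fh, \*STDOUT, read_ids($ids_fh));
```

For a one-off extraction of a list of IDs, a single pass like this avoids building an index at all; an index pays off when the same file is queried repeatedly.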
>> >> If it is so, there is still a problem with the first script: it doesn't >> create the PC9.fasta.idx file; instead it creates two files named: >> -PC9.fasta.idx.pag >> -PC9.fasta.idx.dir >> >> which seem to be clearly related to some kind of indexing >> process...but, >> unless the PC9.fasta.idx file is only virtual or remains hidden, I can't >> find it anywhere... >> Forgive me if I'm talking nonsense... >> >> Thank you very much again for your help ;) >> >> >> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>> Hey José, >>> The first thing that jumps out is the index file name. Looks >>> like you create it as >>> PC9.fasta.idx >>> But you read it as >>> PC9.fasta >>> Not an unusual mistake. Do >>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>> and see if it works. >>> MAJ >>> ----- Original Message ----- >>> From: >>> To: >>> Sent: Thursday, November 05, 2009 10:46 AM >>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its >>> correct >>> use] >>> >>> >>> >>> >>> ---------------------------- Original message >>> ---------------------------- >>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct >>> use >>> From: jluis.lavin at unavarra.es >>> Date: Thu, 5 November 2009, 16:46 >>> To: "Mark A. Jensen" >>> -------------------------------------------------------------------------- >>> >>> Hi Mark, >>> >>> I've actually got two scripts: the first one is to create the index and >>> the second one is to retrieve the sequence list from the indexed file.
>>> >>> 1)Here is the Index creation script: >>> >>> #!/c:/Perl -w >>> use strict; >>> use Bio::Index::Fasta; >>> use strict; >>> >>> print "Enter file for indexing: \n"; >>> my $Index_File_Name = ; >>> my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx", >>> -write_flag => 1); >>> $inx->make_index(my $File_Name); >>> >>> 2)And here is the sequence retrieval script: >>> >>> #!/c:/Perl -w >>> use Bio::Index::Fasta; >>> use strict; >>> #PC9.fasta is my genomic file >>> my $Index_File_Name ="PC9.fasta"; >>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>> #LCS.txt is my sequences list >>> @ARGV = ; >>> foreach my $id (@ARGV) { >>> if ($id eq ''){ >>> die ("empty list") >>> } >>> else { >>> my $seqobj = $inx->fetch($id); >>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>> -format => 'fasta'); >>> $out->write_seq($seqobj); >>> } >>> } >>> exit; >>> } >>> >>> I hope this code is not a total scum... >>> >>> Thanks in advance ;) >>> >>> >>> >>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote: >>>> José -- It looks like this is a good solution to your problem. Please >>>> send >>>> your >>>> script so we can look at it- >>>> cheers Mark >>>> ----- Original Message ----- >>>> From: >>>> To: >>>> Sent: Thursday, November 05, 2009 10:28 AM >>>> Subject: [Bioperl-l] A question about iBio::Index: and its correct use >>>> >>>> >>>> >>>> Hello to all, >>>> >>>> I'm trying to write a script to retrieve a list of sequences from a >>>> local >>>> FASTA file (for example a fasta archive where all the protein models >>>> of >>>> an >>>> organism are stored). This file would be used by me as some kind of >>>> "local >>>> database" (sorry if I mistake a few concepts...) >>>> I've been reading the BioPerl HOWTOs and I came across the >>>> Bio::Index::Fasta tool. >>>> If I didn't misunderstand what I read (which is easy given my >>>> low >>>> level of programming) this Indexing tool should do the job.
>>>> I wrote a couple of scripts based on the documentation I read about >>>> this >>>> tool, but I don't seem to be able to create the index file to be used >>>> later (to retrieve the sequences from). >>>> -First of all, I want to ask the people in this forum if >>>> Bio::Index::Fasta is the right one to choose for these tasks. >>>> -Then I'll beg you to take a look at my scripts, because I don't seem >>>> to >>>> catch the bug... >>>> >>>> Best wishes to you all and thanks in advance ;) >>>> >>>> -- >>>> José Luis Lavín Trueba, PhD >>>> >>>> Dpto. de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> >>> >>> -- >>> Dr. José Luis Lavín Trueba >>> >>> Dpto. de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >>> >>> -- >>> Dr. José Luis Lavín Trueba >>> >>> Dpto. de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> >> >> -- >> Dr. José Luis Lavín Trueba >> >> Dpto. de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From saikari78 at gmail.com Tue Nov 10 06:41:11 2009 From: saikari78 at gmail.com (saikari keitele) Date: Tue, 10 Nov 2009 11:41:11 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: Thanks again very much for your help and the script. I've been trying it; however, I can't find any protein record linked to a record in the pcsubstance database. Do you think that it is because no links have been defined between the 2 databases, or that I am just unlucky and no link exists for the particular records I'm testing? Thanks again saikari On Mon, Nov 9, 2009 at 4:41 PM, saikari keitele wrote: > Fabulous! Huge help. > saikari > > On Mon, Nov 9, 2009 at 4:27 PM, Chris Fields wrote: > >> On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: >> >> Hi, >>> >>> I'm using Bioperl to retrieve records from PubChem. >>> I'm trying to find a way (so far unsuccessfully) to retrieve, from a >>> compound record, the reference to the protein(s) that can synthesize the >>> compound. >>> Thanks very much. >>> >>> saikari >>> >> >> The bioperl script below returns the GI for proteins that correspond to >> the substance passed on the command line; invoke it using 'perl >> pc_substance.pl substance_requested'. It probably needs more fiddling to >> catch everything but it should get you started.
>> >> For other bits and pieces (such as how to retrieve the raw sequence >> files), please see the EUtilities HOWTO: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook >> >> chris >> >> ---------------------------------------- >> >> #!/usr/bin/perl -w >> >> use 5.010; >> use strict; >> use warnings; >> use Bio::DB::EUtilities; >> >> my $substance = shift; >> >> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -db => 'pcsubstance', >> -term => $substance, >> -usehistory => 'y'); >> >> my $hist = $eutil->next_History || die; >> >> $eutil->reset_parameters(-eutil => 'elink', >> -history => $hist, >> -db => 'protein', >> -dbfrom => 'pcsubstance', >> -retmax => 1000); >> >> say join(',',$eutil->get_ids); >> > > From heyne at informatik.uni-freiburg.de Tue Nov 10 07:55:06 2009 From: heyne at informatik.uni-freiburg.de (Steffen Heyne) Date: Tue, 10 Nov 2009 13:55:06 +0100 Subject: [Bioperl-l] problem with alignments and sequence locations Message-ID: <4AF962AA.7060908@informatik.uni-freiburg.de> Hi, I'm using Bioperl for my research and it is very useful! Thank you! Currently I have a problem with the location tags of sequences. I read in seed alignments of Rfam (in stockholm format, but I think it is similar to other formats). If the location is like: AB194432.1/908-846 the start/end values are changed to $seq->start = 846 $seq->end = 908 and therefore the new location (e.g. $seq->get_nse) is: AB194432.1/846-908 The $seq->strand tag is correctly set to -1 in this case, but if the alignment is written out again (clustal, stockholm,...) this strand info is lost and the sequences have this "wrong" location. But this information is important with respect to the sequence accession number. Is there a way to set the location back to the original one, or is this behavior desired? Any manual setting with $seq->start($val) failed due to automatic checking. I'm using bioperl 1.6.1 Thanks! steffen -- --- Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik Institut für Informatik Albert-Ludwigs-Universität Freiburg Georges-Köhler-Allee 106 79110 Freiburg, Germany Tel: (+49) 761 203 8239 Fax: (+49) 761 203 7462 Mail: heyne at informatik.uni-freiburg.de From cjfields at illinois.edu Tue Nov 10 08:58:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Nov 2009 07:58:52 -0600 Subject: [Bioperl-l] problem with alignments and sequence locations In-Reply-To: <4AF962AA.7060908@informatik.uni-freiburg.de> References: <4AF962AA.7060908@informatik.uni-freiburg.de> Message-ID: On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: > Hi, > > I'm using Bioperl for my research and it is very useful! Thank you! > > Currently I have a problem with the location tags of sequences. I read > in seed alignments of Rfam (in stockholm format, but I think it is > similar to other formats). > > If the location is like: > > AB194432.1/908-846 > > the start/end values are changed to > > $seq->start = 846 > $seq->end = 908 > > and therefore the new location (e.g. $seq->get_nse) is: > > AB194432.1/846-908 > > The $seq->strand tag is correctly set to -1 in this case, but if the > alignment is written out again (clustal, stockholm,...) this strand > info is lost and the sequences have this "wrong" location. But this > information is important with respect to the sequence accession number. > > Is there a way to set the location back to the original one, or is > this behavior desired? Any manual setting with $seq->start($val) > failed due to automatic checking. > > I'm using bioperl 1.6.1 > > Thanks! > > steffen This is a definite bug. We recently discussed amending the NSE format because of this (the subject came up over the last few months or so); it's fallen through the cracks. Fortunately it is very easy to fix (the relevant method is in LocatableSeq). Does anyone have a problem with me adding this in?
It will change output only for those instances where the strand is -1, so AB194432.1/908-846 would be start = 846, end = 908, strand = -1, and AB194432.1/846-908 would be start = 846, end = 908, strand = 1. chris From cjfields at illinois.edu Tue Nov 10 09:05:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Nov 2009 08:05:51 -0600 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: <738F6320-B87A-4541-B9FA-20273ABA96B9@illinois.edu> On Nov 10, 2009, at 5:41 AM, saikari keitele wrote: > Thanks again very much for your help and the script. > I've been trying it; however, I can't find any protein record > linked to a > record in the pcsubstance database. > Do you think that it is because no links have been defined between > the 2 > databases, or that I am just unlucky and no link exists for the > particular records I'm testing? > Thanks again > > saikari It's probably that no links have been defined. I have found similar problems in the past with PubChem, in that not all substances have proteins associated with them. Most of the linked proteins are those with a deposited structure. There are a few other databases to check out; KEGG and the BioCyc dbs (like EcoCyc) come to mind. Unfortunately, I don't think we have a generic remote query engine set up for any of those (unless there is one I'm unaware of), but I know BioCyc comes with its own set of tools (including Perl- and Java-based query tools) and can be set up locally, which is likely much faster and more in line with what you need. chris ...
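Chris's proposed fix for the NSE strings in the "problem with alignments and sequence locations" thread amounts to a round-trip rule. Store start <= end internally, record a reversed range as strand -1, and swap the coordinates back when the name is written out. The sketch below illustrates that rule in core Perl. The helpers `parse_nse` and `format_nse` are hypothetical and are not the Bio::LocatableSeq methods Chris mentions, which is where the real change would go:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a name/start-end string such as
# "AB194432.1/908-846" into (name, start, end, strand). Coordinates are
# normalized so that start <= end; a reversed range is recorded as strand -1.
sub parse_nse {
    my ($nse) = @_;
    my ($name, $s, $e) = $nse =~ m{^(.+)/(\d+)-(\d+)$}
        or die "not a name/start-end string: $nse\n";
    return $s <= $e ? ($name, $s, $e, 1) : ($name, $e, $s, -1);
}

# Hypothetical helper: rebuild the string, swapping the coordinates back
# for minus-strand sequences so the original orientation is preserved.
sub format_nse {
    my ($name, $start, $end, $strand) = @_;
    return $strand == -1 ? "$name/$end-$start" : "$name/$start-$end";
}

for my $nse ('AB194432.1/908-846', 'AB194432.1/846-908') {
    my @parts = parse_nse($nse);
    printf "%s => start=%d end=%d strand=%d => %s\n",
        $nse, @parts[1 .. 3], format_nse(@parts);
}
```

Both IDs should come out with start=846 and end=908. Only the strand differs, and each string is rebuilt exactly as it went in. As a result, `AB194432.1/908-846` is no longer rewritten as `AB194432.1/846-908` on output.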
From vebaev at gmail.com Tue Nov 10 12:38:54 2009 From: vebaev at gmail.com (Vesselin Baev) Date: Tue, 10 Nov 2009 09:38:54 -0800 (PST) Subject: [Bioperl-l] Invitation to connect on LinkedIn Message-ID: <1983273212.597925.1257874734811.JavaMail.app@ech3-cdn07.prod> LinkedIn ------------ Vesselin Baev requested to add you as a connection on LinkedIn: ------------------------------------------ Bolotin, I'd like to add you to my professional network on LinkedIn. - Vesselin From jason at bioperl.org Tue Nov 10 13:47:02 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Nov 2009 10:47:02 -0800 Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and itscorrect use] In-Reply-To: <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Message-ID: Page 44 has the custom ID info, or look at the documentation for Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if you read the perldoc for the module.
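The custom-ID approach Jason points to can be sketched for headers like José's (`>PleosPC9_1_103820|fgenesh1_pg.3_#_1`). Bio::Index::Fasta accepts a parser through `$inx->id_parser(\&get_id)`, which must be set before `make_index` is called. According to its documentation, the callback receives the FASTA description line, and an index entry is added for each string it returns. Below, `get_id` is a hypothetical parser that returns both the name before the pipe and the full first word.

José's P.S. in the quoted message asks about a plain-Perl alternative. `extract_seqs` (also hypothetical) addresses that: it scans the FASTA file once and copies the wanted records to a single output handle opened outside the loop, following Jason's advice about SeqIO. It needs only core Perl and no index, at the cost of reading the whole file on every run. The in-memory demo data are toy stand-ins for PC9.fasta and LCS.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical id parser for headers such as
# ">PleosPC9_1_103820|fgenesh1_pg.3_#_1". It returns the protein name before
# the pipe and, when different, the whole first word, so a record can be
# looked up under either key.
sub get_id {
    my ($header) = @_;
    my ($word) = $header =~ /^>\s*(\S+)/ or return;
    my ($name) = split /\|/, $word;
    return $name eq $word ? ($name) : ($name, $word);
}

# Hypothetical plain-Perl extractor: copy every FASTA record whose key (as
# returned by get_id) is in @wanted from $in_fh to $out_fh. The caller opens
# $out_fh once; it is reused for every record instead of being reopened.
sub extract_seqs {
    my ($in_fh, $out_fh, @wanted) = @_;
    my %want;
    for my $id (@wanted) {
        $id =~ s/\s+\z//;             # drop newline/CR left over from the ID list
        $want{$id} = 1 if length $id; # skip blank lines instead of dying
    }
    my ($keep, $found) = (0, 0);
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            # A new record starts: keep it if any of its keys was requested.
            $keep = grep { $want{$_} } get_id($line);
            $found++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $found;                    # number of records copied
}

# Demo with toy in-memory data standing in for PC9.fasta and LCS.txt.
my $fasta = ">PleosPC9_1_103820|fgenesh1_pg.3_#_1\nMKV\n"
          . ">PleosPC9_1_123413|genemark.2731_g\nMAL\nLLR\n"
          . ">PleosPC9_1_52065|e_gw1.3.64.1\nMSS\n";
open my $in,  '<', \$fasta  or die "cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "cannot open output: $!";
my $n = extract_seqs($in, $out, "PleosPC9_1_123413\n", "\n");
close $out;
print "wrote $n record(s):\n$result";
```

To use the same parser with an index instead, register it before building: `$inx->id_parser(\&get_id); $inx->make_index('PC9.fasta');`. After that, `$inx->fetch('PleosPC9_1_103820')` should find the record by its short name.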
http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf Don't reopen SeqIO each time; just do it once at the beginning, outside of the loop, and then call write_seq within the loop. One nuance of doing OO programming vs procedural is that there is some outside state information that can persist in an object; conceptually, you want to open a filehandle once and just keep writing to it. -jason On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote: > Hello again, > > I tried what Mark told me, modifying the code line he told me, but > there's > still a problem that I believe must be due to the sequence names. > My sequence headers in the Fasta file have this format: > >> PleosPC9_1_103820|fgenesh1_pg.3_#_1 > > The part on the right of the pipe changes depending on the program > used to > create the gene model, for example: > >> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >> PleosPC9_1_123413|genemark.2731_g >> PleosPC9_1_52065|e_gw1.3.64.1 > > So I guess I need to parse my ids somehow for the program to detect > only > the first part of the fasta header (the "protein name") and not get > confused by the other side of the pipe... > > This is the corrected code I wrote following Mark's indications, but I > still don't have any idea about the parsing issue... > > #!/c:/Perl -w > use Bio::Index::Fasta; > use strict; > #PC9.fasta is my genomic file > my $Index_File_Name ="PC9.fasta"; > my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > #LCS.txt is my sequences list > @ARGV = ; > foreach my $id (@ARGV) { > if ($id eq ''){ > die ("empty list") > } > else { > my $seqobj = $inx->fetch($id); > my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", > -format => 'fasta'); > $out->write_seq($seqobj); > } > } > exit; > } > > Thanks in advance > > P.S. Might there be a faster way of extracting those sequences using plain > Perl? > > > > > On Thu, 5 November 2009, 17:39, Mark A.
Jensen wrote: >> Yes, these are files created by the SDBM, Perl's internal db >> manager. You >> should >> be able to >> open the index by simply >> $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >> and the dbm will know what to do-- >> cheers MAJ >> ----- Original Message ----- >> From: >> To: "Mark A. Jensen" >> Cc: ; >> Sent: Thursday, November 05, 2009 11:21 AM >> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: >> and its >> correct >> use] >> >> >>> Thank you very much Mark, that's a good point :$ >>> I guess your correction refers to the second script, doesn't it? >>> >>> If it is so, there is still a problem with the first script: it >>> doesn't >>> create the PC9.fasta.idx file; instead it creates two files named: >>> -PC9.fasta.idx.pag >>> -PC9.fasta.idx.dir >>> >>> which seem to be clearly related to some kind of indexing >>> process... but, >>> unless the PC9.fasta.idx file is only virtual or remains hidden, I >>> can't >>> find it anywhere... >>> Forgive me if I'm talking nonsense... >>> >>> Thank you very much again for your help ;) >>> >>> >>> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>>> Hey José, >>>> The first thing that jumps out is the index file name. Looks >>>> like you create it as >>>> PC9.fasta.idx >>>> But you read it as >>>> PC9.fasta >>>> Not an unusual mistake. Do >>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>>> and see if it works. >>>> MAJ >>>> ----- Original Message ----- >>>> From: >>>> To: >>>> Sent: Thursday, November 05, 2009 10:46 AM >>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and >>>> its >>>> correct >>>> use] >>>> >>>> >>>> >>>> >>>> ---------------------------- Original message >>>> ---------------------------- >>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its >>>> correct >>>> use >>>> From: jluis.lavin at unavarra.es >>>> Date: Thu, 5 November 2009, 16:46 >>>> To: "Mark A.
Jensen" >>>> -------------------------------------------------------------------------- >>>> >>>> Hi Mark, >>>> >>>> I've actually got two scripts: the first one is to create the >>>> index and >>>> the second one is to retrieve the sequence list from the indexed >>>> file. >>>> >>>> 1) Here is the index creation script: >>>> >>>> #!/c:/Perl -w >>>> use strict; >>>> use Bio::Index::Fasta; >>>> use strict; >>>> >>>> print "Enter file for indexing: \n"; >>>> my $Index_File_Name = <STDIN>; >>>> my $inx = Bio::Index::Fasta->new(-filename => >>>> $Index_File_Name.".idx", >>>> -write_flag => 1); >>>> $inx->make_index(my $File_Name); >>>> >>>> 2) And here is the sequence retrieval script: >>>> >>>> #!/c:/Perl -w >>>> use Bio::Index::Fasta; >>>> use strict; >>>> #PC9.fasta is my genomic file >>>> my $Index_File_Name ="PC9.fasta"; >>>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>>> #LCS.txt is my sequences list >>>> @ARGV = ; >>>> foreach my $id (@ARGV) { >>>> if ($id eq ''){ >>>> die ("empty list") >>>> } >>>> else { >>>> my $seqobj = $inx->fetch($id); >>>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>>> -format => 'fasta'); >>>> $out->write_seq($seqobj); >>>> } >>>> } >>>> exit; >>>> } >>>> >>>> I hope this code is not a total scum... >>>> >>>> Thanks in advance ;) >>>> >>>> >>>> >>>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote: >>>>> José -- It looks like this is a good solution to your problem. >>>>> Please >>>>> send >>>>> your >>>>> script so we can look at it- >>>>> cheers Mark >>>>> ----- Original Message ----- >>>>> From: >>>>> To: >>>>> Sent: Thursday, November 05, 2009 10:28 AM >>>>> Subject: [Bioperl-l] A question about iBio::Index: and its >>>>> correct use >>>>> >>>>> >>>>> >>>>> Hello to all, >>>>> >>>>> I'm trying to write a script to retrieve a list of sequences >>>>> from a >>>>> local >>>>> FASTA file (for example a fasta archive where all the protein >>>>> models >>>>> of >>>>> an >>>>> organism are stored).
This file would be used by me as some kind of >>>>> "local >>>>> database" (sorry if I mix up a few concepts...) >>>>> I've been reading the BioPerl HOWTOs and I came across the >>>>> Bio::Index::Fasta tool. >>>>> If I didn't misunderstand what I read (which can easily happen given >>>>> my >>>>> low >>>>> level of programming) this indexing tool should do the job. >>>>> I wrote a couple of scripts based on the documentation I read >>>>> about >>>>> this >>>>> tool, but I don't seem to be able to create the index file to be >>>>> used >>>>> later (to retrieve the sequences from). >>>>> -First of all, I want to ask the people in this forum if >>>>> Bio::Index::Fasta is the right one to choose for these tasks. >>>>> -Then I'll beg you to take a look at my scripts, because I don't >>>>> seem >>>>> to >>>>> catch the bug... >>>>> >>>>> Best wishes to you all and thanks in advance ;) >>>>> >>>>> -- >>>>> José Luis Lavín Trueba, PhD >>>>> >>>>> Dpto. de Producción Agraria >>>>> Grupo de Genética y Microbiología >>>>> Universidad Pública de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Dr. José Luis Lavín Trueba >>>> >>>> Dpto. de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> >>>> -- >>>> Dr. José Luis Lavín Trueba >>>> >>>> Dpto. de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> >>> >>> -- >>> Dr. José Luis Lavín Trueba >>> >>> Dpto.
de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- > Dr. José Luis Lavín Trueba > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Tue Nov 10 13:50:00 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Nov 2009 10:50:00 -0800 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> Message-ID: <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> You might also look at what mygenbank does: http://homepage.mac.com/iankorf/mygenbank.html On Nov 9, 2009, at 7:55 PM, Chris Fields wrote: > On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote: > >> Many thanks to Ewan Birney et al. for Bio::Index::* >> >> I can throw away my awful grep-based index-by-accession stuff. :) >> >> Any chance someone has also written an organism-based index >> mechanism? Something like... >> >> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { >> print $seq->display_id .
"\n"; >> } >> >> Thanks, >> >> j > > It should work via id_parser(); from Bio::Index::GenBank: > > $inx->id_parser(\&get_id); > # make the index > $inx->make_index($file_name); > > # here is where the retrieval key is specified > sub get_id { > my $line = shift; > $line =~ /clone="(\S+)"/; > $1; > } > > Change the code ref to deal with the line you want and parse the name > out. Caveat: this may not be absolutely perfect (it only passes in > a line at a time, and some species lines will wrap). Also not sure > how this would work in cases where multiple sequences from the same > species are present. > > The other option is to preparse everything and tie a hash to store a > species->UID map, then use that along with your Bio::Index index to > grab what you need. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jluis.lavin at unavarra.es Wed Nov 11 10:01:18 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Wed, 11 Nov 2009 16:01:18 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: anditscorrect use] In-Reply-To: References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es><3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Message-ID: <2979.130.206.164.153.1257951678.squirrel@webmail.unavarra.es> Hi once again, I have modified the script following the instructions Jason gave me (at least what I understood; remember it is my first time trying to learn a programming language... and I'm not the smartest guy in the class, hehe), but it seems I didn't fix the problem...
Here's the new code I wrote:
#!/c:/Perl -w
use strict;
use Bio::Index::Fasta;
use Bio::DB::Fasta;
use Bio::SeqIO;
use IO::File;
# assign files to scalars
my $index_file = 'PC91.fasta';
my $id_list = 'LCS2.txt';
# open index file
my $db = Bio::DB::Fasta->new($index_file) or die;
# open the id list
my $in = IO::File->new($id_list) or die;
# open FASTA to write
my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta",
                          -format => 'fasta');
# retrieve ids loop
foreach my $id ($in) {
    if ($id eq ''){
        die ("empty list")
    }
    else {
        my $seqobj = my $inx->fetch($id);
        $out->write_seq($seqobj);
    }
}
# parse fasta headers
sub my_makeid {
    my $id = shift;
    if ( $id =~ /^>[^:]+:(\S+)/ ) {
        return $1;
    }
    elsif ($id =~ /^>(\S+)/) {
        return $1;
    }
    else {
        warn("cannot parse ID for $id\n");
    }
}
exit;
Would anyone please take a look at it... Thanks in advance ;) On Tue, 10 November 2009, 19:47, Jason Stajich wrote: > Page 44 has the custom ID info, or look at the documentation for > Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if > you read the perldoc for the module. > > http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf > > Don't reopen SeqIO each time; just do it once at the beginning, > outside of the loop, and then call write_seq within the loop. > > One nuance of doing OO programming vs procedural is that there > is some outside state information that can persist in an object; > conceptually, you want to open a filehandle once and just keep writing > to it. > > -jason > On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote: > >> Hello again, >> >> I tried what Mark told me, modifying the code line he told me, but >> there's >> still a problem that I believe must be due to the sequence names.
>> My sequence headers in the Fasta file have this format: >> >>> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >> >> The part on the right of the pipe changes depending on the program >> used to >> create the gene model, for example: >> >>> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >>> PleosPC9_1_123413|genemark.2731_g >>> PleosPC9_1_52065|e_gw1.3.64.1 >> >> So I guess I need to parse my ids somehow for the program to detect >> only >> the first part of the fasta header (the "protein name") and not get >> confused by the other side of the pipe... >> >> This is the corrected code I wrote following Mark's indications, but I >> still don't have any idea about the parsing issue... >> >> #!/c:/Perl -w >> use Bio::Index::Fasta; >> use strict; >> #PC9.fasta is my genomic file >> my $Index_File_Name ="PC9.fasta"; >> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >> #LCS.txt is my sequences list >> @ARGV = ; >> foreach my $id (@ARGV) { >> if ($id eq ''){ >> die ("empty list") >> } >> else { >> my $seqobj = $inx->fetch($id); >> my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", >> -format => 'fasta'); >> $out->write_seq($seqobj); >> } >> } >> exit; >> } >> >> Thanks in advance >> >> P.S. Might there be a faster way of extracting those sequences using plain >> Perl? >> >> >> >> >> On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote: >>> Yes, these are files created by the SDBM, Perl's internal db >>> manager. You >>> should >>> be able to >>> open the index by simply >>> $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>> and the dbm will know what to do-- >>> cheers MAJ >>> ----- Original Message ----- >>> From: >>> To: "Mark A. Jensen" >>> Cc: ; >>> Sent: Thursday, November 05, 2009 11:21 AM >>> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: >>> and its >>> correct >>> use] >>> >>> >>>> Thank you very much Mark, that's a good point :$ >>>> I guess your correction refers to the second script, doesn't it?
>>>> >>>> If it is so, there is still a problem with the first script: it >>>> doesn't >>>> create the PC9.fasta.idx file; instead it creates two files named: >>>> -PC9.fasta.idx.pag >>>> -PC9.fasta.idx.dir >>>> >>>> which seem to be clearly related to some kind of indexing >>>> process... but, >>>> unless the PC9.fasta.idx file is only virtual or remains hidden, I >>>> can't >>>> find it anywhere... >>>> Forgive me if I'm talking nonsense... >>>> >>>> Thank you very much again for your help ;) >>>> >>>> >>>> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>>>> Hey José, >>>>> The first thing that jumps out is the index file name. Looks >>>>> like you create it as >>>>> PC9.fasta.idx >>>>> But you read it as >>>>> PC9.fasta >>>>> Not an unusual mistake. Do >>>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>>>> and see if it works. >>>>> MAJ >>>>> ----- Original Message ----- >>>>> From: >>>>> To: >>>>> Sent: Thursday, November 05, 2009 10:46 AM >>>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and >>>>> its >>>>> correct >>>>> use] >>>>> >>>>> >>>>> >>>>> >>>>> ---------------------------- Original message >>>>> ---------------------------- >>>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its >>>>> correct >>>>> use >>>>> From: jluis.lavin at unavarra.es >>>>> Date: Thu, 5 November 2009, 16:46 >>>>> To: "Mark A. Jensen" >>>>> -------------------------------------------------------------------------- >>>>> >>>>> Hi Mark, >>>>> >>>>> I've actually got two scripts: the first one is to create the >>>>> index and >>>>> the second one is to retrieve the sequence list from the indexed >>>>> file.
>>>>> >>>>> 1) Here is the index creation script: >>>>> >>>>> #!/c:/Perl -w >>>>> use strict; >>>>> use Bio::Index::Fasta; >>>>> use strict; >>>>> >>>>> print "Enter file for indexing: \n"; >>>>> my $Index_File_Name = <STDIN>; >>>>> my $inx = Bio::Index::Fasta->new(-filename => >>>>> $Index_File_Name.".idx", >>>>> -write_flag => 1); >>>>> $inx->make_index(my $File_Name); >>>>> >>>>> 2) And here is the sequence retrieval script: >>>>> >>>>> #!/c:/Perl -w >>>>> use Bio::Index::Fasta; >>>>> use strict; >>>>> #PC9.fasta is my genomic file >>>>> my $Index_File_Name ="PC9.fasta"; >>>>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>>>> #LCS.txt is my sequences list >>>>> @ARGV = ; >>>>> foreach my $id (@ARGV) { >>>>> if ($id eq ''){ >>>>> die ("empty list") >>>>> } >>>>> else { >>>>> my $seqobj = $inx->fetch($id); >>>>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>>>> -format => 'fasta'); >>>>> $out->write_seq($seqobj); >>>>> } >>>>> } >>>>> exit; >>>>> } >>>>> >>>>> I hope this code is not a total scum... >>>>> >>>>> Thanks in advance ;) >>>>> >>>>> >>>>> >>>>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote: >>>>>> José -- It looks like this is a good solution to your problem. >>>>>> Please >>>>>> send >>>>>> your >>>>>> script so we can look at it- >>>>>> cheers Mark >>>>>> ----- Original Message ----- >>>>>> From: >>>>>> To: >>>>>> Sent: Thursday, November 05, 2009 10:28 AM >>>>>> Subject: [Bioperl-l] A question about iBio::Index: and its >>>>>> correct use >>>>>> >>>>>> >>>>>> >>>>>> Hello to all, >>>>>> >>>>>> I'm trying to write a script to retrieve a list of sequences >>>>>> from a >>>>>> local >>>>>> FASTA file (for example a fasta archive where all the protein >>>>>> models >>>>>> of >>>>>> an >>>>>> organism are stored). This file would be used by me as some kind of >>>>>> "local >>>>>> database" (sorry if I mix up a few concepts...) >>>>>> I've been reading the BioPerl HOWTOs and I came across the >>>>>> Bio::Index::Fasta tool.
>>>>>> If I didn't misunderstand what I read (which can easily happen given >>>>>> my >>>>>> low >>>>>> level of programming) this indexing tool should do the job. >>>>>> I wrote a couple of scripts based on the documentation I read >>>>>> about >>>>>> this >>>>>> tool, but I don't seem to be able to create the index file to be >>>>>> used >>>>>> later (to retrieve the sequences from). >>>>>> -First of all, I want to ask the people in this forum if >>>>>> Bio::Index::Fasta is the right one to choose for these tasks. >>>>>> -Then I'll beg you to take a look at my scripts, because I don't >>>>>> seem >>>>>> to >>>>>> catch the bug... >>>>>> >>>>>> Best wishes to you all and thanks in advance ;) >>>>>> >>>>>> -- >>>>>> José Luis Lavín Trueba, PhD >>>>>> >>>>>> Dpto. de Producción Agraria >>>>>> Grupo de Genética y Microbiología >>>>>> Universidad Pública de Navarra >>>>>> 31006 Pamplona >>>>>> Navarra >>>>>> SPAIN >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Dr. José Luis Lavín Trueba >>>>> >>>>> Dpto. de Producción Agraria >>>>> Grupo de Genética y Microbiología >>>>> Universidad Pública de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> >>>>> -- >>>>> Dr. José Luis Lavín Trueba >>>>> >>>>> Dpto. de Producción Agraria >>>>> Grupo de Genética y Microbiología >>>>> Universidad Pública de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Dr. José Luis Lavín Trueba >>>> >>>> Dpto.
de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> -- >> Dr. José Luis Lavín Trueba >> >> Dpto. de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Wed Nov 11 18:48:33 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 11 Nov 2009 18:48:33 -0500 Subject: [Bioperl-l] Maq assembly wrapper ready for beta testing Message-ID: <4057E5A862B845EA8BB153888075590C@NewLife> Hi All- New modules are available in the core and in bioperl-run for working with Heng Li's short-read assembler "maq" (http://maq.sourceforge.net/maq-man.shtml). Bio::Tools::Run::Maq allows a quick assembly call with a canned maq pipeline, and also allows individual maq commands to be called separately. It uses Bio::Assembly::IO::maq (a read-only module) to deliver a Bio::Assembly::Scaffold from maq output. If you're interested, see http://www.bioperl.org/wiki/HOWTO:Short-read_assemblies_with_maq and update your core and bioperl-run. The code inherits from Florent's excellent new Bio::Tools::Run::AssemblerBase -- kudos to him!!
Tests are in bioperl-run/trunk/t/Maq.t; see them for myriad examples. Send me the bugs. MAJ From clarsen at vecna.com Thu Nov 12 12:22:26 2009 From: clarsen at vecna.com (Chris Larsen) Date: Thu, 12 Nov 2009 12:22:26 -0500 Subject: [Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses? In-Reply-To: <320fb6e00910271029m26f07564l727fb78adae81c11@mail.gmail.com> References: <320fb6e00910271029m26f07564l727fb78adae81c11@mail.gmail.com> Message-ID: <7BBAE077-4D76-46C2-BF66-363F5A017278@vecna.com> All, This is a short follow-up to the earlier discussion thread about computing mature peptide sequences for viruses. The topic has gone underwater for the time being as we solve some problems with the source data. The Biopython effort and contributors on this board have given good guidance, and we now have working scripts (thanks mostly to pcock); however, the source data on which everything relies is suspect: mat_peptide 15118..16914 <=== /product="nsp13" /note="helicase" I can tell you the virus community does not want to rely heavily on those position numbers. Furthermore, we have found fewer complete source genomes for viruses than for bacteria, and more virus-to-virus variation in the data fields annotated in the GBK file (Gene, CDS, ORF, Protein, Polyprotein, mat_peptide, db_xref). In fact, the community will have to come together significantly on how these molecules are defined in public repositories before a mature scripting effort becomes reliable, public and well received. Because of the variation in viruses, it's not even clear at this point what a 'gene' is. I will let you know how we proceed when more sequence data has been fully analyzed, and we can think about making any Perl-based solution a new viral protein module. Thanks, Chris -- Christopher Larsen, Ph.D. Sr.
Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From David.Messina at sbc.su.se Thu Nov 12 14:20:54 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 12 Nov 2009 20:20:54 +0100 Subject: [Bioperl-l] highest PAML version supported? Message-ID: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Hi everyone, What is the latest version of PAML (specifically codeml) that I can use with bioperl-live and bioperl-run? I looked around and couldn't find where (or if) this is documented. With PAML version 4.3a against the current trunk of both -live and -run, I see this: ------------- EXCEPTION Bio::Root::NotImplemented ------------- MSG: Unknown format of PAML output did not see seqtype STACK Bio::Tools::Phylo::PAML::_parse_summary /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:461 STACK Bio::Tools::Phylo::PAML::next_result /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:270 STACK toplevel ../bin/cluster_kaks:251 --------------------------------------------------------------- ...which I suspect (but haven't confirmed) is due to a change in the file format. Dave From jason at bioperl.org Thu Nov 12 14:29:22 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 12 Nov 2009 11:29:22 -0800 Subject: [Bioperl-l] highest PAML version supported? In-Reply-To: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> References: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Message-ID: Probably 3.15 or so. It really needs a maintainer!!! -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From scott at scottcain.net Fri Nov 13 09:48:43 2009 From: scott at scottcain.net (Scott Cain) Date: Fri, 13 Nov 2009 09:48:43 -0500 Subject: [Bioperl-l] January GMOD meeting announcement Message-ID: <4536f7700911130648j40eb2d82g2594adaccf476d73@mail.gmail.com> Hello, I am pleased to announce that the January GMOD meeting will take place on January 14 and 15 in San Diego at the Best Western Seven Seas (the same location as last year). Please see this page for registration information: http://gmod.org/wiki/January_2010_GMOD_Meeting When you go to that page, please take a moment to add suggestions for the agenda. There is no registration fee for this meeting; however, space is limited, so please register early. The proprietors of the Best Western have given us an excellent room rate and extended it to the previous week, so that people attending the GMOD meeting and the Plant and Animal Genome meeting before it may stay at the Best Western the entire time.
Please direct follow-up questions to the gmod-devel mailing list: https://lists.sourceforge.net/lists/listinfo/gmod-devel Thanks, and I look forward to seeing you in San Diego! Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From j.inoue at ucl.ac.uk Sat Nov 14 14:20:29 2009 From: j.inoue at ucl.ac.uk (Jun Inoue) Date: Sat, 14 Nov 2009 19:20:29 +0000 Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths Message-ID: Dear All, I have just started learning BioPerl for phylogenetics. I usually use perl v5.10.0 on Mac OS X 10.5.8. I would like to ask for a hint on calculating the branch lengths from root to tip for all species in a tree in Newick format. Please see the following web site, where I explain what I want to do and show my simple (not yet complete) script: http://www.geocities.jp/ancientfishtree/BioPerl_BLRootTip.html Thank you for your help. Best, Jun Inoue http://www.geocities.jp/ancientfishtree/index_eng.html From maj at fortinbras.us Sat Nov 14 16:47:37 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 14 Nov 2009 16:47:37 -0500 Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths In-Reply-To: References: Message-ID: <3BC179984D5E49868C4F12D181D82B8D@NewLife> Hi Jun, Some hints: incorporate @leaves = $tree->get_leaf_nodes; and use Bio::Tree::TreeFunctionsI; $distance = $tree->distance( $node_a, $node_b ); cheers, Mark From jay at jays.net Sun Nov 15 20:23:38 2009 From: jay at jays.net (Jay Hannah) Date: Sun, 15 Nov 2009 19:23:38 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> Message-ID: On Nov 9, 2009, at 9:55 PM, Chris Fields wrote: > It should work via id_parser(); from Bio::Index::GenBank: > > $inx->id_parser(\&get_id); > # make the index > $inx->make_index($file_name); > > # here is where the retrieval key is specified > sub get_id { > my $line = shift; > $line =~ /clone="(\S+)"/; > $1; > } This worked great for me today (tackling a different problem than the original). Thanks!! j From veronica.xiaoyu at gmail.com Fri Nov 13 15:35:48 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Fri, 13 Nov 2009 15:35:48 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel question Message-ID: Hi, I'm using Bio::Graphics to parse the blast result and generate images. But sometimes, in the middle of the output image, the hit's color is white, even though I set it to other colors. I attached the picture here as an example. This doesn't happen all the time; usually it works well. I'm wondering whether I did something wrong, or whether it depends on the BLAST result? Thank you, Xiaoyu -------------- next part -------------- A non-text attachment was scrubbed...
Name: BLAST_problem.jpg Type: image/jpeg Size: 51888 bytes Desc: not available URL: From ryan_bogard at hms.harvard.edu Sun Nov 15 22:30:22 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Sun, 15 Nov 2009 19:30:22 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) Message-ID: <26366421.post@talk.nabble.com> In advance, any advice would be greatly appreciated! I have installed bioperl-588pm via fink, but I am having difficulties calling the modules in a script. The following is added to .profile (bash): PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB If I change this to /sw/lib/perl5, then I get an @INC error, as Bio::Perl (from use Bio::Perl) cannot be located. The environment variables are as follows: MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin INFOPATH=/sw/share/info:/sw/info:/usr/share/info This is the perl script I'm attempting to run: #!/sw/bin/perl5.8.8 use strict; use Bio::Perl; my $seq_object = get_sequence('swiss',"ROA1_HUMAN"); write_sequence(">roa1.fasta",'fasta',$seq_object); Here is the error output: dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr Referenced from: /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle Expected in: dynamic lookup dyld: Symbol not found: _Perl_Tstack_sp_ptr Referenced from: /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle Expected in: dynamic lookup Trace/BPT trap I have looked through many forum postings and attempted the solutions offered there, but none seem to work in my case. I'm not sure whether it's because I have perl 5.10.0 installed while attempting to use a bioperl built for perl 5.8.8; however, others seem to have it working just fine.
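A sketch for checking the mixed-perl situation described above: it prints which interpreter is actually running, its version, PERL5LIB and the @INC search path, and flags @INC entries whose path names a different perl version. It uses only core Perl ($^X, $^V, @INC, %ENV); the helper name foreign_inc_entries and the version-in-path heuristic are illustrative assumptions, not part of BioPerl or fink.

```perl
#!/usr/bin/env perl
# Sketch: report which perl binary is running and which library directories
# it searches, and flag @INC entries that appear to belong to another perl
# version. The version-in-path check is a heuristic (an assumption), not a
# fink or BioPerl API.
use strict;
use warnings;

# Return the entries of @dirs containing an X.Y.Z path component that
# differs from $running (e.g. '/sw/lib/perl5/5.8.8' when running 5.10.0).
sub foreign_inc_entries {
    my ($running, @dirs) = @_;
    return grep { m{/(\d+\.\d+\.\d+)(?:/|$)} && $1 ne $running } @dirs;
}

my $running = sprintf '%vd', $^V;    # e.g. "5.10.0"
print "interpreter: $^X\n";
print "version:     $running\n";
print "PERL5LIB:    ", (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
print "\@INC:\n";
print "  $_\n" for @INC;

my @foreign = foreign_inc_entries($running, @INC);
if (@foreign) {
    print "WARNING: these \@INC entries look like they belong to another perl:\n";
    print "  $_\n" for @foreign;
}
```

Run it with the same command used for the failing script. If the version line reports 5.10.0 while /sw/lib/perl5/5.8.8 entries are listed, compiled modules such as IO.bundle from the 5.8.8 tree may be getting loaded into a different perl, which would be consistent with the dyld "Symbol not found" errors.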
Thank you, Ryan -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From e.osimo at gmail.com Mon Nov 16 02:04:40 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Mon, 16 Nov 2009 08:04:40 +0100 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26366421.post@talk.nabble.com> References: <26366421.post@talk.nabble.com> Message-ID: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Hello Ryan, unfortunately, if you upgraded to 10.6 without formatting, I have to tell you that you'll be in big trouble with perl and with everything you installed from the commandline... Because in the upgrade process everything in the system folders, perl and bioperl being some of these things, is erased without being uninstalled, so you'll find a lot of folders with the same name but no contents. I suggest you, as I did, to format your pc and reinstall 10.6 from scratch. Then youl'll be able to install mysql (I had to install mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with perl 5.10 that is already installed, you'll install bioperl with no effort. Bye Emanuele On Mon, Nov 16, 2009 at 04:30, rbogard wrote: > > In advance, any advice would be grealy appreciated! I have installed > bioperl-588pm via fink but I am having difficulties calling the modules in > script. The following is added to .profile (bash): > PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB > > If I change this to /sw/lib/perl5 then I get an @INC error, as use > Bio::PERL > cannot be located. 
> > The environment variables are as follows: > > > MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man > > PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 > > PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin > INFOPATH=/sw/share/info:/sw/info:/usr/share/info > > > This is the perl script I'm attempting to run: > #!/sw/bin/perl5.8.8 > use strict; > use Bio::Perl; > $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > write_sequence(">roa1.fasta",'fasta',$seq_object); > > Here is the error output: > > dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > dyld: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > Trace/BPT trap > > I have looked through many forum postings and attempted the solutions > offered in those instances, but none seem to work in my case. I'm not sure > if it's because I have perl 5.10.0 installed while attempting to call > bioperl 5.8.8; however, others seem to have it working just fine. > > Thank you, Ryan > -- > View this message in context: > http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ryan_bogard at hms.harvard.edu Mon Nov 16 08:43:19 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 05:43:19 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Message-ID: <26372079.post@talk.nabble.com> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if I will have the same issues, but it's worth a shot as I have little on my computer and reinstalling to start over wouldn't be too difficult. What method did you use to install bioperl? I used fink and I am not sure the available stable version is the one I need. I will install from the command line this time around, and let you know how it turns out. Thank you! Emanuele Osimo wrote: > > Hello Ryan, > unfortunately, if you upgraded to 10.6 without formatting, I have to tell > you that you'll be in big trouble with perl and with everything you > installed from the commandline... Because in the upgrade process > everything > in the system folders, perl and bioperl being some of these things, is > erased without being uninstalled, so you'll find a lot of folders with the > same name but no contents. > I suggest you, as I did, to format your pc and reinstall 10.6 from > scratch. > Then youl'll be able to install mysql (I had to install > mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with > perl > 5.10 that is already installed, you'll install bioperl with no effort. > Bye > Emanuele > > On Mon, Nov 16, 2009 at 04:30, rbogard > wrote: > >> >> In advance, any advice would be grealy appreciated! 
I have installed >> bioperl-588pm via fink but I am having difficulties calling the modules >> in >> script. The following is added to .profile (bash): >> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >> >> If I change this to /sw/lib/perl5 then I get an @INC error, as use >> Bio::PERL >> cannot be located. >> >> The environment variables are as follows: >> >> >> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >> >> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >> >> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >> >> >> This is the perl script I'm attempting to run: >> #!/sw/bin/perl5.8.8 >> use strict; >> use Bio::Perl; >> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >> write_sequence(">roa1.fasta",'fasta',$seq_object); >> >> Here is the error output: >> >> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> dyld: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> Trace/BPT trap >> >> I have looked through many forum postings and attempted the solutions >> offered in those instances, but none seem to work in my case. I'm not >> sure >> if it's because I have perl 5.10.0 installed while attempting to call >> bioperl 5.8.8; however, others seem to have it working just fine. >> >> Thank you, Ryan >> -- >> View this message in context: >> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Mon Nov 16 08:48:17 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 16 Nov 2009 08:48:17 -0500 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26372079.post@talk.nabble.com> References: <26366421.post@talk.nabble.com><2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> Message-ID: <8D822081B13F49C2A37677D3A47F38B4@NewLife> Ryan, I'm not a mac person, but Koen has said (see http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) to use the unstable tree to get BioPerl 1.6.1, which is likely to be what you want. cheers Mark ----- Original Message ----- From: "rbogard" To: Sent: Monday, November 16, 2009 8:43 AM Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) > > The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if I > will have the same issues, but it's worth a shot as I have little on my > computer and reinstalling to start over wouldn't be too difficult. What > method did you use to install bioperl? I used fink and I am not sure the > available stable version is the one I need. I will install from the command > line this time around, and let you know how it turns out. > > Thank you! 
> > > > Emanuele Osimo wrote: >> >> Hello Ryan, >> unfortunately, if you upgraded to 10.6 without formatting, I have to tell >> you that you'll be in big trouble with perl and with everything you >> installed from the commandline... Because in the upgrade process >> everything >> in the system folders, perl and bioperl being some of these things, is >> erased without being uninstalled, so you'll find a lot of folders with the >> same name but no contents. >> I suggest you, as I did, to format your pc and reinstall 10.6 from >> scratch. >> Then youl'll be able to install mysql (I had to install >> mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with >> perl >> 5.10 that is already installed, you'll install bioperl with no effort. >> Bye >> Emanuele >> >> On Mon, Nov 16, 2009 at 04:30, rbogard >> wrote: >> >>> >>> In advance, any advice would be grealy appreciated! I have installed >>> bioperl-588pm via fink but I am having difficulties calling the modules >>> in >>> script. The following is added to .profile (bash): >>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>> >>> If I change this to /sw/lib/perl5 then I get an @INC error, as use >>> Bio::PERL >>> cannot be located. 
>>> >>> The environment variables are as follows: >>> >>> >>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>> >>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>> >>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>> >>> >>> This is the perl script I'm attempting to run: >>> #!/sw/bin/perl5.8.8 >>> use strict; >>> use Bio::Perl; >>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>> >>> Here is the error output: >>> >>> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >>> Referenced from: >>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>> Expected in: dynamic lookup >>> >>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>> Referenced from: >>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>> Expected in: dynamic lookup >>> >>> Trace/BPT trap >>> >>> I have looked through many forum postings and attempted the solutions >>> offered in those instances, but none seem to work in my case. I'm not >>> sure >>> if it's because I have perl 5.10.0 installed while attempting to call >>> bioperl 5.8.8; however, others seem to have it working just fine. >>> >>> Thank you, Ryan >>> -- >>> View this message in context: >>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: > http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Nov 16 10:00:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Nov 2009 09:00:09 -0600 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Message-ID: <49681E01-E95D-4FC6-AE42-6E57ED43AAA2@illinois.edu> On Nov 16, 2009, at 1:04 AM, Emanuele Osimo wrote: > Hello Ryan, > unfortunately, if you upgraded to 10.6 without formatting, I have to tell > you that you'll be in big trouble with perl and with everything you > installed from the commandline... Because in the upgrade process everything > in the system folders, perl and bioperl being some of these things, is > erased without being uninstalled, so you'll find a lot of folders with the > same name but no contents. > I suggest you, as I did, to format your pc and reinstall 10.6 from scratch. > Then youl'll be able to install mysql (I had to install > mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with perl > 5.10 that is already installed, you'll install bioperl with no effort. 
> Bye > Emanuele Just starting from scratch isn't always the best solution (though it is the cleanest). In this case I don't think anything you mention applies, as there are conflicting symbols being reported. My guess is conflicting perl builds, probably between your system 5.10.0 (snow leopard) and your fink-installed perl 5.8.8 (they are binary incompatible). Also, remember that snow leopard is primarily 64-bit, so it might be best to try working out whether your fink is attempting to compile 64- vs 32-bit. In this case, I would just uninstall the fink-based perl and either use the system one (snow leopard = 5.10.0), or roll your own and install 5.10.1 locally or in /usr/local. Do NOT replace the system one, as that will likely break your OS. In my experience, and not to bash on fink or MacPorts, I never had much luck with their perl installs. Unless I plan on only using fink or macports for my OS (not likely in my case), I find they tend to cause problems in the long term unless one uses them to install packages with very few dependencies, and even then you need to make sure fink is configure to compile the correct binary. For instance, they're fairly good for gd, libxml2, etc., but beyond that one may get into issues with odd, version-specific dependencies with some packages, such as relying on perl 5.8.8 (but not perl 5.10.x), db42 (instead of db44), etc. I've ended up in the past with 2-3 different perl versions, berkeley db versions, etc. chris > On Mon, Nov 16, 2009 at 04:30, rbogard wrote: > >> >> In advance, any advice would be grealy appreciated! I have installed >> bioperl-588pm via fink but I am having difficulties calling the modules in >> script. The following is added to .profile (bash): >> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >> >> If I change this to /sw/lib/perl5 then I get an @INC error, as use >> Bio::PERL >> cannot be located. 
>> >> The environment variables are as follows: >> >> >> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >> >> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >> >> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >> >> >> This is the perl script I'm attempting to run: >> #!/sw/bin/perl5.8.8 >> use strict; >> use Bio::Perl; >> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >> write_sequence(">roa1.fasta",'fasta',$seq_object); >> >> Here is the error output: >> >> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> dyld: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> Trace/BPT trap >> >> I have looked through many forum postings and attempted the solutions >> offered in those instances, but none seem to work in my case. I'm not sure >> if it's because I have perl 5.10.0 installed while attempting to call >> bioperl 5.8.8; however, others seem to have it working just fine. >> >> Thank you, Ryan >> -- >> View this message in context: >> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Nov 16 10:01:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Nov 2009 09:01:01 -0600 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <8D822081B13F49C2A37677D3A47F38B4@NewLife> References: <26366421.post@talk.nabble.com><2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> <8D822081B13F49C2A37677D3A47F38B4@NewLife> Message-ID: <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> Actually, why not just install via CPAN? Any particular reason? chris On Nov 16, 2009, at 7:48 AM, Mark A. Jensen wrote: > Ryan, > I'm not a mac person, but Koen has said (see http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) > to use the unstable tree to get BioPerl 1.6.1, which is likely to be what you want. > cheers > Mark > ----- Original Message ----- From: "rbogard" > To: > Sent: Monday, November 16, 2009 8:43 AM > Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) > > >> >> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if I >> will have the same issues, but it's worth a shot as I have little on my >> computer and reinstalling to start over wouldn't be too difficult. What >> method did you use to install bioperl? I used fink and I am not sure the >> available stable version is the one I need. I will install from the command >> line this time around, and let you know how it turns out. >> >> Thank you! 
>> >> >> >> Emanuele Osimo wrote: >>> >>> Hello Ryan, >>> unfortunately, if you upgraded to 10.6 without formatting, I have to tell >>> you that you'll be in big trouble with perl and with everything you >>> installed from the commandline... Because in the upgrade process >>> everything >>> in the system folders, perl and bioperl being some of these things, is >>> erased without being uninstalled, so you'll find a lot of folders with the >>> same name but no contents. >>> I suggest you, as I did, to format your pc and reinstall 10.6 from >>> scratch. >>> Then youl'll be able to install mysql (I had to install >>> mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with >>> perl >>> 5.10 that is already installed, you'll install bioperl with no effort. >>> Bye >>> Emanuele >>> >>> On Mon, Nov 16, 2009 at 04:30, rbogard >>> wrote: >>> >>>> >>>> In advance, any advice would be grealy appreciated! I have installed >>>> bioperl-588pm via fink but I am having difficulties calling the modules >>>> in >>>> script. The following is added to .profile (bash): >>>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>>> >>>> If I change this to /sw/lib/perl5 then I get an @INC error, as use >>>> Bio::PERL >>>> cannot be located. 
>>>> >>>> The environment variables are as follows: >>>> >>>> >>>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>>> >>>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>>> >>>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>>> >>>> >>>> This is the perl script I'm attempting to run: >>>> #!/sw/bin/perl5.8.8 >>>> use strict; >>>> use Bio::Perl; >>>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>>> >>>> Here is the error output: >>>> >>>> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >>>> Referenced from: >>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>> Expected in: dynamic lookup >>>> >>>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>>> Referenced from: >>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>> Expected in: dynamic lookup >>>> >>>> Trace/BPT trap >>>> >>>> I have looked through many forum postings and attempted the solutions >>>> offered in those instances, but none seem to work in my case. I'm not >>>> sure >>>> if it's because I have perl 5.10.0 installed while attempting to call >>>> bioperl 5.8.8; however, others seem to have it working just fine. >>>> >>>> Thank you, Ryan >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> -- >> View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Mon Nov 16 10:49:13 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Nov 2009 08:49:13 -0700 Subject: [Bioperl-l] Bio::Graphics::Panel question In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40663EDB9@EX02.asurite.ad.asu.edu> To really be able to tell if this was a bug, I (and probably the real devs) would need to see that part of your code and the Blast file that is having this issue as it could be your callback for color choice vs the blast object (e.g. your color picker is missing an option that the data comes in with and so returns with a blank value). -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Xiaoyu Liang Sent: Friday, November 13, 2009 1:36 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Graphics::Panel question Hi, I'm using Bio::Graphics to parse the blast result and generate images. 
But sometimes, in the middle of the output image, the hit's color is white, even though I set it to other colors. I attached the picture here as an example. This doesn't occur all the time; usually it works well. I'm wondering whether I did something wrong, or whether it depends on the BLAST result. Thank you, Xiaoyu From ryan_bogard at hms.harvard.edu Mon Nov 16 11:57:16 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 08:57:16 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> <8D822081B13F49C2A37677D3A47F38B4@NewLife> <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> Message-ID: <26375418.post@talk.nabble.com> I read that posting by Koen and used the unstable tree after the first attempt; however, the errors still persisted. I just finished a fresh install and I will just follow Mr. Fields' advice and use CPAN. Thank you all for the help! Chris Fields-5 wrote: > > Actually, why not just install via CPAN? Any particular reason? > > chris > > On Nov 16, 2009, at 7:48 AM, Mark A. Jensen wrote: > >> Ryan, >> I'm not a mac person, but Koen has said (see >> http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) >> to use the unstable tree to get BioPerl 1.6.1, which is likely to be what >> you want. >> cheers >> Mark >> ----- Original Message ----- From: "rbogard" >> >> To: >> Sent: Monday, November 16, 2009 8:43 AM >> Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl >> 5.10.0) >> >> >>> >>> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if >>> I >>> will have the same issues, but it's worth a shot as I have little on my >>> computer and reinstalling to start over wouldn't be too difficult. What >>> method did you use to install bioperl?
I used fink and I am not sure the >>> available stable version is the one I need. I will install from the >>> command >>> line this time around, and let you know how it turns out. >>> >>> Thank you! >>> >>> >>> >>> Emanuele Osimo wrote: >>>> >>>> Hello Ryan, >>>> unfortunately, if you upgraded to 10.6 without formatting, I have to >>>> tell >>>> you that you'll be in big trouble with perl and with everything you >>>> installed from the commandline... Because in the upgrade process >>>> everything >>>> in the system folders, perl and bioperl being some of these things, is >>>> erased without being uninstalled, so you'll find a lot of folders with >>>> the >>>> same name but no contents. >>>> I suggest you, as I did, to format your pc and reinstall 10.6 from >>>> scratch. >>>> Then youl'll be able to install mysql (I had to install >>>> mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with >>>> perl >>>> 5.10 that is already installed, you'll install bioperl with no effort. >>>> Bye >>>> Emanuele >>>> >>>> On Mon, Nov 16, 2009 at 04:30, rbogard >>>> wrote: >>>> >>>>> >>>>> In advance, any advice would be grealy appreciated! I have installed >>>>> bioperl-588pm via fink but I am having difficulties calling the >>>>> modules >>>>> in >>>>> script. The following is added to .profile (bash): >>>>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>>>> >>>>> If I change this to /sw/lib/perl5 then I get an @INC error, as use >>>>> Bio::PERL >>>>> cannot be located. 
>>>>> >>>>> The environment variables are as follows: >>>>> >>>>> >>>>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>>>> >>>>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>>>> >>>>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>>>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>>>> >>>>> >>>>> This is the perl script I'm attempting to run: >>>>> #!/sw/bin/perl5.8.8 >>>>> use strict; >>>>> use Bio::Perl; >>>>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>>>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>>>> >>>>> Here is the error output: >>>>> >>>>> dyld: lazy symbol binding failed: Symbol not found: >>>>> _Perl_Tstack_sp_ptr >>>>> Referenced from: >>>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>>> Expected in: dynamic lookup >>>>> >>>>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>>>> Referenced from: >>>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>>> Expected in: dynamic lookup >>>>> >>>>> Trace/BPT trap >>>>> >>>>> I have looked through many forum postings and attempted the solutions >>>>> offered in those instances, but none seem to work in my case. I'm not >>>>> sure >>>>> if it's because I have perl 5.10.0 installed while attempting to call >>>>> bioperl 5.8.8; however, others seem to have it working just fine. >>>>> >>>>> Thank you, Ryan >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26375418.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From krishna.aneesh at gmail.com Mon Nov 16 02:00:15 2009 From: krishna.aneesh at gmail.com (Aneesh K) Date: Mon, 16 Nov 2009 12:30:15 +0530 Subject: [Bioperl-l] Regarding Bio::TreeIO Object Message-ID: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Hi, I just started to use Bioperl modules. It's really useful and interesting. Now I have in stuck with "Tree objects and phylogenetic trees". I couldn't get any documentation/examples about reading/parsing phylip tree files. Please tell me from where I can get some sample codes for this. Waiting for your reply. Thanks Aneesh.K Mob. 
09646181517 From David.Messina at sbc.su.se Mon Nov 16 12:33:36 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 16 Nov 2009 18:33:36 +0100 Subject: [Bioperl-l] highest PAML version supported? In-Reply-To: References: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Message-ID: Hi everyone, I just committed support for parsing codeml 4.3a (August 2009) output to bioperl-live. I added new tests and all PAML-related tests pass, but please report any problems you have to the list. Note that I haven't tested the other PAML 4.3a executables to see whether their output formats have changed. If you get the chance to try any and it doesn't work, let me know and I'll try to add support for them. (Note that these changes are only to the PAML parsing code; Bio::Tools::Run already appears to handle 4.3a just fine.) Dave From jason at bioperl.org Mon Nov 16 12:34:57 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 16 Nov 2009 09:34:57 -0800 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Message-ID: Is this at all helpful for your questions? http://www.bioperl.org/wiki/HOWTO:Trees The trees are in 'newick' or New Hampshire format, though I don't think there is a PHYLIP format for trees. -jason On Nov 15, 2009, at 11:00 PM, Aneesh K wrote: > Hi, > > I just started to use Bioperl modules. It's really useful and > interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing > phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob.
09646181517 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From roy.chaudhuri at gmail.com Mon Nov 16 12:31:49 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 16 Nov 2009 17:31:49 +0000 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Message-ID: <4B018C85.6020801@gmail.com> Hi Aneesh, See the Bioperl trees howto: http://www.bioperl.org/wiki/HOWTO:Trees Roy. Aneesh K wrote: > Hi, > > I just started to use Bioperl modules. It's really useful and interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob. 09646181517 -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. From Kevin.M.Brown at asu.edu Mon Nov 16 13:22:07 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Nov 2009 11:22:07 -0700 Subject: [Bioperl-l] FW: Bio::Graphics::Panel question Message-ID: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> Please keep your responses on the list for more timely help. Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University ________________________________ From: Xiaoyu Liang [mailto:veronica.xiaoyu at gmail.com] Sent: Monday, November 16, 2009 9:34 AM To: Kevin Brown Subject: Re: [Bioperl-l] Bio::Graphics::Panel question Hi Kevin, Thank you for your quick response. I attached the BLAST .out file here, and the following is the relevant part of my code.
I have an array keeping the color for each hit, and I printed out the array; nothing is missing.

my $track = $panel->add_track(
    -glyph       => 'graded_segments',
    -label       => 1,
    -connector   => 'dashed',
    -font2color  => 'red',
    -sort_order  => 'high_score',
    -description => sub {
        $feature = shift;
        #print "--".$feature."\n";
        return unless $feature->has_tag('description');
        my ($description) = $feature->each_tag_value('description');
        my ($id) = $feature->display_name;
        my @records = split(/\|/, $description);
        my $score = $feature->score;
        #print $id.":".$score."\n";
        if ($score >= 200) {
            push(@color_array, 1);
        } elsif ($score >= 80) {
            push(@color_array, 2);
        } elsif ($score >= 50) {
            push(@color_array, 3);
        } elsif ($score >= 40) {
            push(@color_array, 4);
        } else {
            push(@color_array, 5);
        }

        if ($type == 1) {
            "Species:Arabidopsis TF Family:$records[1] Score=$score";
        } elsif ($type == 2) {
            if (scalar(@records) == 5) {
                "Species:$records[1] TF Family:$records[2] Accepted Name:$records[3] Score=$score";
            } else {
                "Species:$records[1] TF Family:$records[2] Score=$score";
            }
        } else {
            "";
        }
    },
    -bgcolor => sub {
        return unless $feature->has_tag('description');
        if ($color_array[$index] == 1) {
            $color = 'red';
        }
        if ($color_array[$index] == 2) {
            $color = 'orange';
        }
        if ($color_array[$index] == 3) {
            $color = 'green';
        }
        if ($color_array[$index] == 4) {
            $color = 'blue';
        }
        if ($color_array[$index] == 5) {
            $color = 'black';
        }
        #if ($index == 20){
        #    $color = 'black';
        #}
        #print $index."--".$color_array[$index]."\n";
        $index++;

        #print $feature."\n";
        #print $feature->display_name."\n";
        return $color;
    },
);

Best regards,
Xiaoyu

On Mon, Nov 16, 2009 at 10:49 AM, Kevin Brown wrote: To really be able to tell if this was a bug, I (and probably the real devs) would need to see that part of your code and the Blast file that is having this issue as it could be your callback for color choice vs the blast object (e.g. your color picker is missing an option that the data comes in with and so returns with a blank value).
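Kevin's hunch can be made concrete. The -bgcolor callback above looks up each hit's color through the shared @color_array and a global $index counter, and it reads $feature from whatever -description last stored rather than from its own argument. It therefore appears to assume that Bio::Graphics calls -bgcolor exactly once per feature, in the same order it called -description. If the two drift apart, or $index runs past the end of the array, none of the if branches match and $color is left stale or undefined. A minimal sketch of an order-independent alternative, assuming the same score thresholds as the -description callback, derives the color from the feature's own score every time. score_to_color is a hypothetical helper, not part of Bio::Graphics; like the -description callback above, the usage comment assumes the callback receives the feature as its first argument.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Graphics): map a BLAST hit score
# straight to a color, using the same thresholds as the -description
# callback, so no shared array or counter is needed.
sub score_to_color {
    my ($score) = @_;
    die "feature has no score\n" unless defined $score;
    return 'red'    if $score >= 200;
    return 'orange' if $score >= 80;
    return 'green'  if $score >= 50;
    return 'blue'   if $score >= 40;
    return 'black';
}

# Sketch of use inside add_track (assumes $panel as in the code above):
#   -bgcolor => sub {
#       my $feature = shift;    # the feature being drawn
#       return score_to_color($feature->score);
#   },

# Quick demonstration on a few sample scores.
printf "%4d => %s\n", $_, score_to_color($_) for (250, 100, 60, 45, 10);
```

Because the color is recomputed from the score on each call, it no longer matters how many times, or in what order, the glyph asks for it.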
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Xiaoyu Liang Sent: Friday, November 13, 2009 1:36 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Graphics::Panel question Hi, I'm using Bio::Graphics to parse the blast result and generate images. But sometimes, in the middle of the output image, the hit's color is white, even though I set it to other colors. I attached the picture here as an example. This doesn't occur all the time; usually it works well. I'm wondering whether I did something wrong, or whether it depends on the BLAST result. Thank you, Xiaoyu _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: 1258388779.out Type: application/octet-stream Size: 32599 bytes Desc: 1258388779.out URL: From paolo.pavan at gmail.com Mon Nov 16 14:06:06 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Mon, 16 Nov 2009 20:06:06 +0100 Subject: [Bioperl-l] bioperl-ext installation issue Message-ID: <56be91b60911161106w69e20fd9k133a465e8d4f8a3f@mail.gmail.com> Hi everybody, I have problems installing the bioperl-ext package; any help is much appreciated. 1) - I started with cpan: i /bioperl-ext/ (the only resource available is /B/BI/BIRNEY/bioperl-ext-1.4; is that OK?) - I installed Inline::MakeMaker and Inline::C, then - i/BIRNEY/bioperl-ext-1.4/ fails because I don't have the Staden package. 2) I tried to install io_lib-1.8.10.tar as suggested by the README ( ftp://ftp.mrc-lmb.cam.ac.uk/pub/staden/io_lib/), but installation fails after: ...
gcc -g -O2 -o makeSCF makeSCF.o ../read/.libs/libread.a -lz -lm ../read/.libs/libread.a(compress.o): In function `fopen_compressed': /root/Download/staden/io_lib-1.8.10/utils/compress.c:321: warning: the use of `tempnam' is dangerous, better use `mkstemp' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../read -I../alf -I../abi -I../ctf -I../ztr -I../plain -I../scf -I../exp_file -I../utils -I/usr/local/include -g -O2 -c -o extract_seq.o `test -f extract_seq.c || echo './'`extract_seq.c /bin/sh ../libtool --mode=link gcc -g -O2 -o extract_seq extract_seq.o ../read/libread.la gcc -g -O2 -o extract_seq extract_seq.o ../read/.libs/libread.a -lz -lm ../read/.libs/libread.a(compress.o): In function `fopen_compressed': /root/Download/staden/io_lib-1.8.10/utils/compress.c:321: warning: the use of `tempnam' is dangerous, better use `mkstemp' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../read -I../alf -I../abi -I../ctf -I../ztr -I../plain -I../scf -I../exp_file -I../utils -I/usr/local/include -g -O2 -c -o index_tar.o `test -f index_tar.c || echo './'`index_tar.c index_tar.c: In function 'main': index_tar.c:12: error: two or more data types in declaration specifiers make[2]: *** [index_tar.o] Error 1 make[2]: Leaving directory `/home/root/Download/staden/io_lib-1.8.10/progs' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/root/Download/staden/io_lib-1.8.10' make: *** [all-recursive-am] Error 2 3) I gave up on Staden, because I actually need pSW, and tried to install from Makefile.PL in Bio/Ext/Align, but installation fails after: ... Align.xs:18: warning: 'not_here'
defined but not used Running Mkbootstrap for Bio::Ext::Align () chmod 644 Align.bs rm -f ../blib/arch/auto/Bio/Ext/Align/Align.so gcc -shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic Align.o -o ../blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a \ -lm \ /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make[1]: *** [../blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 make[1]: Leaving directory `/home/root/.cpan/sources/authors/id/B/BI/BIRNEY/bioperl-ext-1.4/Bio/Ext/Align' make: *** [subdirs] Error 2 I have also made some other attempts, such as force-installing Bio::Ext::Align, without success, but I'm sure I'm missing something trivial that I can't spot. Can someone help me? Thank you, Paolo From lincoln.stein at gmail.com Mon Nov 16 15:08:20 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 16 Nov 2009 15:08:20 -0500 Subject: [Bioperl-l] FW: Bio::Graphics::Panel question In-Reply-To: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> Message-ID: <6dce9a0b0911161208q2f826d83s319184f0cacca097@mail.gmail.com> Hi, I think you should modify your color selection code as follows:

if ($color_array[$index] == 1) {
    $color = 'red';
} elsif ($color_array[$index] == 2) {
    $color = 'orange';
} elsif ($color_array[$index] == 3) {
    $color = 'green';
} elsif ($color_array[$index] == 4) {
    $color = 'blue';
} elsif ($color_array[$index] == 5) {
    $color = 'black';
} else {
    die "unexpected color array value $color_array[$index]";
}

Lincoln

On Mon, Nov 16, 2009 at 1:22 PM, Kevin Brown wrote: > Please keep your responses on the list for more timely help.
> > > Kevin Brown > Center for Innovations in Medicine > Biodesign Institute > Arizona State University > > > > ________________________________ > > From: Xiaoyu Liang [mailto:veronica.xiaoyu at gmail.com] > Sent: Monday, November 16, 2009 9:34 AM > To: Kevin Brown > Subject: Re: [Bioperl-l] Bio::Graphics::Panel question > > > Hi Kevin, > > Thank you for ur quick response. I attached the BLAST .out file here. > And the follow is my code part. I have an array keeping the color for > each hit, and I printed it out the array, there is no missing. > > my $track = $panel->add_track( > -glyph => 'graded_segments', > -label => 1, > -connector => 'dashed', > -font2color => 'red', > -sort_order => 'high_score', > -description => sub { > $feature = shift; > #print "--".$feature."\n"; > return unless > $feature->has_tag('description'); > my ($description) = > $feature->each_tag_value('description'); > my ($id) = $feature->display_name; > my @records= split(/\|/,$description); > my $score = $feature->score; > #print $id.":".$score."\n"; > if($score >=200){ > push (@color_array,1); > }elsif($score >=80){ > push (@color_array,2); > }elsif($score >=50){ > push (@color_array,3); > }elsif($score >= 40){ > push (@color_array,4); > }else{ > push (@color_array,5); > } > > if($type == 1){ > "Species:Arabidopsis TF > Family:$records[1] Score=$score"; > }elsif($type == 2){ > if(scalar(@records)==5){ > "Species:$records[1] TF > Family:$records[2] Accepted Name:$records[3] Score=$score"; > }else{ > "Species:$records[1] TF > Family:$records[2] Score=$score"; > } > }else{ > ""; > } > }, > -bgcolor => sub{ > return unless > $feature->has_tag('description'); > if($color_array[$index] == 1 ){ > $color = 'red'; > } > if($color_array[$index]== 2){ > $color = 'orange'; > } > if($color_array[$index]== 3){ > $color = 'green'; > } > if($color_array[$index]== 4){ > $color = 'blue'; > } > if($color_array[$index]== 5){ > $color = 'black'; > } > #if ($index == 20){ > # $color = 'black'; > #} > #print > 
$index."--".$color_array[$index]."\n"; > $index++; > > #print $feature."\n"; > #print > $feature->display_name."\n"; > return $color; > }, > ); > > > Best regrads, > Xiaoyu > > > On Mon, Nov 16, 2009 at 10:49 AM, Kevin Brown > wrote: > > > To really be able to tell if this was a bug, I (and probably the > real > devs) would need to see that part of your code and the Blast > file that > is having this issue as it could be your callback for color > choice vs > the blast object (e.g. your color picker is missing an option > that the > data comes in with and so returns with a blank value). > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Xiaoyu Liang > Sent: Friday, November 13, 2009 1:36 PM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Graphics::Panel question > > Hi, > > I'm using Bio::Graphics to parse the blast result and generate > images. > But, sometimes, in the middle of the output image, the hit's > color is > white, eventhough I set it to other colors. I attached the > picture here > for an example. This doesn't occur all the time, usually, it > works well. > I'm wondering if I did something wrong? or depends on the blast > result? > > Thank you, > Xiaoyu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From ryan_bogard at hms.harvard.edu Mon Nov 16 16:44:25 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 13:44:25 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26366421.post@talk.nabble.com> References: <26366421.post@talk.nabble.com> Message-ID: <26379710.post@talk.nabble.com> Thank you all for your help! I was able to get bioperl working via manual download and install. It was a combination of permissions issues and X86_64 vs. X86_32 compatibility issues. Using fink to download and install seems to have given me a combination of 32 and 64 associated files (I probably did something wrong in config). rbogard wrote: > > In advance, any advice would be grealy appreciated! I have installed > bioperl-588pm via fink but I am having difficulties calling the modules in > script. The following is added to .profile (bash): > PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB > > If I change this to /sw/lib/perl5 then I get an @INC error, as use > Bio::PERL cannot be located. 
> > The environment variables are as follows: > > MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man > PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 > PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin > INFOPATH=/sw/share/info:/sw/info:/usr/share/info > > > This is the perl script I'm attempting to run: > #!/sw/bin/perl5.8.8 > use strict; > use Bio::Perl; > $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > write_sequence(">roa1.fasta",'fasta',$seq_object); > > Here is the error output: > > dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > dyld: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > Trace/BPT trap > > I have looked through many forum postings and attempted the solutions > offered in those instances, but none seem to work in my case. I'm not sure > if it's because I have perl 5.10.0 installed while attempting to call > bioperl 5.8.8; however, others seem to have it working just fine. > > Thank you, Ryan > -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26379710.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jay at jays.net Mon Nov 16 17:02:10 2009 From: jay at jays.net (Jay Hannah) Date: Mon, 16 Nov 2009 16:02:10 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? 
In-Reply-To: <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> Message-ID: <60ADD3A9-D38B-4A39-A5CE-C8118DEC1242@jays.net> On Nov 10, 2009, at 12:50 PM, Jason Stajich wrote: > You might also look at what mygenbank does: > http://homepage.mac.com/iankorf/mygenbank.html It appears, perhaps, that BioSQL can provide *foo* searching like so: http://www.biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME SELECT DISTINCT include.ncbi_taxon_id FROM taxon INNER JOIN taxon AS include ON (include.left_value BETWEEN taxon.left_value AND taxon.right_value) WHERE taxon.taxon_id IN (SELECT taxon_id FROM taxon_name WHERE name LIKE '%fungi%') So I think we're going to chase that for a while. I didn't see a *foo* search in MyGenBank? Thanks, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From roy.chaudhuri at gmail.com Tue Nov 17 06:24:07 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 17 Nov 2009 11:24:07 +0000 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> <4B018C85.6020801@gmail.com> <9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> Message-ID: <4B0287D7.5050702@gmail.com> Hi Aneesh, Please keep your replies on the mailing list, that way someone else can respond, which would be particularly useful in this case since I know nothing about MapIO. Roy. Aneesh K wrote: > Thanks for your reply. > > I would like to know about "Genetic Maps" also. I would like to > use MapIO object. > But I'm not aware about genetic maps and the mapmaker format. > > Please tell me from where I can get some examples for mapmaker format > and some example scripts to use MapIO object. > > Hoping your reply. > > Aneesh.K > Mob. 
09646181517 > > > > On Mon, Nov 16, 2009 at 11:01 PM, Roy Chaudhuri > wrote: > > Hi Aneesh, > > See the Bioperl trees howto: > http://www.bioperl.org/wiki/HOWTO:Trees > > Roy. > > > Aneesh K wrote: > > Hi, > > I just started to use Bioperl modules. It's really useful and > interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing > phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob. 09646181517 > > > > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > > From maj at fortinbras.us Tue Nov 17 07:50:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 17 Nov 2009 07:50:06 -0500 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <4B0287D7.5050702@gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com><4B018C85.6020801@gmail.com><9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> <4B0287D7.5050702@gmail.com> Message-ID: <394F62D51F15405BBCF8BB50DA0FF336@NewLife> Aneesh, Have a look in the t/Map directory of the BioPerl distribution. These are test scripts that are also examples of usage. The t/data directory will contain the datafiles that the tests use; these will provide example data. cheers Mark ----- Original Message ----- From: "Roy Chaudhuri" To: "Aneesh K" ; Sent: Tuesday, November 17, 2009 6:24 AM Subject: Re: [Bioperl-l] Regarding Bio::TreeIO Object > Hi Aneesh, > > Please keep your replies on the mailing list, that way someone else can > respond, which would be particularly useful in this case since I know > nothing about MapIO. > > Roy. > > Aneesh K wrote: >> Thanks for your reply. >> >> I would like to know about "Genetic Maps" also. I would like to >> use MapIO object. >> But I'm not aware about genetic maps and the mapmaker format. 
>> >> Please tell me from where I can get some examples for mapmaker format >> and some example scripts to use MapIO object. >> >> Hoping your reply. >> >> Aneesh.K >> Mob. 09646181517 >> >> >> >> On Mon, Nov 16, 2009 at 11:01 PM, Roy Chaudhuri > > wrote: >> >> Hi Aneesh, >> >> See the Bioperl trees howto: >> http://www.bioperl.org/wiki/HOWTO:Trees >> >> Roy. >> >> >> Aneesh K wrote: >> >> Hi, >> >> I just started to use Bioperl modules. It's really useful and >> interesting. >> Now I have in stuck with "Tree objects and phylogenetic trees". >> I couldn't get any documentation/examples about reading/parsing >> phylip tree >> files. >> >> Please tell me from where I can get some sample codes for this. >> >> Waiting for your reply. >> >> Thanks >> Aneesh.K >> Mob. 09646181517 >> >> >> >> -- >> Dr. Roy Chaudhuri >> Department of Veterinary Medicine >> University of Cambridge, U.K. >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From veronica.xiaoyu at gmail.com Wed Nov 18 12:18:33 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Wed, 18 Nov 2009 12:18:33 -0500 Subject: [Bioperl-l] how to visualize multiple sequences alignments Message-ID: Hi, I'm wondering whether there are any modules that can be used for visualizing multiple sequence alignments, like the output from ClustalW?
> > Thank you very much, > Xiaoyu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From andrew.j.grimm at gmail.com Wed Nov 18 21:52:31 2009 From: andrew.j.grimm at gmail.com (Andrew Grimm) Date: Thu, 19 Nov 2009 13:52:31 +1100 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? Message-ID: Caution: read the whole email before visiting the bioperl wiki I was doing some bioinformatics-related searching using google, and one of the hits was to the bio dot perl dot org wiki (the FAQ in particular). When I did that, I was redirected to a ferdax dot com web site (a typo-squatting of fedex?). Some people reckon that ferdax hacks web sites and redirects google hits from the victim web site to their own web site. For example, this thread at google's webmaster central http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all (it's talking about zencart, but presumably they've since found other victims) Just going to the website without using google may not trigger the redirect. Apologies if this is a false alarm, but I don't think it is. I won't be in contact between Friday and Monday Australian time (I'll be at railscamp 6 in Melbourne), so I won't be able to answer any replies. Thanks, Andrew Grimm From maj at fortinbras.us Wed Nov 18 22:14:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 18 Nov 2009 22:14:44 -0500 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? In-Reply-To: References: Message-ID: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> Andrew-- thanks!! We're on it. MAJ ----- Original Message ----- From: "Andrew Grimm" To: Sent: Wednesday, November 18, 2009 9:52 PM Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? 
> Caution: read the whole email before visiting the bioperl wiki > > I was doing some bioinformatics-related searching using google, and > one of the hits was to the bio dot perl dot org wiki (the FAQ in > particular). > > When I did that, I was redirected to a ferdax dot com web site (a > typo-squatting of fedex?). > > Some people reckon that ferdax hacks web sites and redirects google > hits from the victim web site to their own web site. For example, this > thread at google's webmaster central > http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all > (it's talking about zencart, but presumably they've since found other > victims) > > Just going to the website without using google may not trigger the redirect. > > Apologies if this is a false alarm, but I don't think it is. > > I won't be in contact between Friday and Monday Australian time (I'll > be at railscamp 6 in Melbourne), so I won't be able to answer any > replies. > > Thanks, > > Andrew Grimm > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From sandipan.chowdhury at physiology.wisc.edu Thu Nov 19 01:49:45 2009 From: sandipan.chowdhury at physiology.wisc.edu (Sandipan Chowdhury) Date: Thu, 19 Nov 2009 00:49:45 -0600 Subject: [Bioperl-l] accessing EMBL database Message-ID: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Hi, I have 3 questions all related to the retreival of sequences from online databases. (1) I have been trying to download a protein sequence from the EMBL database and trying to write the sequence into a text file, as a string. I am using the following code: use Bio::DB::EMBL; open b,">","s.txt"; $em_obj = Bio::DB::EMBL->new; $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); $s_str = $seq_obj->seq; print b "$s_str\n"; close b; The script is not working and gives the messege: "MSG: EMBL stream with no ID. 
Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl" I am not sure what this means. A similar version of the script works for the Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way around this so that I can download the embl sequence? (2) Also, is there anyway I can download sequences from DDBJ (database of Japan)? (3) Can GI numbers be used to retreive the sequences? If so then how? Answers to these questions would be greatly appreciated. I am very new to Perl/Bioperl and am not really familiar with the advanced programming features, so I would need to your help to find my way out of this situation. Many Thanks Sandipan From maj at fortinbras.us Thu Nov 19 08:10:07 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Nov 2009 08:10:07 -0500 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Sandipan-- That id (CAB95729) returns "No entries" from EMBL. I would agree that the error message is not really informative. The module documentation warns: # remember that EMBL_ID does not equal GenBank_ID! so I would check that. MAJ ----- Original Message ----- From: "Sandipan Chowdhury" To: Sent: Thursday, November 19, 2009 1:49 AM Subject: [Bioperl-l] accessing EMBL database > Hi, > > I have 3 questions all related to the retreival of sequences from online > databases. > > (1) I have been trying to download a protein sequence from the EMBL database > and trying to write the sequence into a text file, as a string. 
I am using the > following code: > > use Bio::DB::EMBL; > open b,">","s.txt"; > $em_obj = Bio::DB::EMBL->new; > $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); > $s_str = $seq_obj->seq; > print b "$s_str\n"; > close b; > > The script is not working and gives the messege: > "MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl" > > I am not sure what this means. A similar version of the script works for the > Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way > around this so that I can download the embl sequence? > > (2) Also, is there anyway I can download sequences from DDBJ (database of > Japan)? > > (3) Can GI numbers be used to retreive the sequences? If so then how? > > Answers to these questions would be greatly appreciated. I am very new to > Perl/Bioperl and am not really familiar with the advanced programming > features, so I would need to your help to find my way out of this situation. > > Many Thanks > Sandipan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hrh at fmi.ch Thu Nov 19 08:23:29 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Thu, 19 Nov 2009 14:23:29 +0100 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Sandipan > I have 3 questions all related to the retreival of sequences from online > databases. > > (1) I have been trying to download a protein sequence from the EMBL database > and trying to write the sequence into a text file, as a string. 
I am using the > following code: > > use Bio::DB::EMBL; > open b,">","s.txt"; > $em_obj = Bio::DB::EMBL->new; > $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); > $s_str = $seq_obj->seq; > print b "$s_str\n"; > close b; > > The script is not working and gives the messege: > "MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl" > > I am not sure what this means. A similar version of the script works for the > Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way > around this so that I can download the embl sequence? "CAB95729" is a protein sequence, ie a translation of the CDS of 'AJ277028.1'. As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, ie the nucleotides sequence > (2) Also, is there anyway I can download sequences from DDBJ (database of > Japan)? Unless, for network/speed reason, why do you want to download data from DDBJ? It contains the same data as GenBank and EMBL. Those three databases exchange their data on a daily basis. > (3) Can GI numbers be used to retreive the sequences? If so then how? Have you looked at Bio::DB::Eutilities ? See the 'HOWTOs' page in the Bioperl Wiki Regards, Hans > Answers to these questions would be greatly appreciated. I am very new to > Perl/Bioperl and am not really familiar with the advanced programming > features, so I would need to your help to find my way out of this situation. 
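Hans's explanation depends on the shape of the identifier. CAB95729 is an INSDC protein_id (three letters plus five digits), while AJ277028 is a nucleotide accession (two letters plus six digits), and Bio::DB::EMBL is meant for the nucleotide kind. Below is a minimal plain-Perl sketch that classifies an identifier by that pattern before you pick a retrieval class. The function name accession_type is hypothetical and not part of BioPerl. The patterns cover only the common INSDC and RefSeq formats plus bare GI numbers, so they are not an exhaustive list. UniProt accessions such as P12345 are not handled, because they overlap the one-letter nucleotide format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): guess what kind of record an
# identifier points to, so the matching Bio::DB:: class can be chosen.
# Covers only common INSDC and RefSeq formats plus bare GI numbers.
sub accession_type {
    my ($id) = @_;
    return 'unknown' unless defined $id;
    $id = uc $id;
    $id =~ s/^\s+|\s+$//g;

    return 'gi' if $id =~ /^\d+$/;    # GI numbers are all digits

    # RefSeq: two letters + underscore; NP_, XP_, YP_, WP_, AP_ are proteins
    if ($id =~ /^([A-Z]{2})_\w+(?:\.\d+)?$/) {
        my $prefix = $1;
        return $prefix =~ /^(?:NP|XP|YP|WP|AP)$/ ? 'protein' : 'nucleotide';
    }

    # INSDC protein_id: three letters + five (later seven) digits, e.g. CAB95729
    return 'protein' if $id =~ /^[A-Z]{3}\d{5}(?:\d{2})?(?:\.\d+)?$/;

    # INSDC nucleotide: one letter + five digits, or two letters + six digits
    return 'nucleotide' if $id =~ /^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?$/;

    return 'unknown';
}

# Usage: the identifiers discussed in this thread.
for my $id (qw(CAB95729 AJ277028.1 NC_009512.1)) {
    printf "%-12s %s\n", $id, accession_type($id);
}
```

Following Chris's reply, a 'protein' result suggests Bio::DB::GenPept rather than Bio::DB::EMBL. For a 'gi' result, Hans points to Bio::DB::EUtilities.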
> > Many Thanks > Sandipan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Nov 19 08:47:16 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Nov 2009 07:47:16 -0600 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: Message-ID: <95D416ED-7630-40A1-ABA5-A3C3525D25B1@illinois.edu> On Nov 19, 2009, at 7:23 AM, Hotz, Hans-Rudolf wrote: > > Sandipan > > >> I have 3 questions all related to the retreival of sequences from online >> databases. >> >> (1) I have been trying to download a protein sequence from the EMBL database >> and trying to write the sequence into a text file, as a string. I am using the >> following code: >> >> use Bio::DB::EMBL; >> open b,">","s.txt"; >> $em_obj = Bio::DB::EMBL->new; >> $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); >> $s_str = $seq_obj->seq; >> print b "$s_str\n"; >> close b; >> >> The script is not working and gives the messege: >> "MSG: EMBL stream with no ID. Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 >> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 >> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc >> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 >> STACK: trial2.pl" >> >> I am not sure what this means. A similar version of the script works for the >> Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way >> around this so that I can download the embl sequence? > > "CAB95729" is a protein sequence, ie a translation of the CDS of > 'AJ277028.1'. > > As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, ie the > nucleotides sequence > > > >> (2) Also, is there anyway I can download sequences from DDBJ (database of >> Japan)? > > Unless, for network/speed reason, why do you want to download data from > DDBJ? 
It contains the same data as GenBank and EMBL. Those three databases > exchange their data on a daily basis. > >> (3) Can GI numbers be used to retreive the sequences? If so then how? > > Have you looked at Bio::DB::Eutilities ? See the 'HOWTOs' page in the > Bioperl Wiki > > > > Regards, Hans > > > >> Answers to these questions would be greatly appreciated. I am very new to >> Perl/Bioperl and am not really familiar with the advanced programming >> features, so I would need to your help to find my way out of this situation. >> >> Many Thanks >> Sandipan To add to that, if you want the protein sequences as a Bio::Seq you can use Bio::DB::GenPept (Bio::DB::EUtilities will retrieve raw data only). chris From David.Messina at sbc.su.se Thu Nov 19 09:04:55 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Nov 2009 15:04:55 +0100 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: > I would agree that the error message is not really informative. Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. Perhaps the stack dump should be turned off by default? Wouldn't this: ERROR: EMBL stream with no ID. Not embl in my book Be a lot clearer than this?: MSG: EMBL stream with no ID. Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl Just a thought. This has probably been discussed before. Dave From maj at fortinbras.us Thu Nov 19 09:17:05 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 19 Nov 2009 09:17:05 -0500 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: I'm inclined to agree. Lots of responses to questions here that begin "Well, as the error message said, you need to check...", which means people tend towards "I broke it! Write the list!". I do find it hairy when my errors are way down in the object tree. ----- Original Message ----- From: "Dave Messina" To: "Mark A. Jensen" Cc: Sent: Thursday, November 19, 2009 9:04 AM Subject: Re: [Bioperl-l] accessing EMBL database > I would agree that the error message is not really informative. Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. Perhaps the stack dump should be turned off by default? Wouldn't this: ERROR: EMBL stream with no ID. Not embl in my book Be a lot clearer than this?: MSG: EMBL stream with no ID. Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl Just a thought. This has probably been discussed before. Dave From rtbio.2009 at gmail.com Thu Nov 19 09:55:27 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Thu, 19 Nov 2009 15:55:27 +0100 Subject: [Bioperl-l] Remote blast Message-ID: Hello everybody, I have a problem. I would like to use remote blast to find sequences matching for an input sequence. Ex:-I would like to search sequences which match Trypanosoma Brucei sequence. 
I want the output to be only Trypanosoma Brucei sequences matching with my query.When i tried to use remoteblast to nr database,I got sequences from different organisms like E.coli,Pseudomonas etc., Could you please tell me how can this be solved...? My code is as follows. use Bio::Tools::Run::RemoteBlast; use strict; my $prog = 'blastn'; my $db = 'nr'; my $e_val= '1e-10'; my $organism= 'Trypanosoma Brucei'; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); my $factory = Bio::Tools::Run::RemoteBlast-> new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]' #remove a parameter #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' ); while (my $input = $str->next_seq()){ #Blast a sequence against a database: my $r = $factory->submit_blast($input); #my $r = $factory->submit_blast('amino.fa'); print STDERR "waiting..." if( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output my $filename = $result->query_name()."\.out"; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { next unless ( $v > 0); print "\thit name is ", $hit->name, "\n"; while( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } } } } } My input sequence is >ref|NC_009512.1|:385-1902 GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG 
GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA Please mail me regarding any queries. Regards, Roopa. From cjfields at illinois.edu Thu Nov 19 10:30:34 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Nov 2009 09:30:34 -0600 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Mark, Dave, This could be based on verbose().

Level         w         t   d   st
verbose < 0   -         +   -   -/+
verbose 0     +         +   -   -/+
verbose 1     +         +   +   +/+
verbose > 1   +*  ->    +   +   +/+
* converts to throw()
w = warn
t = throw
d = debug
st = stack trace

warn() is set up that way now, you don't get a stack trace unless verbose() is > 0. throw() could be the same; would be a simple fix, really. My only problem with the current state of things is (I think we've delved down this path before) verbosity level is tied to exception strictness as seen above, and they're really two separate concepts, at least to me. Verbosity of 1 or more doesn't necessarily mean I want an elevated level of strictness along with it. For instance, one might want very strict exceptions w/o the noise, or (conversely) lots of debugging output but no warnings. (aside: another small nit, but I haven't exactly liked that the global level of strictness is designated by a env. variable with DEBUG in the name, but that's just me). I've been thinking it would be nice to have simple separate verbose/strict switches (this is the way it's implemented in Biome). This would allow some finer grained control over output:

Level         d   st
verbose 0     -   -
verbose 1     +   +
Default = BIOPERLDEBUG || 0   # current situation

Level         w         t
strict -1     -         +
strict 0      +         +
strict 1      +*  ->    +
* converts to throw()
Default = BIOPERLSTRICT || 0

We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. chris On Nov 19, 2009, at 8:17 AM, Mark A. Jensen wrote: > I'm inclined to agree.
Lots of responses to questions here that begin > "Well, as the error message said, you need to check...", which means > people tend towards "I broke it! Write the list!". I do find it hairy when > my errors are way down in the object tree. > ----- Original Message ----- From: "Dave Messina" > To: "Mark A. Jensen" > Cc: > Sent: Thursday, November 19, 2009 9:04 AM > Subject: Re: [Bioperl-l] accessing EMBL database > > >> I would agree that the error message is not really informative. > > Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. > > I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. > > Perhaps the stack dump should be turned off by default? > > Wouldn't this: > > ERROR: EMBL stream with no ID. Not embl in my book > > > > Be a lot clearer than this?: > > MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl > > > > Just a thought. This has probably been discussed before. 
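Until the stack dump becomes optional, you can get something close to Dave's one-line form on the user side. Catch the exception and print only its MSG: line. Below is a minimal plain-Perl sketch, assuming the exception text follows the MSG:/STACK: layout shown above. short_error is a hypothetical helper name. Quoting $@ forces an exception object into its string form. Using BIOPERLDEBUG as the switch for showing the full text is an assumption of this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse a BioPerl-style exception (a MSG: line
# followed by STACK: lines) into one "ERROR: ..." line. Falls back to the
# first line of the text when there is no MSG: line (e.g. a plain die).
sub short_error {
    my ($err) = @_;
    return '' unless defined $err && length $err;
    return "ERROR: $1" if $err =~ /^MSG:\s*(.*?)\s*$/m;
    my ($first) = split /\n/, $err;
    return "ERROR: $first";
}

# Usage: simulate the failure from this thread and report it briefly;
# the full text (with stack) is printed only when BIOPERLDEBUG is set.
my $full = "MSG: EMBL stream with no ID. Not embl in my book\n"
         . "STACK: Error::throw\n"
         . "STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368\n";

eval { die $full };
if (my $err = "$@") {
    print STDERR short_error($err), "\n";
    print STDERR $err if $ENV{BIOPERLDEBUG};
}
```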
> Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Thu Nov 19 11:10:28 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Thu, 19 Nov 2009 16:10:28 +0000 Subject: [Bioperl-l] Remote blast In-Reply-To: References: Message-ID: <4B056DF4.2030502@gmail.com> Hi Roopa, I think that the -Organism parameter that you specify for Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it in the documentation: http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm You have the correct approach in your code - limiting the search to the Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If you uncomment the line (and add a semicolon afterwards), the program runs correctly, but no hits are reported below your threshold e-value. If you change the value of $e_val to 10 then some T.brucei hits are reported. Roy. Roopa Raghuveer wrote: > Hello everybody, > > I have a problem. I would like to use remote blast to find sequences > matching for an input sequence. > > Ex:-I would like to search sequences which match Trypanosoma Brucei > sequence. > > I want the output to be only Trypanosoma Brucei sequences matching with my > query.When i tried to use remoteblast to nr database,I got sequences from > different organisms like E.coli,Pseudomonas etc., > > Could you please tell me how can this be solved...? > > My code is as follows. 
> > use Bio::Tools::Run::RemoteBlast; > use strict; > my $prog = 'blastn'; > my $db = 'nr'; > my $e_val= '1e-10'; > my $organism= 'Trypanosoma Brucei'; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > my $factory = Bio::Tools::Run::RemoteBlast-> > new(@params); > > #change a paramter > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > brucei[ORGN]' > > #remove a parameter > #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , > '-organism' => 'Trypanosoma Brucei' ); > > while (my $input = $str->next_seq()){ > #Blast a sequence against a database: > my $r = $factory->submit_blast($input); > #my $r = $factory->submit_blast('amino.fa'); > > print STDERR "waiting..." if( $v > 0 ); > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > My input sequence is > >> ref|NC_009512.1|:385-1902 > GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA > CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT > TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT > GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG > TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA > ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG > GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC > TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT > CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC > GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG > CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT > CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC > AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC > TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG > CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG > GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC > TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT > TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC > GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC > CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT > 
CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG > GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA > > Please mail me regarding any queries. > > Regards, > Roopa. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From clements at nescent.org Thu Nov 19 12:46:32 2009 From: clements at nescent.org (Dave Clements) Date: Thu, 19 Nov 2009 18:46:32 +0100 Subject: [Bioperl-l] how to visualize multiple sequences alignments In-Reply-To: References: Message-ID: Hi Xiaoyu, I would also take a look at GBrowse_syn, a perl based solution built with the GBrowse genome browser framework. See http://gmod.org/wiki/GBrowse_syn. Cheers, Dave C. On Wed, Nov 18, 2009 at 7:23 PM, Jason Stajich wrote: > try jalview http://www.jalview.org/ > > > On Nov 18, 2009, at 9:18 AM, Xiaoyu Liang wrote: > > Hi, >> >> I'm wondering Is there any modules that can be used for visualizing >> multiple >> sequences alignments? like the result from ClustalW? >> >> Thank you very much, >> Xiaoyu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/January_2010_GMOD_Meeting From maj at fortinbras.us Thu Nov 19 18:37:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Nov 2009 18:37:05 -0500 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: I like this verbose/strict separability a lot. Should we go for it? 
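Chris's proposal, which Mark endorses here, separates two switches that are currently tied together. Below is a minimal, self-contained Perl sketch of that separation, assuming the semantics in his tables. Strict controls what warn() does: -1 is silent, 0 warns, and 1 promotes the warning to throw(). Verbose controls debug() output. Following Dave's suggestion, verbose also decides whether a stack trace is attached at all. The package name Sketch::Reporter and the BIOPERLSTRICT default are hypothetical. This is not the actual API of BioPerl or Biome.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch (not the real Bio::Root::Root API): verbosity and
# strictness as two independent switches.
package Sketch::Reporter;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # verbose: 0 = quiet, 1 = debug output plus stack traces
    $self->{verbose} = exists $args{-verbose} ? $args{-verbose} : ($ENV{BIOPERLDEBUG} || 0);
    # strict: -1 = silence warnings, 0 = warn, 1 = promote warnings to errors
    # (BIOPERLSTRICT is the proposed, not an existing, environment variable)
    $self->{strict}  = exists $args{-strict}  ? $args{-strict}  : ($ENV{BIOPERLSTRICT} || 0);
    return $self;
}

# One "STACK:" line per caller frame, loosely mimicking BioPerl's layout.
sub _stack {
    my ($i, @frames) = (1);
    while (my @c = caller($i++)) {
        push @frames, "STACK: $c[3] $c[1]: $c[2]";
    }
    return join('', map { "$_\n" } @frames);
}

# Message only by default; stack trace only when verbose.
sub _format {
    my ($self, $label, $msg) = @_;
    my $text = "$label: $msg\n";
    $text .= $self->_stack if $self->{verbose} > 0;
    return $text;
}

sub throw {
    my ($self, $msg) = @_;
    die $self->_format('ERROR', $msg);            # always fatal
}

sub warn {
    my ($self, $msg) = @_;
    return if $self->{strict} < 0;                # strict -1: silent
    $self->throw($msg) if $self->{strict} > 0;    # strict 1: becomes throw()
    print STDERR $self->_format('WARNING', $msg); # strict 0: warning only
    return;
}

sub debug {
    my ($self, $msg) = @_;
    print STDERR "DEBUG: $msg\n" if $self->{verbose} > 0;  # ignores strictness
    return;
}

package main;

# Usage: very strict but quiet, one of the combinations Chris mentions.
my $r = Sketch::Reporter->new(-verbose => 0, -strict => 1);
$r->debug('not printed');
eval { $r->warn('EMBL stream with no ID. Not embl in my book') };
print STDERR "caught: $@" if $@;
```

With the two switches kept apart, "strict but quiet" and "chatty but lenient" are both single constructor calls. Neither setting changes the meaning of the other.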
----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: Sent: Thursday, November 19, 2009 10:30 AM Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database > Mark, Dave, > > This could be based on verbose(). > > Level w t d st > verbose < 0 - + - -/+ > verbose 0 + + - -/+ > verbose 1 + + + +/+ > verbose > 1 +* -> + + +/+ > * converts to throw() > w = warn > t = throw > d = debug > st = stack trace > > warn() is set up that way now, you don't get a stack trace unless verbose() is > > 0. throw() could be the same; would be a simple fix, really. > > My only problem with the current state of things is (I think we've delved down > this path before) verbosity level is tied to exception strictness as seen > above, and they're really two separate concepts, at least to me. Verbosity of > 1 or more doesn't necessarily mean I want an elevated level of strictness > along with it. For instance, one might want very strict exceptions w/o the > noise, or (conversely) lots of debugging output but no warnings. > > (aside: another small nit, but I haven't exactly liked that the global level > of strictness is designated by a env. variable with DEBUG in the name, but > that's just me). > > I've been thinking it would be nice to have simple separate verbose/strict > switches (this is the way it's implemented in Biome). This would allow some > finer grained control over output: > > Level d st > verbose 0 - - > verbose 1 + + > Default = BIOPERLDEBUG || 0 # current situation > > Level w t > strict -1 - + > strict 0 + + > strict 1 +* -> + > * converts to throw() > Default = BIOPERLSTRICT || 0 > > We could even allow finer-grained control of verbosity (states which cover all > combinations) w/o affecting strictness. > > chris > > On Nov 19, 2009, at 8:17 AM, Mark A. Jensen wrote: > >> I'm inclined to agree. 
Lots of responses to questions here that begin >> "Well, as the error message said, you need to check...", which means >> people tend towards "I broke it! Write the list!". I do find it hairy when >> my errors are way down in the object tree. >> ----- Original Message ----- From: "Dave Messina" >> To: "Mark A. Jensen" >> Cc: >> Sent: Thursday, November 19, 2009 9:04 AM >> Subject: Re: [Bioperl-l] accessing EMBL database >> >> >>> I would agree that the error message is not really informative. >> >> Agreed that it could be better, but I wonder whether part of the problem with >> BioPerl error messages is the stack dump. >> >> I think a lot of eyes just glaze right over when they see a big wad of >> complicated stuff, with colons and slashes and line numbers, spewing out at >> them. >> >> Perhaps the stack dump should be turned off by default? >> >> Wouldn't this: >> >> ERROR: EMBL stream with no ID. Not embl in my book >> >> >> >> Be a lot clearer than this?: >> >> MSG: EMBL stream with no ID. Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 >> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 >> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc >> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 >> STACK: trial2.pl >> >> >> >> Just a thought. This has probably been discussed before. 
>> Dave >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From michael.watson at bbsrc.ac.uk Fri Nov 20 05:07:10 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri, 20 Nov 2009 10:07:10 +0000 Subject: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output In-Reply-To: <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC501487319AE@iahcexch1.iah.bbsrc.ac.uk> <9994F70B-AE92-4425-9AAC-E9A2DC26964E@bioperl.org> <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> Hello I was just wondering if anyone had had time to look into this? I posted a bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2937 Thanks Mick -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) Sent: 27 October 2009 09:01 To: 'Jason Stajich' Cc: bioperl-l Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output Hi Jason They both print 0 also. A bug report it is Mick -----Original Message----- From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich Sent: 26 October 2009 18:46 To: michael watson (IAH-C) Cc: bioperl-l Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output Is this -m9 -d 0 output or standard default? I think the strand is parsed in the HSP parsing. Can you double check what $hsp->query->strand and $hsp->hit->strand prints? A full example report as a bug request will be next step if that doesn't resolve. 
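Until the parser records strand for FASTA reports (Mick's bug 2937), one workaround is to read the [f]/[r] markers from the hit list yourself. Below is a minimal plain-Perl sketch. It assumes hit-list lines shaped like the fasta35 examples in Mick's original message: name, (length), [f] or [r], then scores. hit_strands is a hypothetical helper. It maps [f]/[r] to 1/-1, the strand convention BioPerl normally uses. You could then look up $hit->name in the returned hash while iterating Bio::SearchIO results.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical workaround (not part of Bio::SearchIO): pull the strand
# marker out of fasta35 hit-list lines such as
#   cluster_99033:3 ( 23) [r] 115 37.9 0.0011
# and return a hash reference { hit name => 1 or -1 }.
sub hit_strands {
    my ($report) = @_;
    my %strand;
    for my $line (split /\n/, $report) {
        next unless $line =~ /^(\S+).*?\(\s*\d+\)\s+\[([fr])\]/;
        my ($name, $mark) = ($1, $2);
        next if exists $strand{$name};    # keep the first (best-scoring) entry
        $strand{$name} = $mark eq 'f' ? 1 : -1;
    }
    return \%strand;
}

# Usage: perl fasta_strands.pl report.fasta35
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    my $strands = hit_strands($text);
    print "$_\t$strands->{$_}\n" for sort keys %$strands;
}
```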
-jason On Oct 26, 2009, at 10:04 AM, michael watson (IAH-C) wrote: > Dear all > > Where does this go? Perhaps I am doing something wrong. > > Fasta35 output puts the strand in the hit list at the top: > > cluster_99033:3 ( 23) [r] 115 37.9 > 0.0011 > cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 > 0.963 27 > > The [r] stands for reverse and the [f] stands for forward. > > There is also the text "rev-comp" after the hit line further down. > > However, when I parse fasta35 output using SearchIO and output the > strand of the HSP: > > print $hsp->strand('hit'), ","; > print $hsp->strand('query'), "\n"; > > This simply prints out 0, 0 (I assume 0 is the default in BioPerl > for "I don't know which strand it's on"). > > So the information is there, but it's not getting parsed. > Alternatively, I've missed something and will feel a bit foolish. > > Currently using BioPerl 1.6.0 > > Thanks > Mick > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri Nov 20 05:15:11 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Nov 2009 11:15:11 +0100 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> Chris, I took a look at how you implemented this in Biome -- very nice! > I like this verbose/strict separability a lot. Should we go for it? Me too. So yes, I think so. > We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. 
Perhaps this is a job for Log::Log4perl or Log::Dispatch? http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl.pm http://search.cpan.org/~drolsky/Log-Dispatch-2.26/lib/Log/Dispatch.pm That might be overkill, though. Dave From roychu at gmail.com Fri Nov 20 05:21:54 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 20 Nov 2009 02:21:54 -0800 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN Message-ID: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Hi, Does anyone use dreamhost as a web hosting service? I'm just curious if anyone has had any luck installing the module as their daemon seems to kill my process whenever I try to install it. Dreamhost tech support attributes it to either exceeding the allocated memory cache or exceeding the processing time. I tried to nice the process, but that didn't help for me. Any luck or experience in resolving this would be much appreciated. I suppose my next attempt would be to try installing it directly and hope I don't need root... Thanks, Roy From s.denaxas at gmail.com Fri Nov 20 05:27:42 2009 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Fri, 20 Nov 2009 11:27:42 +0100 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Message-ID: Hello, normally you don't need to be root - http://sial.org/howto/perl/life-with-cpan/non-root/ Kind of disturbing that their tech support cannot give you a straight answer on why they are killing the process. Good luck Spiros On Fri, Nov 20, 2009 at 11:21 AM, Chu, Roy wrote: > I suppose my next attempt would be to try > installing it directly and hope I don't need root...
> > Thanks, > Roy > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From charles-listes+bioperl at plessy.org Fri Nov 20 05:44:45 2009 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Fri, 20 Nov 2009 19:44:45 +0900 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Message-ID: <20091120104445.GG31318@kunpuu.plessy.org> Le Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy a écrit : > > Does anyone use dreamhost as a web hosting service? I'm just curious > if anyone has had any luck installing the module as their daemon seems > to kill my process whenever I try to install it. Dreamhost tech > support attributes it to either exceeding the allocated memory cache > or exceeding the processing time. I tried to nice the process, but > that didn't help for me. Any luck or experience in resolving this > would be much appreciated. I suppose my next attempt would be to try > installing it directly and hope I don't need root... Dear Roy, DreamHost uses Debian, so you can suggest them to install the Debian package. If you are in contact with the tech service, do not hesitate to tell them to contact me if they are interested by a backport of the 1.6.0 package. For version 1.6.1, it may be more difficult as it depends on perl 5.10.1.
PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I will vote for it :) Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From cjfields at illinois.edu Fri Nov 20 07:51:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 06:51:39 -0600 Subject: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC501487319AE@iahcexch1.iah.bbsrc.ac.uk> <9994F70B-AE92-4425-9AAC-E9A2DC26964E@bioperl.org> <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> Message-ID: Mick, Short answer, no. It was in the queue to be fixed at some point in 1.6.x, but that queue is quite long. I'm pushing it into the queue specifically for 1.6.2, so it should be addressed soon. chris On Nov 20, 2009, at 4:07 AM, michael watson (IAH-C) wrote: > Hello > > I was just wondering if anyone had had time to look into this? > > I posted a bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2937 > > Thanks > Mick > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) > Sent: 27 October 2009 09:01 > To: 'Jason Stajich' > Cc: bioperl-l > Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output > > Hi Jason > > They both print 0 also. > > A bug report it is > > Mick > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich > Sent: 26 October 2009 18:46 > To: michael watson (IAH-C) > Cc: bioperl-l > Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output > > > Is this -m9 -d 0 output or standard default? I think the strand is > parsed in the HSP parsing.
> > Can you double check what $hsp->query->strand and $hsp->hit->strand > prints? > > A full example report as a bug request will be next step if that > doesn't resolve. > > -jason > On Oct 26, 2009, at 10:04 AM, michael watson (IAH-C) wrote: > >> Dear all >> >> Where does this go? Perhaps I am doing something wrong. >> >> Fasta35 output puts the strand in the hit list at the top: >> >> cluster_99033:3 ( 23) [r] 115 37.9 >> 0.0011 >> cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 >> 0.963 27 >> >> The [r] stands for reverse and the [f] stands for forward. >> >> There is also the text "rev-comp" after the hit line further down. >> >> However, when I parse fasta35 output using SearchIO and output the >> strand of the HSP: >> >> print $hsp->strand('hit'), ","; >> print $hsp->strand('query'), "\n"; >> >> This simply prints out 0, 0 (I assume 0 is the default in BioPerl >> for "I don't know which strand it's on"). >> >> So the information is there, but it's not getting parsed. >> Alternatively, I've missed something and will feel a bit foolish. 
> >> >> Currently using BioPerl 1.6.0 >> >> Thanks >> Mick >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Nov 20 08:00:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 07:00:45 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <20091120104445.GG31318@kunpuu.plessy.org> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > Le Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy a écrit : >> >> Does anyone use dreamhost as a web hosting service? I'm just curious >> if anyone has had any luck installing the module as their daemon seems >> to kill my process whenever I try to install it. Dreamhost tech >> support attributes it to either exceeding the allocated memory cache >> or exceeding the processing time. I tried to nice the process, but >> that didn't help for me. Any luck or experience in resolving this >> would be much appreciated. I suppose my next attempt would be to try >> installing it directly and hope I don't need root... > > Dear Roy, > > DreamHost uses Debian, so you can suggest them to install the Debian package. > If you are in contact with the tech service, do not hesitate to tell them to > contact me if they are interested by a backport of the 1.6.0 package.
For > version 1.6.1, it may be more difficult as it depends on perl 5.10.1. Any reason why this is so? We specify compatibility back to 5.6.1. Alex mentioned the reliance on the specific Extutils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post. > PS: if you propse to install BioPerl as a feature in the Dreamhost panel, I > will vote for it :) > > Have a nice day, > > -- > Charles Plessy > Debian Med packaging team, > http://www.debian.org/devel/debian-med > Tsurumi, Kanagawa, Japan chris From rtbio.2009 at gmail.com Fri Nov 20 10:52:09 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 20 Nov 2009 16:52:09 +0100 Subject: [Bioperl-l] Remote blast In-Reply-To: References: <4B056DF4.2030502@gmail.com> Message-ID: Hello everybody, I have tried to use Remote blast on Trypanasoma brucei sequences and could get certain hits.But I am unable to retrieve the complete sequence from where I got hits. i.e., I am unable to parse the blast output file for getting the complete sequences of the hits. Here is my code. #!/usr/bin/perl -w use Bio::SearchIO; my $blast_report = new Bio::SearchIO ('-format' => 'blast', '-file' => $ARGV[0]); my $result = $blast_report->next_result; my $level = $ARGV[1]; while( my $hit = $result->next_hit) { print $hit->name; push(@arr1,$hit->name); while( my $hsp = $hit->next_hsp()) { if ($hsp->frac_identical() >= $level) { #print $hsp->hit_string, "\n"; push(@arr,$hsp->hit_string); } } } $k=@arr1; for($i=0;$i<$k;$i++){ push(@arr2,split(/|/,$arr1[$i])); #print "$arr[$i]\n"; } #$t=@arr2; Here,I am trying to use the blast output file and get the complete sequence where I found a hit but I could not get the complete sequence. 
i/p:- Last login: Mon Nov 16 11:57:22 on console Welcome to Darwin! lmbicip-mac1:~ cip$ ssh admin at 141.84.66.66 The authenticity of host '141.84.66.66 (141.84.66.66)' can't be established. RSA key fingerprint is 2d:4a:09:1d:2e:f3:51:c7:ba:8b:29:37:36:f6:44:db. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '141.84.66.66' (RSA) to the list of known hosts. Password: Last login: Fri Nov 20 13:52:57 2009 from 10.153.189.239 Have a lot of fun... admin at BosLinux:~> clear admin at BosLinux:~> cd Documents/ admin at BosLinux:~/Documents> clear admin at BosLinux:~/Documents> vim blast.pl admin at BosLinux:~/Documents> clear admin at BosLinux:~/Documents> vim nnn.pl admin at BosLinux:~/Documents> vim other.pl admin at BosLinux:~/Documents> vim amino.fa admin at BosLinux:~/Documents> vim Tb09.211.2410.out admin at BosLinux:~/Documents> vim Tb09.211.2410.out |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 661 TTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCC 720 Query 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG 780 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG 780 Query 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT 840 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT 840 Query 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG 900 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG 900 Query 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT 960 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT 960 Query 961 CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005 ||||||||||||||||||||||||||||||||||||||||||||| Sbjct 961 
CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005 >ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A catalytic subunit isoform 2 (Tb09.211.2360) partial mRNA Length=1011 Score = 1622 bits (1798), Expect = 0.0 Identities = 944/974 (96%), Gaps = 0/974 (0%) Strand=Plus/Plus Query 32 TGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC 91 |||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 38 TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC 97 Query 92 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT 151 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 98 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT 157 Query 152 ATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGA 211 |||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||| Sbjct 158 ATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGA 217 Query 212 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT 271 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 218 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT 277 uery 272 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT 331 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 278 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT 337 Query 332 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG 391 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 338 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG 397 Query 392 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC 451 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 398 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC 457 Query 452 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA 511 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 458 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA 517 Query 512 
AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG 571 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 518 AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG 577 Query 572 TAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGT 631 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| It follows like this. The output I got is ATGACGACAACTCCCACTGGTGATGGCCAACTGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCCAATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCTCCTCCACTAACCCCTTCGCAACAGG TTGCATTCCGTGGTTTTTAG 
TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGTTCAAATTCCCCAATTGGTTTGACTCCCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATCACGCTCCCATTCCTGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGGGATAAGCGGTTGCCCCCGTTAGCACCATCACAACAATTGGAGTTCCGTGGGTTTTAG GGATGATGACCGATTGTACCTCCTCCTCGAGTATGTGGTGGGTGGCGAGCTGT TCTCCCACCTCCGGAAGGCGGGAAAATTCCCTAATGATGTAGCCAAGTTCTACTCCGCAGAAGTGGTTTTGGCGTTTGAATATATTCATGAGTGCGGCATCGTATACCGTGACTTGAAGCCAGAAAATGTGCTTTTGGACAAGCAGGGAAACATTAAGATTACGGACTTTGGGTTCGCGAAACGCGTTAGGGACAGAACGTACACGCTATGTGGGACTCCAGAGTATCTTGCGCCGGAGATAATCCAAAGTAAAGGTCACGATCGGGCTGTGGATTGGTGGACACTCGGAATTCTTCTCTATGAGATGCTTGTCGGTTATCCTCCTTTTTTCGACGAGAGTCCTTTTAGAACATACGAAAAAATTTTAGAGGGGAAACTTCAGTTTCCAAAGTGGGTGGAGATGCGGGCGAAGGACCTCATAAAGAGTTTTTTAACAATTGAACCAACGAAACG i.e.,It is only giving the region where it could find the best alignment i.e., the best hit ones. I want the complete sequence i.e., sequences corresponding to the accession numbers XM_822292.1 XM_822286.1 XM_822694.1 Database used in Remote blast was RefSeq i.e.,(refseq_rna),organism used :Trypanasoma brucei. Can any one please help me in solving this problem Regards, Roopa. 
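Dave Messina's reply later in this thread explains the behaviour: a BLAST report only carries the aligned part of each hit, so complete sequences have to be fetched separately by accession. A minimal sketch of that idea, assuming hit names look like "ref|XM_822286.1|" (as in the report above) and using Bio::DB::GenBank's get_Seq_by_acc as shown in the BioPerl beginners HOWTO; both subroutine names are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a BLAST hit name such as "ref|XM_822286.1|"
# into a bare accession ("XM_822286"). Assumes the last non-empty
# pipe-separated field is the versioned accession.
sub accession_from_hit_name {
    my ($name) = @_;
    return unless defined $name;
    # Split on a literal pipe; an unescaped /|/ would instead split
    # the name into single characters.
    my $acc = ( grep { length } split /\|/, $name )[-1];
    return unless defined $acc;
    $acc =~ s/\.\d+$//;    # drop the ".1"-style version suffix
    return $acc;
}

# Hypothetical helper: fetch the complete sequence of every hit in one
# Bio::SearchIO result from GenBank. Needs BioPerl and network access to
# NCBI; the module is loaded at call time so the helper above works alone.
sub fetch_full_hit_sequences {
    my ($result) = @_;
    require Bio::DB::GenBank;
    my $gb = Bio::DB::GenBank->new;
    my %seq_for;
    while ( my $hit = $result->next_hit ) {
        my $acc = accession_from_hit_name( $hit->name ) or next;
        # get_Seq_by_acc throws on failure, so trap errors per hit
        my $seq = eval { $gb->get_Seq_by_acc($acc) };
        $seq_for{$acc} = $seq->seq if $seq;
    }
    return \%seq_for;
}
```

In a script like the one above, something like my $seqs = fetch_full_hit_sequences($result); in place of the next_hit loop would give a hash of accession => full sequence string. Because the version suffix is stripped, GenBank returns the current revision of each record, which may differ from the one BLAST matched.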
On Fri, Nov 20, 2009 at 12:30 PM, Roopa Raghuveer wrote: > > Hello Roy, > > Thanks a lot for your reply. My code is working for my sequence now. > > Thanks a lot. > > Regards, > Roopa. > > On Thu, Nov 19, 2009 at 5:10 PM, Roy Chaudhuri wrote: > >> Hi Roopa, >> >> I think that the -Organism parameter that you specify for >> Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it >> in the documentation: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm >> >> You have the correct approach in your code - limiting the search to the >> Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If >> you uncomment the line (and add a semicolon afterwards), the program runs >> correctly, but no hits are reported below your threshold e-value. If you >> change the value of $e_val to 10 then some T. brucei hits are reported. >> >> Roy. >> >> Roopa Raghuveer wrote: >> >>> Hello everybody, >>> >>> I have a problem. I would like to use remote blast to find sequences >>> matching an input sequence. >>> >>> E.g., I would like to search for sequences which match a Trypanosoma brucei >>> sequence. >>> >>> I want the output to be only Trypanosoma brucei sequences matching >>> my >>> query. When I tried to use remoteblast against the nr database, I got sequences from >>> different organisms like E. coli, Pseudomonas etc. >>> >>> Could you please tell me how this can be solved? >>> >>> My code is as follows.
>>> >>> use Bio::Tools::Run::RemoteBlast; >>> use strict; >>> my $prog = 'blastn'; >>> my $db = 'nr'; >>> my $e_val= '1e-10'; >>> my $organism= 'Trypanosoma Brucei'; >>> >>> my @params = ( '-prog' => $prog, >>> '-data' => $db, >>> '-expect' => $e_val, >>> '-readmethod' => 'SearchIO', >>> '-Organism' => $organism ); >>> >>> my $factory = Bio::Tools::Run::RemoteBlast-> >>> new(@params); >>> >>> #change a paramter >>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>> brucei[ORGN]' >>> >>> #remove a parameter >>> #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; >>> >>> my $v = 1; >>> #$v is just to turn on and off the messages >>> >>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , >>> '-organism' => 'Trypanosoma Brucei' ); >>> >>> while (my $input = $str->next_seq()){ >>> #Blast a sequence against a database: >>> my $r = $factory->submit_blast($input); >>> #my $r = $factory->submit_blast('amino.fa'); >>> >>> print STDERR "waiting..." if( $v > 0 ); >>> while ( my @rids = $factory->each_rid ) { >>> foreach my $rid ( @rids ) { >>> my $rc = $factory->retrieve_blast($rid); >>> if( !ref($rc) ) { >>> if( $rc < 0 ) { >>> $factory->remove_rid($rid); >>> } >>> print STDERR "." 
if ( $v > 0 ); >>> sleep 5; >>> } >>> else { >>> my $result = $rc->next_result(); >>> #save the output >>> my $filename = $result->query_name()."\.out"; >>> $factory->save_output($filename); >>> $factory->remove_rid($rid); >>> print "\nQuery Name: ", $result->query_name(), "\n"; >>> while ( my $hit = $result->next_hit ) { >>> next unless ( $v > 0); >>> print "\thit name is ", $hit->name, "\n"; >>> while( my $hsp = $hit->next_hsp ) { >>> print "\t\tscore is ", $hsp->score, "\n"; >>> } >>> } >>> } >>> } >>> } >>> } >>> >>> My input sequence is >>> >>> ref|NC_009512.1|:385-1902 >>>> >>> GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA >>> CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT >>> TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT >>> GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG >>> TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA >>> ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG >>> GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC >>> TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT >>> CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC >>> GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG >>> CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT >>> CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC >>> AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC >>> TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG >>> CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG >>> GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC >>> TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT >>> TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC >>> 
GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC >>> CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT >>> CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG >>> GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA >>> >>> Please mail me regarding any queries. >>> >>> Regards, >>> Roopa. >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From mauricio at open-bio.org Fri Nov 20 11:15:22 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Fri, 20 Nov 2009 10:15:22 -0600 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? In-Reply-To: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> References: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> Message-ID: <4B06C09A.8060708@open-bio.org> All OBF wikis and blogs have been upgraded and cleaned from the hack. Thanks for the heads up! Mauricio. Mark A. Jensen wrote: > Andrew-- thanks!! We're on it. > MAJ > ----- Original Message ----- From: "Andrew Grimm" > > To: > Sent: Wednesday, November 18, 2009 9:52 PM > Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? > > >> Caution: read the whole email before visiting the bioperl wiki >> >> I was doing some bioinformatics-related searching using google, and >> one of the hits was to the bio dot perl dot org wiki (the FAQ in >> particular). >> >> When I did that, I was redirected to a ferdax dot com web site (a >> typo-squatting of fedex?). >> >> Some people reckon that ferdax hacks web sites and redirects google >> hits from the victim web site to their own web site. For example, this >> thread at google's webmaster central >> http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all >> >> (it's talking about zencart, but presumably they've since found other >> victims) >> >> Just going to the website without using google may not trigger the >> redirect. 
>> >> Apologies if this is a false alarm, but I don't think it is. >> >> I won't be in contact between Friday and Monday Australian time (I'll >> be at railscamp 6 in Melbourne), so I won't be able to answer any >> replies. >> >> Thanks, >> >> Andrew Grimm >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Nov 20 11:39:53 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Nov 2009 17:39:53 +0100 Subject: [Bioperl-l] Remote blast In-Reply-To: References: <4B056DF4.2030502@gmail.com> Message-ID: <7ECF627D-3DBF-4575-89CF-FA6348C88E8E@sbc.su.se> Hi Roopa, As far as I know, a BLAST report never contains the complete sequences of the hits. If it includes any part of the hit's sequence, it will be the part that matches the query. You'll have to use the hit's ID or accession to get its complete sequence from somewhere else. You can use Bio::DB::GenBank to do that, for example. See http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Dave From alessandra.bilardi at gmail.com Fri Nov 20 12:44:18 2009 From: alessandra.bilardi at gmail.com (Alessandra) Date: Fri, 20 Nov 2009 18:44:18 +0100 Subject: [Bioperl-l] Bio::DB::EUtilities question Message-ID: Hi all, I'm testing Bio::DB::EUtilities, the web agent which interacts with and retrieves data from NCBI's eUtils. My perl script works, but only if I call the get_Response function fewer than ~450 times; otherwise I get this error message:

------------- EXCEPTION -------------
MSG: Response Error Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host)
STACK Bio::DB::GenericWebAgent::get_Response /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215
STACK toplevel ./wget4gbk.pl:77
-------------------------------------

wget4gbk.pl lines 76-77 are:

my $req = Bio::DB::EUtilities->new(-db => 'genome', -eutil => 'esummary', -retmode => $mode, -rettype => $type, -id => $id);
my $entry = $req->get_Response;

I ran the script more than ten times, and the error appears at a random point between 300 and 600 requests. If I request the data from another system, I can make ~10000 requests without errors. Do I need to set particular parameters on the EUtilities object? Can you help me with this random exception? Best, -- Alessandra Bilardi, Ph. D. ---- CRIBI, University of Padova, Italy http://www.linkedin.com/in/bilardi ---- From maj at fortinbras.us Fri Nov 20 13:42:38 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 13:42:38 -0500 Subject: [Bioperl-l] gravatars on the wiki Message-ID: <94431678F3764E8C9A49EA4D2FCD0DBD@NewLife> Hi all, You can now reveal your Gravatar (http://www.gravatar.com) on the wiki, by including the following markup on the page: {{#gravatar|youremail -at- yourplace -dot- tld}} You can do the antispam measure above, or use a regular email. Invalid emails throw an error. http://bioperl.org/wiki/Gravatars Happy coding, MAJ From roychu at gmail.com Fri Nov 20 15:23:21 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 20 Nov 2009 12:23:21 -0800 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> "sounds very much like you process was killed for prolonged execution time, or memory usage.
We have a daemon in place that monitors for processes that take up too much of a shared web server's resources, and this may have kicked in (and often does when trying to install packages on a shared server)." This was the explanation they had. Regarding asking their admins to install, it seems to be a "they'll try to get to it but don't hold your breath" situation. Hmm, I tried some other attempts; installing 1.4.0 posed no problems. I'm not a perl guru, so I tried to increase the build cache size from the default, 10 MB, hoping that that may be the problem--can't imagine how though, since I can't imagine the package size differing that much between versions (though honestly, I haven't checked). Whenever I try to install 1.6.1, it runs into a problem, I guess after the 'make' step, and lists the modules--BioPerl-1.6.0/t/Variation/SeqDiff.t BioPerl-1.6.0/t/Variation/SNP.t BioPerl-1.6.0/t/Variation/Variation_IO.t --and typically gets killed here '> Killed' Next, I tried 1.6.0, then I get this: "(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok (v2.12) Going to read '/home/$username/.cpan/Metadata' Killed" (everything prior works and it seems to get further along than when I try to install 1.6.1) Any insight into why this may be happening would be appreciated. Something EQUALLY appreciated would be a recommendation of a decent enough hosting service where someone has had success installing BioPerl. I'd try to set up my Mac web sharing feature and then try to set up the stuff locally, but I haven't yet been able to successfully get the port forwarding feature working properly on the Apple AirPort Extreme--perplexing. Next, I might just try to install via the Build.PL script. Hmm, checking the wiki, it seems I'll still be able to run remote blast and use the basic seq modules, although some discrepancies and idiosyncrasies may be expected? Any heads-up about any false assumptions on my part would be greatly appreciated.
Thanks in advance, Roy On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: > > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > >> Le Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy a écrit : >>> >>> Does anyone use dreamhost as a web hosting service? I'm just curious >>> if anyone has had any luck installing the module as their daemon seems >>> to kill my process whenever I try to install it. Dreamhost tech >>> support attributes it to either exceeding the allocated memory cache >>> or exceeding the processing time. I tried to nice the process, but >>> that didn't help for me. Any luck or experience in resolving this >>> would be much appreciated. I suppose my next attempt would be to try >>> installing it directly and hope I don't need root... >> >> Dear Roy, >> >> DreamHost uses Debian, so you can suggest them to install the Debian package. >> If you are in contact with the tech service, do not hesitate to tell them to >> contact me if they are interested by a backport of the 1.6.0 package. For >> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. > > Alex mentioned the reliance on the specific ExtUtils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. > > A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post.
> >> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >> will vote for it :) >> >> Have a nice day, >> >> -- >> Charles Plessy >> Debian Med packaging team, >> http://www.debian.org/devel/debian-med >> Tsurumi, Kanagawa, Japan > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Nov 20 15:40:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 14:40:24 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> Message-ID: <1D1B0987-3309-4281-BCE0-2737E4F0D0B1@illinois.edu> BioPerl is pure perl. If you believe all dependencies are installed, just unpack the dist to a specific directory and point PERL5LIB at it (for bash): export PERL5LIB=/home/USER/bioperl/bioperl-live Note that if you plan on doing the same for other bioperl-related modules (ex: bioperl-db) you'll need to add 'lib' to it, as they use a generic Module::Build now. export PERL5LIB=/home/USER/bioperl/bioperl-db/lib You can also add a 'use lib' directive in your scripts. More at the following link: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#USING_MODULES_NOT_INSTALLED_IN_THE_STANDARD_LOCATION chris On Nov 20, 2009, at 2:23 PM, Chu, Roy wrote: > "sounds very much like your process was killed for prolonged execution > time, or memory usage. We have a daemon in place that monitors for > processes that take up too much of a shared web server's resources, and > this may have kicked in (and often does when trying to install packages > on a shared server)." > > This was the explanation they had.
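Chris's `use lib` suggestion can be sketched as a script header. This is a minimal sketch, assuming BioPerl and bioperl-db were unpacked to the example paths from his message (`/home/USER/bioperl/...`; substitute your own); the `bioperl_loadable` helper is a hypothetical name added for illustration. As a shell alternative, both directories can go into a single colon-separated PERL5LIB, since a second `export PERL5LIB=...` replaces the first rather than appending to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Put the unpacked (not installed) distributions at the front of @INC at
# compile time. These are the example paths from the message above;
# replace USER and the directory names with your own locations.
use lib ('/home/USER/bioperl/bioperl-live',     # core BioPerl: top-level dir
         '/home/USER/bioperl/bioperl-db/lib');  # bioperl-db: its lib/ subdir

# Check whether Bio::Seq can now be found, without dying if it cannot.
sub bioperl_loadable {
    return eval { require Bio::Seq; 1 } ? 1 : 0;
}

if (bioperl_loadable()) {
    print "Bio::Seq loaded from $INC{'Bio/Seq.pm'}\n";
}
else {
    print "Bio::Seq not found; check the paths given to 'use lib'\n";
}
```

Because `use lib` prepends to @INC, the unpacked copy takes precedence over any older system-wide BioPerl; printing `$INC{'Bio/Seq.pm'}` shows which copy was actually loaded.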
Regarding asking their admins to > install, it seems it is a "they'll try to get to it but don't hold your > breath situation." > > Hmmm, I tried some other attempts, installing 1.4.0 posed no problems. > I'm not a perl guru, so I tried to increase the build cache size from > the default, 10 MB, hoping that that may be the problem--can't imagine > how though, since I can't imagine how big the whole package version > can differ by (though honestly, I haven't checked). > Whenever I try to install 1.6.1, it runs into a problem I guess after > the 'make' step and lists the > modules--BioPerl-1.6.0/t/Variation/SeqDiff.t > BioPerl-1.6.0/t/Variation/SNP.t > BioPerl-1.6.0/t/Variation/Variation_IO.t > --and typically gets killed here '> Killed' > > Next, I tried 1.6.0, then I get this: > "(I think you ran Build.PL directly, so will use CPAN to install > prerequisites on demand) > CPAN: Storable loaded ok (v2.12) > Going to read '/home/$username/.cpan/Metadata' > Killed" (everything prior works and it seems to get further along than > when I try to install 1.6.1) > > Any insight into why this may be happening would be appreciated. > Something EQUALLY appreciated would be a recommendation of a decent > enough hosting service where someone has had success installing > Bio-Perl. I'd try to set up my Mac web sharing feature and then try > to set up the stuff locally, but I haven't yet been able to > successfully get the port forwarding feature working properly on the > Apple AirPort Extreme--perplexing. Next, I might just try to install > via the Build.PL script. > > Hmm, checking the wiki, it seems I'll still be able to run remote > blast and use the basic seq modules, although some discrepancies and > idiosyncrasies may be expected? Any heads-up about any false > assumptions by me would be greatly appreciated.
> > Thanks in advance, > Roy > > On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: >> >> On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: >> >>> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>>> >>>> Does anyone use dreamhost as a web hosting service? I'm just curious >>>> if anyone has had any luck installing the module as their daemon seems >>>> to kill my process whenever I try to install it. Dreamhost tech >>>> support attributes it to either exceeding the allocated memory cache >>>> or exceeding the processing time. I tried to nice the process, but >>>> that didn't help for me. Any luck or experience in resolving this >>>> would be much appreciated. I suppose my next attempt would be to try >>>> installing it directly and hope I don't need root... >>> >>> Dear Roy, >>> >>> DreamHost uses Debian, so you can suggest that they install the Debian package. >>> If you are in contact with the tech service, do not hesitate to tell them to >>> contact me if they are interested in a backport of the 1.6.0 package. For >>> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. >> >> Any reason why this is so? We specify compatibility back to 5.6.1. >> >> Alex mentioned the reliance on the specific ExtUtils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. >> >> A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post.
>> >>> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >>> will vote for it :) >>> >>> Have a nice day, >>> >>> -- >>> Charles Plessy >>> Debian Med packaging team, >>> http://www.debian.org/devel/debian-med >>> Tsurumi, Kanagawa, Japan >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From charles-listes+bioperl at plessy.org Fri Nov 20 20:07:23 2009 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Sat, 21 Nov 2009 10:07:23 +0900 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: <20091121010723.GA7786@kunpuu.plessy.org> On Fri, Nov 20, 2009 at 07:00:45AM -0600, Chris Fields wrote: > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > > > > DreamHost uses Debian, so you can suggest that they install the Debian > > package. If you are in contact with the tech service, do not hesitate to > > tell them to contact me if they are interested in a backport of the 1.6.0 > > package. For version 1.6.1, it may be more difficult as it depends on perl > > 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. Dear Chris, you make a good point: although for building we need to either depend on perl 5.10.1 or separately package ExtUtils::Manifest, the resulting bioperl package does not depend on such a high version. Therefore, there is no need for a backport, and the latest Debian package can be installed on a Debian stable (5.0/Lenny) system.
I just checked the Dreamhost machine to which I happen to have access, “waratahs”, and it seems to be older, but nevertheless it may be worth asking the admins anyway (with the big drawback that they would have to be asked for each update). Have a nice week-end, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From robert.bradbury at gmail.com Fri Nov 20 20:40:14 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 20 Nov 2009 20:40:14 -0500 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites Message-ID: I run a Linux system which is in a gradual process of evolution from the default Linux browsers (Galeon, Epiphany, etc.) through Firefox (better) to Google's Chromium (IMO, perhaps the best so far). Chromium allows one to create a process per tab/URL so one can effectively track what it is doing. It also allows one to track the machine usage of these processes (through the Developer > Task manager [shift-escape keyboard] option) which though expensive in terms of overhead allows one to track offending windows (in terms of memory or CPU use). My processor recently jumped from a typical 700 MHz to 1.4 GHz speed (using the Linux Ondemand scheduler - which saves ~20 W at the wall outlet -- I've measured it) to the full tilt 2.8 GHz the CPU is capable of. Looking at the chrome task manager I was not surprised to find the NY Times high on the list (they are pushing content, esp. using Javascript) but much to my dismay the Jalview and Howto:Trees:Bioperl appeared to be high on the list. Now I am forced to ask myself *why* sites which are simply distributing static information are eating up CPU on my machine! This is a fundamental flaw in the architecture of the sites -- wherein there should be conscious efforts to minimize user-CPU use (or avoid Javascript entirely).
This would not be a problem if I were using Firefox as I can easily use NoScript to block Javascript from non-approved sites. But it raises the question of when one should allow Javascript to run (one would "normally" approve academic sites by default) when even the academic sites are abusing my CPU. There needs to be much greater awareness both on the part of software distributors and software consumers that it is *MY* CPU and *MY* Electricity and *MY* contribution to global warming. And the developers/distributors should not be sucking down those resources without first saying "May I?" and I have the option of saying "No you may not." There is enough we can do productively (running low homology blast searches) without engaging in endless wheel spinning of Javascripts or looped GIFs. Robert From maj at fortinbras.us Fri Nov 20 23:17:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 23:17:12 -0500 Subject: [Bioperl-l] ohlohers Message-ID: You can now add your Ohloh widgets and increase your carbon footprint with the less crufty: {{#ohloh|acct_id|TYPE}} where TYPE is [Detailed|Rank|Tiny]. Taint checks aplenty. MAJ From maj at fortinbras.us Fri Nov 20 23:33:02 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 23:33:02 -0500 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com><20091120104445.GG31318@kunpuu.plessy.org> <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> Message-ID: <9ECC66C2F23F47469AF0F07E3F9307FC@NewLife> Maybe 'nightmarehost' is more appropriate. I've had no problems on AWS, but this may not be exactly what you need.
MAJ ----- Original Message ----- From: "Chu, Roy" To: Sent: Friday, November 20, 2009 3:23 PM Subject: Re: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN "sounds very much like your process was killed for prolonged execution time, or memory usage. We have a daemon in place that monitors for processes that take up too much of a shared web server's resources, and this may have kicked in (and often does when trying to install packages on a shared server)." This was the explanation they had. Regarding asking their admins to install, it seems it is a "they'll try to get to it but don't hold your breath situation." Hmmm, I tried some other attempts, installing 1.4.0 posed no problems. I'm not a perl guru, so I tried to increase the build cache size from the default, 10 MB, hoping that that may be the problem--can't imagine how though, since I can't imagine how big the whole package version can differ by (though honestly, I haven't checked). Whenever I try to install 1.6.1, it runs into a problem I guess after the 'make' step and lists the modules--BioPerl-1.6.0/t/Variation/SeqDiff.t BioPerl-1.6.0/t/Variation/SNP.t BioPerl-1.6.0/t/Variation/Variation_IO.t --and typically gets killed here '> Killed' Next, I tried 1.6.0, then I get this: "(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok (v2.12) Going to read '/home/$username/.cpan/Metadata' Killed" (everything prior works and it seems to get further along than when I try to install 1.6.1) Any insight into why this may be happening would be appreciated. Something EQUALLY appreciated would be a recommendation of a decent enough hosting service where someone has had success installing Bio-Perl. I'd try to set up my Mac web sharing feature and then try to set up the stuff locally, but I haven't yet been able to successfully get the port forwarding feature working properly on the Apple AirPort Extreme--perplexing.
Next, I might just try to install via the Build.PL script. Hmm, checking the wiki, it seems I'll still be able to run remote blast and use the basic seq modules, although some discrepancies and idiosyncrasies may be expected? Any heads-up about any false assumptions by me would be greatly appreciated. Thanks in advance, Roy On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: > > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > >> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>> >>> Does anyone use dreamhost as a web hosting service? I'm just curious >>> if anyone has had any luck installing the module as their daemon seems >>> to kill my process whenever I try to install it. Dreamhost tech >>> support attributes it to either exceeding the allocated memory cache >>> or exceeding the processing time. I tried to nice the process, but >>> that didn't help for me. Any luck or experience in resolving this >>> would be much appreciated. I suppose my next attempt would be to try >>> installing it directly and hope I don't need root... >> >> Dear Roy, >> >> DreamHost uses Debian, so you can suggest that they install the Debian package. >> If you are in contact with the tech service, do not hesitate to tell them to >> contact me if they are interested in a backport of the 1.6.0 package. For >> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. > > Alex mentioned the reliance on the specific ExtUtils::Manifest version. The > version requested has an important bug fix, is present on CPAN, and is > backwards-compatible to 5.6.1. It should be fairly easy to request that as a > separate package. > > A strict requirement for perl 5.10.1 doesn't make sense in that light, unless > said perl maintainer can enlighten us as to why this is an issue? This one may > require a ranty blog post.
> >> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >> will vote for it :) >> >> Have a nice day, >> >> -- >> Charles Plessy >> Debian Med packaging team, >> http://www.debian.org/devel/debian-med >> Tsurumi, Kanagawa, Japan > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Nov 20 23:38:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 22:38:23 -0600 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: References: Message-ID: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> Robert, Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in general) do not use JS, unless there is a specific addition I'm unaware of. Now, the site wiki was recently 'parasited' for redirects, which may be the culprit, but this is now fixed. Can you at least retest to see if this persists? Anyone else know about this? chris On Nov 20, 2009, at 7:40 PM, Robert Bradbury wrote: > I run a Linux system which is in a gradual process of evolution from the > default Linux browsers (Galeon, Epiphany, etc.) through Firefox (better) to > Google's Chromium (IMO, perhaps the best so far). Chromium allows one to > create a process per tab/URL so one can effectively track what it is doing. > It also allows one to track the machine usage of these processes (through > the Developer > Task manager [shift-escape keyboard] option) which though > expensive in terms of overhead allows one to track offending windows (in > terms of memory or CPU use).
My processor recently jumped from a typical > 700 MHz to 1.4 GHz speed (using the Linux Ondemand scheduler - which saves > ~20 W at the wall outlet -- I've measured it) to the full tilt 2.8 GHz the > CPU is capable of. Looking at the chrome task manager I was not surprised > to find the NY Times high on the list (they are pushing content, esp. using > Javascript) but much to my dismay the Jalview and Howto:Trees:Bioperl > appeared to be high on the list. Now I am forced to ask myself *why* sites > which are simply distributing static information are eating up CPU on my > machine! This is a fundamental flaw in the architecture of the sites -- > wherein there should be conscious efforts to minimize user-CPU use (or avoid > Javascript entirely). This would not be a problem if I were using Firefox > as I can easily use NoScript to block Javascript from non-approved sites. > But it raises the question of when one should allow Javascript to run (one > would "normally" approve academic sites by default) when even the academic > sites are abusing my CPU. There needs to be much greater awareness both on > the part of software distributors and software consumers that it is *MY* CPU > and *MY* Electricity and *MY* contribution to global warming. And the > developers/distributors should not be sucking down those resources without > first saying "May I?" and I have the option of saying "No you may not." > There is enough we can do productively (running low homology blast > searches) without engaging in endless wheel spinning of Javascripts or > looped GIFs.
> > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Sat Nov 21 00:11:34 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 20 Nov 2009 21:11:34 -0800 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> References: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> Message-ID: <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> On Fri, Nov 20, 2009 at 8:38 PM, Chris Fields wrote: > Robert, > > Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in > general) do not use JS, unless there is a specific addition I'm unaware of. > Now, the site wiki was recently 'parasited' for redirects, which may be the > culprit, but this is now fixed. Can you at least retest to see if this > persists? > > Anyone else know about this? > > The page in question does include javascript, it appears from the source. This is a function of using mediawiki, though, I believe and not something specific to that page. Sean From cjfields at illinois.edu Sat Nov 21 00:20:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 23:20:37 -0600 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> References: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> Message-ID: On Nov 20, 2009, at 11:11 PM, Sean Davis wrote: > On Fri, Nov 20, 2009 at 8:38 PM, Chris Fields wrote: > >> Robert, >> >> Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in >> general) do not use JS, unless there is a specific addition I'm unaware of. >> Now, the site wiki was recently 'parasited' for redirects, which may be the >> culprit, but this is now fixed. 
Can you at least retest to see if this >> persists? >> >> Anyone else know about this? >> >> > The page in question does include javascript, it appears from the source. > This is a function of using mediawiki, though, I believe and not something > specific to that page. > > Sean Sean, thanks for pointing that out. chris From robert.bradbury at gmail.com Sat Nov 21 13:26:05 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 21 Nov 2009 13:26:05 -0500 Subject: [Bioperl-l] Bio::DB::EUtilities question In-Reply-To: References: Message-ID: It sounds like NCBI may be counting frequency of requests, how much data they send or something similar. Are you delaying the time between fetches? The code I've seen typically sleeps for a few seconds each time around a loop. You might try longer delays between fetches and see if that gets you any more data. Alternatively perhaps the libraries aren't reusing the TCP/IP connection properly. Is there a difference between the amount of memory on the machines? Have you watched the size of the process to see if it grows over time? I think the bug which prevented me from fetching a not-so-large genome from a few months ago (eating up 3GB of memory in the process) has not been resolved. If so that could be your problem. Robert On Fri, Nov 20, 2009 at 12:44 PM, Alessandra wrote: > > > I'm testing Bio::DB::EUtilities - webagent which interacts with and > retrieves data from NCBI's eUtils. My perl script works but it works > only if I request less than ~450 times get_Response function.. 
else I > have got this error message: > > ------------- EXCEPTION ------------- > MSG: Response Error > Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) > STACK Bio::DB::GenericWebAgent::get_Response > /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 > STACK toplevel ./wget4gbk.pl:77 > From cjfields at illinois.edu Sat Nov 21 14:19:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Nov 2009 13:19:24 -0600 Subject: [Bioperl-l] Bio::DB::EUtilities question In-Reply-To: References: Message-ID: <837CE7E7-E625-4285-AD54-06FD168C0DF3@illinois.edu> NCBI has specific rules about the repeated queries to its servers: http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements Acc. to that, if you are making over 100 requests at peak times you will run into problems (they'll probably temp-block your IP), even if the timeout is much shorter now (it's 3 requests/second, whereas a year or two ago it was once every 3 sec). In general it's best to run something like this during off-hours. The actual limit on number of server requests is one specific part of Bio::DB::EUtilities that hasn't been added yet, but is tentatively planned. chris On Nov 21, 2009, at 12:26 PM, Robert Bradbury wrote: > It sounds like NCBI may be counting frequency of requests, how much data > they send or something similar. Are you delaying the time between fetches? > The code I've seen typically sleeps for a few seconds each time around a > loop. You might try longer delays between fetches and see if that gets you > any more data. > > Alternatively perhaps the libraries aren't reusing the TCP/IP connection > properly. Is there a difference between the amount of memory on the > machines? Have you watched the size of the process to see if it grows over > time? I think the bug which prevented me from fetching a not-so-large > genome from a few months ago (eating up 3GB of memory in the process) has > not been resolved. If so that could be your problem. 
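Chris's point about NCBI's request limit can also be handled on the client side by spacing out calls. This is a minimal sketch, assuming the 3-requests-per-second figure quoted above (check NCBI's current usage guidelines before relying on it); `throttled_map` and `fetch_one` are hypothetical names, with `fetch_one` standing in for the real Bio::DB::EUtilities `get_Response` call. Only core Perl's Time::HiRes is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Call $callback once per item, starting at most $per_second calls per second.
# Returns the callback results in input order.
sub throttled_map {
    my ($per_second, $callback, @items) = @_;
    my $interval = 1 / $per_second;   # minimum gap between call start times
    my $next_ok  = 0;                 # earliest time the next call may start
    my @results;
    for my $item (@items) {
        my $wait = $next_ok - time();
        sleep($wait) if $wait > 0;    # Time::HiRes::sleep accepts fractional seconds
        $next_ok = time() + $interval;
        push @results, $callback->($item);
    }
    return @results;
}

# Hypothetical stand-in for one eUtils request (e.g. a get_Response call).
sub fetch_one {
    my ($id) = @_;
    return "fetched $id";
}

my @out = throttled_map(3, \&fetch_one, qw(id1 id2 id3));
print "$_\n" for @out;
```

Spacing is measured from the start of one call to the start of the next, so a slow server response does not add extra waiting on top of the limit; Robert's suggestion of longer delays corresponds to passing a smaller rate.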
> > Robert > > On Fri, Nov 20, 2009 at 12:44 PM, Alessandra > wrote: >> >> >> I'm testing Bio::DB::EUtilities - webagent which interacts with and >> retrieves data from NCBI's eUtils. My perl script works but it works >> only if I request less than ~450 times get_Response function.. else I >> have got this error message: >> >> ------------- EXCEPTION ------------- >> MSG: Response Error >> Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) >> STACK Bio::DB::GenericWebAgent::get_Response >> /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 >> STACK toplevel ./wget4gbk.pl:77 >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Nov 21 21:58:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Nov 2009 20:58:37 -0600 Subject: [Bioperl-l] BioPerl on FLOSS Weekly Message-ID: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> Jason and I were recently interviewed (Wednesday!) about BioPerl for FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and Kirsten Sanford. The interview is now available online, so get your favorite flavor (MP3, podcast) here: http://twit.tv/floss96 Enjoy! chris and jason From adsj at novozymes.com Sun Nov 22 07:37:40 2009 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Sun, 22 Nov 2009 13:37:40 +0100 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> (Chris Fields's message of "Sat, 21 Nov 2009 20:58:37 -0600") References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> Message-ID: <87aaye91m3.fsf@topper.koldfront.dk> On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > Jason and I were recently interviewed (Wednesday!) about BioPerl for > FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and > Kirsten Sanford. Great! 
How about linking to it on bioperl.org? :-), Adam -- Adam Sjøgren adsj at novozymes.com From cjfields at illinois.edu Sun Nov 22 15:30:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Nov 2009 14:30:01 -0600 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <87aaye91m3.fsf@topper.koldfront.dk> References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> <87aaye91m3.fsf@topper.koldfront.dk> Message-ID: <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> On Nov 22, 2009, at 6:37 AM, Adam Sjøgren wrote: > On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > >> Jason and I were recently interviewed (Wednesday!) about BioPerl for >> FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and >> Kirsten Sanford. > > Great! > > How about linking to it on bioperl.org? > > > :-), > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Now posted via O|B|F News; I'll try to make that feed more prominent on the main page. Since this is the second such interview (Jason did one a few years back for PerlCast), I'm thinking we need a media page of some sort. chris From maj at fortinbras.us Sun Nov 22 15:48:39 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 22 Nov 2009 15:48:39 -0500 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu><87aaye91m3.fsf@topper.koldfront.dk> <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> Message-ID: <247658CC6D9A4529B281F4482BD3E4BD@NewLife> We do have http://www.bioperl.org/wiki/Category:BioPerl_Media -- ----- Original Message ----- From: "Chris Fields" To: "Adam Sjøgren" Cc: Sent: Sunday, November 22, 2009 3:30 PM Subject: Re: [Bioperl-l] BioPerl on FLOSS Weekly On Nov 22, 2009, at 6:37 AM, Adam Sjøgren wrote: > On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > >> Jason and I were recently interviewed (Wednesday!)
about BioPerl for >> FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and >> Kirsten Sanford. > > Great! > > How about linking to it on bioperl.org? > > > :-), > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Now posted via O|B|F News; I'll try to make that feed more prominent on the main page. Since this is the second such interview (Jason did one a few years back for PerlCast), I'm thinking we need a media page of some sort. chris _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From jardim.rodrigo at gmail.com Sun Nov 22 11:06:40 2009 From: jardim.rodrigo at gmail.com (Rodrigo Jardim) Date: Sun, 22 Nov 2009 14:06:40 -0200 Subject: [Bioperl-l] Problems with Genbank Proteins File Message-ID: I have been problem to parser genbank protein file. I think that because this file have a other order of fields. For example:

In most general genbank files:
========================
LOCUS       AA399704  183 bp  mRNA  linear  EST 03-MAR-2000
ACCESSION   AA399704
VERSION     AA399704.1  GI:2053305
DEFINITION  TEUF0001 T.cruzi epimastigote non-normalized cDNA Library Trypanosoma cruzi cDNA clone 1 5' similar to T. cruzi gene for histone H2b (X60982), mRNA sequence.
KEYWORDS    EST.
SOURCE      Trypanosoma cruzi

In genbank protein files:
===================
LOCUS       XP_628849  510 aa  linear  INV 31-OCT-2008
DEFINITION  hypothetical protein [Dictyostelium discoideum AX4].
ACCESSION   XP_628849
VERSION     XP_628849.1  GI:66799847
DBSOURCE    REFSEQ: accession XM_628847.1
KEYWORDS    .
SOURCE      Dictyostelium discoideum AX4.

When I try to parser, Bioperl abort with message error. Any ideas?
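For a parse failure like this one, the exact exception text and the BioPerl version are the details most useful to the list. This is a minimal sketch of a script that collects both, assuming the protein file is read with Bio::SeqIO's `genbank` format (which is expected to handle GenPept records too) and that `$Bio::Root::Version::VERSION` holds the release number; `parse_genbank` and `bioperl_version` are hypothetical helper names, not BioPerl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a GenBank/GenPept flat file. Returns (\@records, undef) on success,
# or (undef, $error) with the exception that stopped parsing.
sub parse_genbank {
    my ($file) = @_;
    my @records;
    my $ok = eval {
        require Bio::SeqIO;    # loaded at run time so a missing BioPerl is reported, not fatal
        my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
        while (my $seq = $in->next_seq) {
            push @records, join("\t", $seq->display_id, $seq->length, $seq->desc || '');
        }
        1;
    };
    return $ok ? (\@records, undef) : (undef, $@ || 'unknown error');
}

# Report the installed BioPerl release, or a note if it cannot be loaded.
sub bioperl_version {
    return 'not installed' unless eval { require Bio::Root::Version; 1 };
    return defined $Bio::Root::Version::VERSION ? $Bio::Root::Version::VERSION : 'unknown';
}

# Usage: perl check_genpept.pl proteins.gp
if (@ARGV) {
    my ($records, $error) = parse_genbank($ARGV[0]);
    print "BioPerl version: ", bioperl_version(), "\n";
    if (defined $error) {
        print "Parse failed: $error\n";
    }
    else {
        print "$_\n" for @$records;
    }
}
```

Catching the exception with `eval` instead of letting the script die keeps the full message (including BioPerl's STACK lines) available for pasting into a bug report.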
Thanks all, -- Atc, Rodrigo Jardim jardim.rodrigo at gmail.com From biopython at maubp.freeserve.co.uk Mon Nov 23 12:36:36 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Nov 2009 17:36:36 +0000 Subject: [Bioperl-l] Problems with Genbank Proteins File In-Reply-To: References: Message-ID: <320fb6e00911230936ofb9d897rbd45abb73a361250@mail.gmail.com> On Sun, Nov 22, 2009 at 4:06 PM, Rodrigo Jardim wrote: > I have been problem to parser genbank protein file. I think that because > this file have a other order of fields. For example: > > ... > > When I try to parser, Bioperl abort with message error. > > Any ideas? There are some important bits of information missing - what is the error message, and what version of BioPerl are you using? Peter From maj at fortinbras.us Mon Nov 23 12:58:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Nov 2009 12:58:46 -0500 Subject: [Bioperl-l] building samtools/Bio::DB::Sam on cygwin Message-ID: Hi All-- I've had some hard-won success installing samtools and Lincoln's Bio::DB::Sam under cygwin; thought some on the list would be able to use my notes. (Yes, Jason, I'm working on Bio::Tools::Run::BWA...) (To get the current samtools, ping http://sourceforge.net/projects/samtools/files/samtools/0.1.7/samtools-0.1.7a.tar.bz2/download ) * Getting samtools to make from scratch in cygwin The following diff details the changes to the samtools Makefile I made by hand. The key points are -D_WIN32 and the additional variable LFLAGS and its interpolations. To get the linker to see libgcc libstdc++ I needed to add symlinks from /lib to the correct files in /lib/gcc/i386-pc-cygwin/4.3.2/. Your gcc version may differ. 
--- ../old/samtools-0.1.7a/Makefile 2009-11-16 10:13:43.000000000 -0500 +++ Makefile 2009-11-23 12:14:18.529000000 -0500 @@ -1,16 +1,18 @@ CC= gcc CFLAGS= -g -Wall -O2 #-m64 #-arch ppc -DFLAGS= -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -D_CURSES_LIB=1 +LFLAGS= -lws2_32 -lgcc -lcygwin -lbz2 -lz -lstdc++ +DFLAGS= -D_WIN32 -D_FILE_OFFSET_BITS=64 -D_CURSES_LIB=1 LOBJS= bgzf.o kstring.o bam_aux.o bam.o bam_import.o sam.o bam_index.o \ bam_pileup.o bam_lpileup.o bam_md.o glf.o razf.o faidx.o knetfile.o \ bam_sort.o sam_header.o AOBJS= bam_tview.o bam_maqcns.o bam_plcmd.o sam_view.o \ bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o \ bamtk.o kaln.o @@ -36,13 +38,13 @@ $(AR) -cru $@ $(LOBJS) samtools:lib $(AOBJS) - $(CC) $(CFLAGS) -o $@ $(AOBJS) -lm $(LIBPATH) $(LIBCURSES) -lz -L. -lbam + $(CC) $(CFLAGS) -o $@ $(AOBJS) -Xlinker --enable-auto-import -lm $(LIBPATH) $(LIBCURSES) -lz -L. -lbam $(LFLAGS) razip:razip.o razf.o knetfile.o - $(CC) $(CFLAGS) -o $@ razf.o razip.o knetfile.o -lz + $(CC) $(CFLAGS) -o $@ razf.o razip.o knetfile.o -lz -lm -lws2_32 bgzip:bgzip.o bgzf.o - $(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz + $(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz -lm -lws2_32 razip.o:razf.h bam.o:bam.h razf.h bam_endian.h kstring.h sam_header.h * Getting Bio::DB::Sam to compile and install Bio::DB::Sam requires not the samtools.exe, but the bam library created during the samtools build, as well as all the samtools header files. Create a symlink in /lib to libbam.a in the build directory (or copy libbam.a up to /lib), and create symlinks or copy *.h into /usr/include. Then in cygwin bash shell $ cpan cpan> install Bio::DB::Sam should fly. Hope someone finds this useful. These mods led me to a successful Bio::DB::Sam install--have not yet checked original code based on Bio::DB::Sam. If they don't work for you, reply to the list. 
cheers, MAJ From jcline at ieee.org Mon Nov 23 14:13:26 2009 From: jcline at ieee.org (Jonathan Cline) Date: Mon, 23 Nov 2009 13:13:26 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: Message-ID: <4B0ADED6.8040901@ieee.org> Dreamhost has terrible reliability. I have stats going back years on a standard dreamhost hosting account (non-dedicated server), and on some days the web server doesn't respond. Dreamhost service is OK for a hobby blog; however, it is definitely *not* suitable for anything real. Add in latency, arbitrary account limits/restrictions, etc., and as a hosting service, it is a bad idea to host a project there. Although some users apparently get lucky with server allocation and end up on a "good server", the provider can change this at any time as well. More typically, I think, account holders simply don't notice, since most are simple bloggers. Here's a data snip that illustrates the problem with a typical dreamhost account:

----------------------------------------------------------------------
date        uptime    dns  connect  request   ttfb   ttlb
2008-08-05   91.40  0.000    0.528    0.528  2.257  1.619
2008-08-04   89.13  0.002    0.301    0.301  1.302  0.971
2008-08-03   94.62  0.000    0.567    0.567  1.506  0.913
2008-08-02  100.00  0.000    0.335    0.335  1.475  1.079
2008-08-01  100.00  0.000    0.310    0.310  1.587  0.825
2008-07-31   93.55  0.023    0.386    0.386  1.280  0.759
2008-07-30  100.00  0.000    0.345    0.345  1.373  0.860
2008-07-29  100.00  0.000    0.358    0.358  1.335  0.757
2008-07-28  100.00  0.000    0.327    0.327  1.462  0.896
2008-07-27  100.00  0.000    0.292    0.292  1.410  0.966
2008-07-26  100.00  0.000    0.283    0.283  1.280  0.815
2008-07-25  100.00  0.000    0.297    0.297  1.231  0.853
2008-07-24  100.00  0.000    0.362    0.362  1.258  0.699
2008-07-23  100.00  0.000    0.339    0.339  1.270  0.785
----------------------------------------------------------------------
minimum      89.13  0.000    0.283    0.283  1.231  0.699
maximum     100.00  0.023    0.567    0.567  2.257  1.619
average      97.76  0.002    0.359    0.359  1.430  0.914
----------------------------------------------------------------------

Or this month:

----------------------------------------------------------------------
date        uptime    dns  connect  request   ttfb   ttlb
2009-11-11  100.00  0.011    0.097    0.097  1.260  1.638
2009-11-10  100.00  0.008    0.094    0.094  1.285  1.647
2009-11-09  100.00  0.008    0.094    0.094  1.494  1.872
2009-11-08  100.00  0.015    0.101    0.101  1.509  1.894
2009-11-07  100.00  0.006    0.092    0.092  1.453  1.831
2009-11-06  100.00  0.011    0.097    0.097  1.500  1.882
2009-11-05   97.80  0.012    0.097    0.097  1.445  1.806
2009-11-04  100.00  0.010    0.096    0.096  1.235  1.605
2009-11-03   95.65  0.007    0.093    0.093  1.266  1.612
2009-11-02  100.00  0.010    0.096    0.096  1.267  1.637
2009-11-01  100.00  0.007    0.093    0.093  1.311  1.692
2009-10-31  100.00  0.009    0.095    0.095  1.225  1.594
2009-10-30  100.00  0.009    0.095    0.095  1.364  1.739
2009-10-29  100.00  0.017    0.103    0.103  1.121  1.505
----------------------------------------------------------------------
minimum      95.65  0.006    0.092    0.092  1.121  1.505
maximum     100.00  0.017    0.103    0.103  1.509  1.894
average      99.53  0.010    0.096    0.096  1.338  1.711
----------------------------------------------------------------------

## Jonathan Cline
## jcline at ieee.org
## Mobile: +1-805-617-0223
########################
From cjfields at illinois.edu Mon Nov 23 22:19:02 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 23 Nov 2009 21:19:02 -0600 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> Message-ID: <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> Okay, so I think it's feasible to add this into trunk. I like the idea of optionally having a log class, if someone comes up with a nice way of adding it in I would be for it.
chris On Nov 20, 2009, at 4:15 AM, Dave Messina wrote: > Chris, I took a look at how you implemented this in Biome -- very nice! > > >> I like this verbose/strict separability a lot. Should we go for it? > > Me too. So yes, I think so. > > >> We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. > > > Perhaps this is a job for Log::Log4perl or Log::Dispatch? > http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl.pm > http://search.cpan.org/~drolsky/Log-Dispatch-2.26/lib/Log/Dispatch.pm > > > That might be overkill, though. > > Dave > From David.Messina at sbc.su.se Tue Nov 24 11:18:22 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Nov 2009 17:18:22 +0100 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> Message-ID: <3FD2086D-062F-4706-9DC8-2A53224C4913@sbc.su.se> > I like the idea of optionally having a log class, if someone comes up with a nice way of adding it in I would be for it. My suggestion of the logging modules was actually to handle the various levels of verbose output -- I think both of the ones I mentioned "log" to STDERR by default. But of course a nice side effect of using such a logging module is that it would allow optional logging to a file, too. Dave From paolo.pavan at gmail.com Tue Nov 24 14:28:09 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Tue, 24 Nov 2009 20:28:09 +0100 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 usage question Message-ID: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> Dear, I'm confused about the proper usage of the module Bio::Tools::Run::Cap3.
As documented in the pod, the run(@seqs) method returns the cap3 report file while I expect to return a Bio::Assembly object, consistently with other Bio::Tools::Run classes. However, I went around this by getting from the factory object the location and the names of the temp output files (actually accessing a private property, although) and reading them via the Assembly::IO system. I was just wandering what is the proper designed way to do this job. Thank you for enlighten the way! Paolo From Russell.Smithies at agresearch.co.nz Tue Nov 24 17:04:31 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 25 Nov 2009 11:04:31 +1300 Subject: [Bioperl-l] Bio::DB::Fasta Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> Is there any way to pass a filename to Bio::DB::Fasta for the location of where to write the directory.index? It's writing in the same dir as the fasta but I'd rather have it write in /tmp as it's part of a web app. Thanx, Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================
From Russell.Smithies at agresearch.co.nz Tue Nov 24 17:21:52 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 25 Nov 2009 11:21:52 +1300 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <4296CD1039FC44B89034A1FD3E6721F3@NewLife> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> <4296CD1039FC44B89034A1FD3E6721F3@NewLife> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> That's what I ended up doing. Also, there's no "obvious" way to index a single file, so I ended up putting the filename in the glob parameter:

 my $db = Bio::DB::Fasta->new( "$tmp", -glob => "test.faa", -reindex => 1 );

--Russell > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > Sent: Wednesday, 25 November 2009 11:19 a.m. > To: Smithies, Russell; 'bioperl-l' > Subject: Re: [Bioperl-l] Bio::DB::Fasta > > The code (method index_dir() ) seems to expect all the fasta files to be > contained in that directory. Looks hairy; what about creating symlinks to > your > fasta files in a /tmp subdir and calling new() with that subdir? > ----- Original Message ----- > From: "Smithies, Russell" > To: "'bioperl-l'" > Sent: Tuesday, November 24, 2009 5:04 PM > Subject: [Bioperl-l] Bio::DB::Fasta > > > > Is there any way to pass a filename to Bio::DB::Fasta for the location > of > > where to write the directory.index? > > It's writing in the same dir as the fasta but I'd rather have it write > in /tmp > > as it's part of a web app. > > > > Thanx, > > > > Russell > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From maj at fortinbras.us Tue Nov 24 17:18:51 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 24 Nov 2009 17:18:51 -0500 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> Message-ID: <4296CD1039FC44B89034A1FD3E6721F3@NewLife> The code (method index_dir() ) seems to expect all the fasta files to be contained in that directory. Looks hairy; what about creating symlinks to your fasta files in a /tmp subdir and calling new() with that subdir? ----- Original Message ----- From: "Smithies, Russell" To: "'bioperl-l'" Sent: Tuesday, November 24, 2009 5:04 PM Subject: [Bioperl-l] Bio::DB::Fasta > Is there any way to pass a filename to Bio::DB::Fasta for the location of > where to write the directory.index? > It's writing in the same dir as the fasta but I'd rather have it write in /tmp > as it's part of a web app. > > Thanx, > > Russell From florent.angly at gmail.com Tue Nov 24 17:54:48 2009 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 24 Nov 2009 14:54:48 -0800 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 usage question In-Reply-To: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> References: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> Message-ID: <4B0C6438.8070405@gmail.com> Hi Paolo, It turns out that there is no standard for what is to be passed to the Bio::Tools::Run wrappers and returned by them. I noticed the inconsistency between the assembly wrappers recently while implementing support for new wrappers. I implemented initial support for additional de novo assembly programs in BioPerl (454 Newbler and Minimo) a couple of weeks ago, and Mark Jensen added support for Maq, a program that assembles reads against a reference. In the process, all the assembly wrappers were changed to take the same type of input data (a FASTA sequence or an array reference of sequence objects) and return one of the following:

 * a Bio::Assembly::Scaffold object (the default), or
 * a Bio::Assembly::IO object, or
 * the name of a file for the output of the assembler

Use the out_type method to set up which output you want, e.g.:

 $factory->out_type('Bio::Assembly::IO');

or

 $factory->out_type('cap3_results.ace');

You'll have to use the code in the bioperl-run Subversion repository if you want to use these new features.
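For the default case described above, where run() hands back a Bio::Assembly::Scaffold, a minimal sketch of walking the contigs might look like the following. It assumes the updated bioperl-run wrappers, a cap3 binary on the PATH, and that run() accepts an array reference of Bio::Seq objects. all_contigs, id and get_seq_ids are Bio::Assembly::Scaffold/Contig methods. The contig_summary helper, the script name and reads.fasta are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: summarize a Bio::Assembly::Scaffold (the default
# return value of run() according to the thread) as [contig id, read count]
# pairs, using the Scaffold/Contig methods all_contigs, id and get_seq_ids.
sub contig_summary {
    my ($scaffold) = @_;
    my @rows;
    for my $contig ( $scaffold->all_contigs ) {
        my @read_ids = $contig->get_seq_ids;
        push @rows, [ $contig->id, scalar @read_ids ];
    }
    return @rows;
}

# Example: perl cap3_summary.pl reads.fasta   (file name is illustrative)
# Needs the updated bioperl-run wrappers and the cap3 binary on the PATH.
if (@ARGV) {
    require Bio::SeqIO;
    require Bio::Tools::Run::Cap3;

    # read the input reads into an array reference of Bio::Seq objects
    my $in = Bio::SeqIO->new( -file => $ARGV[0], -format => 'fasta' );
    my @seqs;
    while ( my $seq = $in->next_seq ) { push @seqs, $seq }

    my $factory = Bio::Tools::Run::Cap3->new();
    # With the default out_type, run() is expected to return a
    # Bio::Assembly::Scaffold; out_type('Bio::Assembly::IO') or an output
    # file name would change that.
    my $scaffold = $factory->run( \@seqs );
    printf "%s\t%d reads\n", @$_ for contig_summary($scaffold);
}
```

Keeping the summary in its own sub means it works the same whether the Scaffold comes straight from run() or from reading an ACE file back with Bio::Assembly::IO.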
Cheers, Florent Paolo Pavan wrote: > Dear, > I'm confused about the proper usage of the module Bio::Tools::Run::Cap3. > As documented in the pod, the run(@seqs) method returns the cap3 report file > while I expect to return a Bio::Assembly object, consistently with other > Bio::Tools::Run classes. > However, I went around this by getting from the factory object the location > and the names of the temp output files (actually accessing a private > property, although) and reading them via the Assembly::IO system. > I was just wandering what is the proper designed way to do this job. > > Thank you for enlighten the way! > Paolo From roychu at gmail.com Tue Nov 24 18:00:58 2009 From: roychu at gmail.com (Roy) Date: Tue, 24 Nov 2009 15:00:58 -0800 Subject: [Bioperl-l] Remote Blast - same script but different results Message-ID: <4d7f3e450911241500y7df305acq1d03819ea1ec7d3e@mail.gmail.com> Hi bioperl community, I've tried searching the old lists to see if this topic has been covered, and perhaps this question arises from my own lack of familiarity with BLAST, but (from my perl script listed below) I get different results with remote blast when I call my script (that is, I will either get hits or no hits at all). I'll call the script one time and get no hits. Then I call the script again (with the same parameters) and get the same several hits that I may have gotten before, even though the previous run returned none. I use a subroutine to parse the blast report information, and then I use a boolean to indicate whether results are returned or not. Any insight into what I may have missed would be appreciated. Short question: is this behavior typical? My understanding of how BLAST works is that it shouldn't...
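One thing worth checking in this kind of loop: retrieve_blast() returns 0 while a job is still running, -1 on error, and a Bio::SearchIO object once the report is ready (as the comments in the script note). In the script in this message, the `return` in fetch_blast_report sits inside the `while (my @rids = $factory->each_rid)` loop. The sub can therefore return after a single pass, before the job has finished, which would look like "no hits" on that run. Below is a hedged sketch of a polling loop that only returns once every RID has produced a report or failed. It uses only each_rid, retrieve_blast and remove_rid, the same RemoteBlast methods the script calls. The sub name wait_for_reports and its interval/max_polls options are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: poll a Bio::Tools::Run::RemoteBlast factory until every
# submitted RID has finished or failed. $on_report->($rid, $searchio) is called
# once per finished job. Only each_rid, retrieve_blast and remove_rid are used.
sub wait_for_reports {
    my ( $factory, $on_report, %opt ) = @_;
    my $interval  = defined $opt{interval}  ? $opt{interval}  : 5;    # seconds
    my $max_polls = defined $opt{max_polls} ? $opt{max_polls} : 360;  # give up eventually
    my $handled   = 0;

    for my $poll ( 1 .. $max_polls ) {
        my @rids = $factory->each_rid;
        last unless @rids;                   # nothing left to wait for
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( ref $rc ) {                 # finished: a Bio::SearchIO object
                $on_report->( $rid, $rc );
                $factory->remove_rid($rid);
                $handled++;
            }
            elsif ( $rc < 0 ) {              # error: stop tracking this RID
                warn "RID $rid failed\n";
                $factory->remove_rid($rid);
            }
            # $rc == 0: job still running, check again on the next poll
        }
        my @pending = $factory->each_rid;
        sleep $interval if @pending;
    }
    # return only after the polling loop, not after the first pass
    return $handled;
}

# Usage with a factory from the script (sketch):
#   $factory_a->submit_blast($five_seqobj);
#   wait_for_reports( $factory_a, sub {
#       my ( $rid, $searchio ) = @_;
#       while ( my $result = $searchio->next_result ) {
#           printf "%s: %d hits\n", $result->query_name, $result->num_hits;
#       }
#   } );
```

The callback keeps report parsing separate from waiting, so "no hits" is only reported for a job that actually finished; if max_polls runs out, any RIDs still returned by each_rid are the ones that never completed.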
Thanks in advance,
Roy

#!/usr/bin/perl -w
use strict;
use warnings;
use Carp;
use Bio::Perl;
use CGI;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::SeqFeature::Generic;
use Bio::Restriction::Analysis;
use Bio::Tools::Run::RemoteBlast;
use Bio::SimpleAlign;
use Bio::AlignIO;
use Bio::LocatableSeq;

my $five_seqobj = Bio::Seq->new(
    -seq        => 'ATTCCCACCGGGACCTGCGGGGCTGAGTGCCCTTCTCGGTTGCTGCCGCTGAGGAGCCCGCCCAGCCAGCCAGGGCCGCGAGGCCGAGGCCAGGCCGCAGCCCAGGAGCCGCCCCACCGCAGCTGGCGATGGACCCGCCGAGGCCCGCGCTGCTGGCGCTGCTGGCGCTGCCTGCGCTGCTGCTGCTGCTGCTGGCGGGCGCCAGGGCCG',
    -display_id => 'genomic_a',
    -alphabet   => 'dna',
);
my $three_seqobj = Bio::Seq->new(
    -seq        => 'GTGAGTGCGCGGCCGCTCTGCGGGCGCAGAGGGAGCGGGAGGGAGCCGGCGGCACGAGGTTGGCCGGGGCAGCCTGGGCCTAGGCCAGAGGGAGGGCAGCCACAGGGTCCAGGGCGAGTGGGGGGATTGGACCAGCTGGCGGCCCCTGCAGGCTCAGGATGGGGGGCGCGGGATGGAGGGGCTGAGGAGGGGGTCTCCGGAGCCTGCCTC',
    -display_id => 'genomic_b',
    -alphabet   => 'dna',
);

my @params = (
    '-program'    => 'blastn',
    '-database'   => 'refseq_genomic',
    '-expect'     => '10',
    '-readmethod' => 'blastxml'
);
$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'}    = 'YES';
$Bio::Tools::Run::RemoteBlast::HEADER{'PERC_IDENT'}   = 75;
$Bio::Tools::Run::RemoteBlast::HEADER{'FORMAT_TYPE'}  = 'XML';
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
$Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} = 100;    # Put: limit number of hits

my $factory_a = Bio::Tools::Run::RemoteBlast->new(@params);
$factory_a->retrieve_parameter('FORMAT_TYPE', 'XML');

my $hits_a;
my $hits_b;
my $r;
my $bool_hit;

print "Submitting BLAST query - 5' end (MEGABLAST = YES)\n";
$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES';
$r = $factory_a->submit_blast($five_seqobj);
# list assignment, so both the flag and the hit list are captured
($bool_hit, $hits_a) = fetch_blast_report($factory_a);
unless ($bool_hit) {
    print "\nNo hits\n";
    print "Re-submitting BLAST query - 5' end (MEGABLAST = NO)\n";
    sleep 5;
    $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'NO';
    $r = $factory_a->submit_blast($five_seqobj);
    ($bool_hit, $hits_a) = fetch_blast_report($factory_a);
    if ($bool_hit == 0) { print "No hits\n"; }
    sleep 5;
}

my $factory_b = Bio::Tools::Run::RemoteBlast->new(@params);

print "\n--------------------------------------------------\n\n";
print "Submitting BLAST query - 3' end (MEGABLAST = YES)\n";
$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES';
$r = $factory_b->submit_blast($three_seqobj);
($bool_hit, $hits_b) = fetch_blast_report($factory_b);
unless ($bool_hit) {
    print " No hits\n";
    print "Re-submitting BLAST query - 3' end (MEGABLAST = NO)\n";
    sleep 5;
    $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'NO';
    $r = $factory_b->submit_blast($three_seqobj);
    ($bool_hit, $hits_b) = fetch_blast_report($factory_b);
    if ($bool_hit == 0) { print " No hits\n"; }
    sleep 5;
}

print "\nbye\n\n";
print "$hits_a\n$hits_b\n";
exit;

sub fetch_blast_report {
    my ($factory) = @_;
    my $v        = 1;
    my $bool_hit = 0;
    my $hits     = '';
    print STDERR "waiting...";
    while (my @rids = $factory->each_rid) {
        foreach my $rid (@rids) {
            print STDERR ".";
            my $rc = $factory->retrieve_blast($rid);
            # retrieves blast report from remote blast queue,
            # returns -1 on error, 0 on 'job not finished', Bio::SearchIO object
            # args, remote blast id (rid)
            if (!ref($rc)) {
                # if not empty string, ref EXPR returns a non-empty string if EXPR is a reference
                if ($rc < 0) { $factory->remove_rid($rid); }
                print STDERR "." if ($v > 0);
                ##### is this printing out as multiple dots? when and why?
                sleep 5;
            }
            else {
                $bool_hit = 1;
                my $result = $rc->next_result();
                unless ($result->num_hits > 0) { $bool_hit = 0; }
                # returns: Bio::Search::Result::ResultI object
                $factory->remove_rid($rid);
                print "\ndatabase:\t",  $result->database_name, "\n";
                print "query name:\t",  $result->query_name,    "\n";
                print "query length\t", $result->query_length,  "\n";
                print "num hits\t",     $result->num_hits,      "\n";
                if ($result->num_hits) {
                    # $result->hits returns an array of hits
                    # $results->no_hits_found, boolean vs $#{@hits} ie. filtering
                    while (my $hit = $result->next_hit) {
                        print "\nhit name:\t", $hit->name, "\n";
                        print "description:\t", $hit->description, "\n";
                        print "locus:\t", $hit->locus, "\n";
                        print "algorithm: ", $hit->algorithm, "\thit length: ", $hit->length,
                          "\thit ranking: ", $hit->rank, "\n";
                        while (my $hsp = $hit->next_hsp) {
                            print "evalue: ", $hsp->evalue, "\tscore: ", $hsp->score,
                              "\tpercent_id: ", $hsp->percent_identity, "\n";
                            print "query_start: ", $hsp->query->start, "\tquery_end: ", $hsp->query->end;
                            print "\tquery_length: ", $hsp->query->length,
                              "\tquery_strand: ", $hsp->strand('query'), "\n";
                            print "subject_start: ", $hsp->subject->start, "\tsubject_end: ", $hsp->subject->end;
                            print "\tsubject_length: ", $hsp->subject->length,
                              "\tsubject_strand: ", $hsp->strand('subject'), "\n\n";
                            my $aln = $hsp->get_aln;
                            if ($aln->is_flush) {
                                foreach my $seq ($aln->each_seq) {
                                    print $seq->seq, "\n";
                                }
                                print $aln->gap_line, "\n";
                                print $aln->consensus_string(95), "\n\n";
                            }
                            $hits .= $hit->name . "\t" . $hsp->subject->start . "\t"
                              . $hsp->subject->end . "\t" . $hsp->strand('subject') . "\n";
                        }
                    }
                }
            }
        }
        return ($bool_hit, $hits);
    }
}

From maj at fortinbras.us Tue Nov 24 23:12:13 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Tue, 24 Nov 2009 23:12:13 -0500 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> <4296CD1039FC44B89034A1FD3E6721F3@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> Message-ID: <3ECFA0236D1B467181EE63C8C6BE7E1F@NewLife> I seem to be able to do $db = Bio::DB::Fasta->new("$tmp/test.faa"); without a problem- something in the mixing of named and unnamed parameters? ----- Original Message ----- From: "Smithies, Russell" To: "'Mark A. Jensen'" ; "'bioperl-l'" Sent: Tuesday, November 24, 2009 5:21 PM Subject: RE: [Bioperl-l] Bio::DB::Fasta That's what I ended up doing. Also, there's no "obvious" way to index a single file so I ended putting the filename in the glob parameter. my $db = Bio::DB::Fasta->new( "$tmp", -glob => "test.faa", -reindex => 1 ); --Russell > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > Sent: Wednesday, 25 November 2009 11:19 a.m. > To: Smithies, Russell; 'bioperl-l' > Subject: Re: [Bioperl-l] Bio::DB::Fasta > > The code (method index_dir() ) seems to expect all the fasta files to be > contained in that directory. Looks hairy; what about creating symlinks to > your > fasta files in a /tmp subdir and calling new() with that subdir? > ----- Original Message ----- > From: "Smithies, Russell" > To: "'bioperl-l'" > Sent: Tuesday, November 24, 2009 5:04 PM > Subject: [Bioperl-l] Bio::DB::Fasta > > > > Is there any way to pass a filename to Bio::DB::Fasta for the location > of > > where to write the directory.index? > > It's writing in the same dir as the fasta but I'd rather have it write > in /tmp > > as it's part of a web app. 
> > > > Thanx, > > > > Russell From maj at fortinbras.us Wed Nov 25 12:25:30 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 25 Nov 2009 12:25:30 -0500 Subject: [Bioperl-l] question for all regarding a sam-based Bio::Assembly::IO Message-ID: <1E72D5B0A190448FA27545DB5B68638D@NewLife> Short-readers, I'm working on an Assembly::IO class for sam alignments. I'm currently making a decision about handling multiple reference sequences: would you prefer that next_assembly() return an assembly that covers all reference sequences, or that next_assembly iterates over each reference sequence? (Or both?) thanks for your input- MAJ From timbourine81 at gmail.com Wed Nov 25 12:40:52 2009 From: timbourine81 at gmail.com (Tim) Date: Wed, 25 Nov 2009 18:40:52 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in new file Message-ID: <4B0D6C24.2080308@gmail.com> Dear bioperl users, I am a real newbie and have - maybe a very trivial - question. I searched the mailing list archive and many howtos but I have not found a concrete answer to my problem.
So hopefully you can help me :) Background: I use the latest Bioperl version (installed it two weeks before). When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file including different sequences, I get a BLAST output with many queries each having several hits / sbjcts. My problem is how to parse *all* hits of *one* query into a single new file. And this for all the queries I have in my BLAST output file. Or is it better the other way round; first to make fasta files with only single sequences inside and BLAST each file? But how can I automate that using Bioperl? I tried Bio::SearchIO but can only parse all queries and their respective hits in only one file... I think iteration is also necessary here, but I do not really know how to include that into Bio::SearchIO. Or do I have to use Module:Bio::Index::Blast? I can index a file (see below), but I have no idea what comes next...

###How I index a file...

#!/usr/bin/perl -w

$ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";

use Bio::Index::Fasta;

$file_name = "8_to_BLAST_two_seq_index.fasta";
$id = "48882";
$inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx",
                               -write_flag => 1);
$inx->make_index($file_name);

Hopefully, you can give me at least hints what to look for. A big THANKS in advance! Cheers, Tim From timbourine81 at gmail.com Wed Nov 25 12:53:34 2009 From: timbourine81 at gmail.com (Tim) Date: Wed, 25 Nov 2009 18:53:34 +0100 Subject: [Bioperl-l] How to parse different (fasta) files Message-ID: <4B0D6F1E.8@gmail.com> Hey everybody, another question from me...if you do not mind :) My situation is like this: I have parsed a standalone BLAST output using SearchIO with only the hit names. Now I have a second fasta file with the same sequences like in the BLAST database but including an alignment (meaning "." and "-"). (There is no chance to make a BLAST database with fasta files including the alignment, unfortunately...).
My intention is now to take the name of the hit sequences (BLAST output), get the corresponding aligned sequences (fasta file incl. alignment) and put them in a new file. Is anybody out there who has tried that before? Again, I am an absolute greenhorn in using (Bio)perl. Maybe it is very simple :D Looking forward to getting an answer from you. All the best, Tim -- Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 From maj at fortinbras.us Wed Nov 25 13:20:03 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 25 Nov 2009 13:20:03 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew file In-Reply-To: <4B0D6C24.2080308@gmail.com> References: <4B0D6C24.2080308@gmail.com> Message-ID: <53DE480F205E42CE8D2B9421592AAF0E@NewLife> hey Tim-- Sounds like you need to go about collecting your queries inside out:

 use Bio::SearchIO;
 # 'blast.out' stands in for your StandAloneBlast report file
 my $in = Bio::SearchIO->new( -format => 'blast', -file => 'blast.out' );
 my %hits_by_query;
 while ( my $result = $in->next_result ) {
     push @{ $hits_by_query{ $result->query_name } }, $_ for $result->hits;
 }

I believe now each hash element, keyed by the query name, will contain an arrayref to the set of hits assoc with that query. From here, I believe

 use Bio::Search::Result::BlastResult;
 use Bio::SearchIO::Writer::TextResultWriter;
 foreach my $qid ( keys %hits_by_query ) {
     my $result = Bio::Search::Result::BlastResult->new( -query_name => $qid );
     $result->add_hit($_) for ( @{ $hits_by_query{$qid} } );
     # SearchIO needs a writer object and a '>' file name to write a report
     my $blio = Bio::SearchIO->new(
         -writer => Bio::SearchIO::Writer::TextResultWriter->new(),
         -file   => ">$qid.bls",
     );
     $blio->write_result($result);
 }

will do what you want. hope this helps - Mark ----- Original Message ----- From: "Tim" To: Sent: Wednesday, November 25, 2009 12:40 PM Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew file > Dear bioperl users, > > I am a real newbie and have - maybe a very trivial - question. > > I searched the mailing list archive and many howtos but I have not found > a concrete answer to my problem.
So hopefully you can help me :) > > Background: I use the latest Bioperl version (installed it two weeks > before). > When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > including different sequences, I get a BLAST output with many queries > each having several hits / sbjcts. > > My problem is how to parse *all* hits of *one* query into a single new > file. And this for all the queries I have in my BLAST output file. > > Or is it better the other way round; first to make fasta files with only > single sequences inside and BLAST each file? But how can I automize that > using Bioperl? > > I tried Bio::SearchIO but can only parse all queries and their > respective hits in only one file... > I think iteration is also necessary here, but I do not really know how > to include that into Bio::SearchIO. > Or do I have to use Module:Bio::Index::Blast? > > I can index a file (see below), but I have no idea what comes next... > > ###How I index a file... > > #!/usr/bin/perl -w > > $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > > use Bio::Index::Fasta; > > > $file_name = "8_to_BLAST_two_seq_index.fasta"; > $id = "48882"; > $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > -write_flag => 1); > $inx->make_index($file_name); > > > Hopefully, you can give me at least hints what to look for. > > A big THANKS in advance! 
> > Cheers, > > Tim From Russell.Smithies at agresearch.co.nz Wed Nov 25 14:07:26 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 26 Nov 2009 08:07:26 +1300 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in new file In-Reply-To: <4B0D6C24.2080308@gmail.com> References: <4B0D6C24.2080308@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B63085701@exchsth.agresearch.co.nz> Hi Tim, Here's some code for a job I'm working on at the moment that contains all the bits you'll probably need. It's extracting 2 species-specific databases from nr (based on tax ids), doing a blast, then parsing the results and creating a substitution matrix. I was initially using Bio::DB::Eutilities to query and retrieve sequences but I kept getting errors and time-outs from NCBI when pulling back large numbers of sequences. It should give you a rough idea of how to run Bio::Tools::Run::StandAloneBlast, Bio::DB::Fasta and Bio::SearchIO.
Email me direct if you want further explanation as it's not well commented ;-)

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
=======================================

#!/usr/local/bin/perl
use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::Fasta;
use Storable;

# Parameters:
# Percentage can be specified as either 20p, 20P or 20%
# So for 20% of rice sequences blasted against oil palm:
#   4530 51953 20p   (4530=rice, 51953=oil_palm, 20p=20%)
# Or for 20 searches:
#   4530 51953 20
#
my ( $q, $s, $c ) = @ARGV;

my $nr       = "/data/databases/flatfile/illuminati_blastdata/nr";
my $tax_file = "/data/anonftp/pub/mirror/taxonomy/gi_taxid_prot.dmp.gz";
my $tmp      = "/tmp/tax";

my %stats            = ();
my $total_subs       = 0;
my $min_hsp_len      = 0;
my $min_hsp_identity = 0;
my $num_searches     = $c || 10;
my $blast_e          = '1e-6';
my $count            = 0;

# check if all the fasta and blast files exist
# if not, extract new fasta and re-formatdb the database
foreach my $t ( $q, $s ) {
    foreach ( map { "$tmp/$t.$_" } qw(faa list phr pin psq) ) {
        unless ( -e $_ ) {
            print "Creating database for $t\n";
            &create_database($t);
            last;
        }
    }
}

my @params = (
    -database => "$tmp/$q",
    -program  => 'blastp',
    -e        => $blast_e,
    -outfile  => "$tmp/blast.out",
    -v        => '1',
    -b        => '1'
);
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params) or die $!;

# load the query sequences into a db
# makes it easier to randomly access them
my $db = Bio::DB::Fasta->new( "$tmp", -glob => "$s.faa", -reindex => 1 );
my @ids      = $db->ids;
my $id_count = scalar @ids;    # number of sequences ($#ids is one less)
die "No sequences\n" unless $id_count;

# if a percentage is requested, calculate
# the required number of searches
if ( $num_searches =~ m/(\d+)[pP%]/ ) {
    $num_searches = int( ( $1 / 100 ) * $id_count );
    warn "Searching random $1 percent ($num_searches) of $id_count sequences from taxid $q\n";
}

my $summary_file = "$tmp/" . $$ . "_summary.txt";
open( OUT, ">", $summary_file ) or die $!;
print OUT "#Summary of $num_searches random blast searches from taxid $q against taxid $s.\n";
print OUT "#Parameters used were:\n";
print OUT "#blast_e: $blast_e\n";
print OUT "#min_hsp_len: $min_hsp_len\n";
print OUT "#min_hsp_identity: $min_hsp_identity\n";
print OUT "\n";

# rand(@ids) can pick any index 0..$#ids; rand($#ids) would never pick the last ID
while ( my $seq = $db->get_Seq_by_id( $ids[ rand @ids ] ) ) {
    next unless $seq;
    warn "Processing ", $seq->id, "\n";
    eval {
        my $blast_report = $factory->blastall($seq);
        sleep 5;
    };
    my $blast_in = Bio::SearchIO->new( -format => "blast", -file => "$tmp/blast.out" );
    while ( my $result = $blast_in->next_result ) {
        if ( $result->num_hits <= 0 ) {
            warn "No hits for ", $result->query_accession, "\n";
            print OUT "No hits for ", $result->query_accession, "\n";
            next;
        }
        $count++;
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                warn sprintf( "%s had %s hsp%s\n",
                    $result->query_accession, $hit->num_hsps,
                    $hit->num_hsps > 1 ? "s" : "" );
                print OUT sprintf( "%s had %s hsp%s\n",
                    $result->query_accession, $hit->num_hsps,
                    $hit->num_hsps > 1 ? "s" : "" );

                # http://www.bioperl.org/wiki/HOWTO:SearchIO#Table_of_Methods
                if ( $hsp->length('total') > $min_hsp_len ) {
                    if ( $hsp->percent_identity >= $min_hsp_identity ) {
                        my @query_string = split '', $hsp->query_string;
                        my @homol_string = split '', $hsp->homology_string;
                        my @hit_string   = split '', $hsp->hit_string;
                        # '<=' so that the last alignment column is counted too
                        for ( my $i = 0; $i <= $#query_string; $i++ ) {
                            next unless $homol_string[$i] =~ /\+/;
                            $stats{ $query_string[$i] }{ $hit_string[$i] }++;
                            $total_subs++;
                        }
                    }
                }
            }
        }
    }
    # double quotes so that $tmp is interpolated
    unlink "$tmp/blast.out" if -e "$tmp/blast.out";
    last if $count >= $num_searches;
}

# create summary frequency list
my %summary = ();
for my $query ( keys %stats ) {
    for my $hit ( keys %{ $stats{$query} } ) {
        $summary{"$query->$hit"} = sprintf( "%6f", $stats{$query}{$hit} / $total_subs );
    }
}
print OUT "\n";

# sort by descending frequencies and print to summary file
foreach my $k ( sort { $summary{$b} <=> $summary{$a} } keys %summary ) {
    print OUT "$k\t", $summary{$k}, "\n" unless $k =~ /TOTAL/;
}
print OUT "\n\n";

# print substitution matrix
my $i     = 0;
my @prots = qw(A R N D C Q E G H I L K M F P S T W Y V);
my $sep   = "\t";
print OUT sprintf( "%7s %s", $_, $sep ) foreach ( " ", @prots );
print OUT "\n";
foreach my $x (@prots) {
    print OUT sprintf( "%7s|%s", $prots[ $i++ ], $sep );
    foreach my $y (@prots) {
        my $val = defined( $stats{$x}{$y} )
          ? sprintf( "%0.6f", $stats{$x}{$y} / $total_subs )
          : "--------";
        print OUT sprintf( "%s%s", $val, $sep );
    }
    print OUT "\n";
}
close OUT;

open( IN, $summary_file ) or die $!;
print while <IN>;
close IN;

# extract sequences from nr database based on taxid.
sub create_database { my $txid = shift; my %hash = (); my $gi_stored = "/tmp/gi.dat"; if ( -e $gi_stored ) { %hash = %{ retrieve($gi_stored) }; } else { open( TXID, "zcat $tax_file | " ) or die $!; while () { chomp; my ( $gi, $tx ) = split( "\t", $_ ); push( @{ $hash{$tx} }, $gi ); } close TXID; store( \%hash, $gi_stored ); } my $txlist = "$tmp/$txid.list"; my $txseq = "$tmp/$txid.faa"; die "No sequences found for taxid $txid\n" unless defined( @{ $hash{$txid} }); my $num_seqs = scalar( @{ $hash{$txid} }); warn "Found $num_seqs sequences for taxid $txid in $tax_file\n"; open OUT, ">", $txlist or die $!; print OUT "$_\n" foreach ( @{ $hash{$txid} } ); close OUT; my $cmd = "fastacmd -d $nr -i $txlist -t T -o $txseq 2>/dev/null"; system $cmd; my $count = `grep -c '>' $txseq`; $count =~ s/\n//; warn "Could only extract $count sequences from $nr\n"; $cmd = "formatdb -p T -i $tmp/$txid.faa -n $tmp/$txid -l $tmp/formatdb.log"; system $cmd; $cmd = "fastacmd -d $tmp/$txid -I"; system $cmd; warn "Check the formatdb.log for any errors\n"; } ======================================= > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Tim > Sent: Thursday, 26 November 2009 6:41 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in > new file > > Dear bioperl users, > > I am a real newbie and have - maybe a very trivial - question. > > I searched the mailing list archive and many howtos but I have not found > a concrete answer to my problem. So hopefully you can help me :) > > Background: I use the latest Bioperl version (installed it two weeks > before). > When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > including different sequences, I get a BLAST output with many queries > each having several hits / sbjcts. > > My problem is how to parse *all* hits of *one* query into a single new > file. 
And this for all the queries I have in my BLAST output file. > > Or is it better the other way round; first to make fasta files with only > single sequences inside and BLAST each file? But how can I automize that > using Bioperl? > > I tried Bio::SearchIO but can only parse all queries and their > respective hits in only one file... > I think iteration is also necessary here, but I do not really know how > to include that into Bio::SearchIO. > Or do I have to use Module:Bio::Index::Blast? > > I can index a file (see below), but I have no idea what comes next... > > ###How I index a file... > > #!/usr/bin/perl -w > > $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > > use Bio::Index::Fasta; > > > $file_name = "8_to_BLAST_two_seq_index.fasta"; > $id = "48882"; > $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > -write_flag => 1); > $inx->make_index($file_name); > > > Hopefully, you can give me at least hints what to look for. > > A big THANKS in advance! > > Cheers, > > Tim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Nov 25 14:21:27 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 25 Nov 2009 14:21:27 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <53DE480F205E42CE8D2B9421592AAF0E@NewLife> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> Message-ID: <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> whoops: change the following line: my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); to my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); (I always forget that...) MAJ ----- Original Message ----- From: "Mark A. Jensen" To: "Tim" ; Sent: Wednesday, November 25, 2009 1:20 PM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file > hey Tim-- > > Sound like you need to go about collecting your queries inside out: > > my %hits_by_query; > for ($result->hits) { > push @{$hits_by_query{$hit->name}} $hit; > } > > I believe now each hash element, keyed by the query name, will contain > an arrayref to the set of hits assoc with that query. > From here, I believe > > use Bio::Search::Result::BlastResult; > use Bio::SearchIO; > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > $blio->write_result($result); > } > > will do what you want. > > hope this helps - > Mark > > ----- Original Message ----- > From: "Tim" > To: > Sent: Wednesday, November 25, 2009 12:40 PM > Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew > file > > >> Dear bioperl users, >> >> I am a real newbie and have - maybe a very trivial - question. >> >> I searched the mailing list archive and many howtos but I have not found >> a concrete answer to my problem. So hopefully you can help me :) >> >> Background: I use the latest Bioperl version (installed it two weeks >> before).
>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file >> including different sequences, I get a BLAST output with many queries >> each having several hits / sbjcts. >> >> My problem is how to parse *all* hits of *one* query into a single new >> file. And this for all the queries I have in my BLAST output file. >> >> Or is it better the other way round; first to make fasta files with only >> single sequences inside and BLAST each file? But how can I automize that >> using Bioperl? >> >> I tried Bio::SearchIO but can only parse all queries and their >> respective hits in only one file... >> I think iteration is also necessary here, but I do not really know how >> to include that into Bio::SearchIO. >> Or do I have to use Module:Bio::Index::Blast? >> >> I can index a file (see below), but I have no idea what comes next... >> >> ###How I index a file... >> >> #!/usr/bin/perl -w >> >> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >> >> use Bio::Index::Fasta; >> >> >> $file_name = "8_to_BLAST_two_seq_index.fasta"; >> $id = "48882"; >> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >> -write_flag => 1); >> $inx->make_index($file_name); >> >> >> Hopefully, you can give me at least hints what to look for. >> >> A big THANKS in advance! 
>> >> Cheers, >> >> Tim From alden.huang at gmail.com Thu Nov 26 05:54:30 2009 From: alden.huang at gmail.com (Alden Huang) Date: Thu, 26 Nov 2009 02:54:30 -0800 Subject: [Bioperl-l] Function that determines serious mutations In-Reply-To: References: Message-ID: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> Hey rob, Sorting Intolerant from Tolerant http://sift.jcvi.org/ ~alden ...a bit late, i kno; I just read your post now while cleaning the inbox On Fri, Nov 6, 2009 at 9:35 AM, Robert Bradbury wrote: > Is there a function in the library (or has someone written one) that can > take a genbank entry and determine which mutations are harmful? > > It would be used to produce a table summary of: > GENE     # SNP     # BadSNP > > One kind of gets this from NCBI if you lookup in the "GENE" db a gene name > and then go to the "GeneView" in dbSNP page, it has the information I want > but largely in a graphical format while I simply want numbers I can dump > into a spreadsheet. > > I don't think it would be hard, fetch the gene, run through the features for > the SNP database, figure out whether they are good or bad SNPs, accumulate > the statistics and dump it. I think the functions available are flexible > enough to do it but I can't believe nobody has already done it. It could be > a bit more complex in that one could do an analysis to see if the mutations > are in a conserved domain or mutations that code for Cysteine or Methionine > (or other potentially "critical" amino acids) but since "critical" is in the > eye of the beholder there would have to be some kind of callback to a > scoring function.
> > Thanks, > Robert From robert.bradbury at gmail.com Thu Nov 26 06:27:50 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 26 Nov 2009 06:27:50 -0500 Subject: [Bioperl-l] Function that determines serious mutations In-Reply-To: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> References: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> Message-ID: On Thu, Nov 26, 2009 at 5:54 AM, Alden Huang wrote: > > Sorting Intolerant from Tolerant > http://sift.jcvi.org/ > > Ah yes, thank you very much. This looks very much like a tool that can be adapted for various uses. Robert From jason at bioperl.org Thu Nov 26 12:16:17 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 26 Nov 2009 09:16:17 -0800 Subject: [Bioperl-l] question about a Bio::Tree::Tree method In-Reply-To: <30960443.966281259248778372.JavaMail.defaultUser@defaultHost> References: <30960443.966281259248778372.JavaMail.defaultUser@defaultHost> Message-ID: <14F4B8C9-A1F4-436B-813F-50E139932D3D@bioperl.org> Emilio - please ask your questions on the list - many people there can help answer questions. get_nodes returns all the nodes in the tree, the options specify the order they are returned in. Depending on your question the order probably won't matter so you can just call it without any arguments like in the examples and the HOWTO. The documentation for the method says: Title : get_nodes Usage : my @nodes = $tree->get_nodes() Function: Return list of Bio::Tree::NodeI objects Returns : array of Bio::Tree::NodeI objects Args : (named values) hash with one value order => 'b|breadth' first order or 'd|depth' first order So you can provide no arguments and get the default (breadth-first I believe) or you can specify -order => 'd' or -order => 'depth' to get the nodes in depth-first order.
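For readers unsure what the two -order values mean, the short plain-Perl sketch below walks a toy tree in breadth-first order and in depth-first (pre-order) order. It is only an illustration, not BioPerl's implementation: the %children hash and the breadth_first/depth_first subs are hypothetical, and BioPerl's get_nodes may order siblings differently, so pass -order explicitly (e.g. $tree->get_nodes(-order => 'depth')) whenever the order matters rather than relying on the default.

```perl
use strict;
use warnings;

# Toy rooted tree ((A,B)X,C)root, stored as parent => [children].
# Hypothetical structure for illustration only; this is not Bio::Tree::Tree.
my %children = (
    root => [ 'X', 'C' ],
    X    => [ 'A', 'B' ],
);

# Breadth-first: visit nodes level by level using a FIFO queue.
sub breadth_first {
    my ($root) = @_;
    my @queue = ($root);
    my @order;
    while (@queue) {
        my $node = shift @queue;
        push @order, $node;
        push @queue, @{ $children{$node} || [] };
    }
    return @order;
}

# Depth-first (pre-order): finish each subtree before moving to the next sibling.
sub depth_first {
    my ($node) = @_;
    return ( $node, map { depth_first($_) } @{ $children{$node} || [] } );
}

print join( ' ', breadth_first('root') ), "\n";    # root X C A B
print join( ' ', depth_first('root') ), "\n";      # root X A B C
```

The only difference between the two orders is whether the children of a node are queued (visited after every node already waiting) or explored immediately (recursion), which is why the leaves A and B come last in breadth-first order.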
-jason On Nov 26, 2009, at 7:19 AM, miglio83 at libero.it wrote: > Hi Jason, > I'm Emilio Siena, a PhD student of the University of Perugia. > I have > a question about the method "get_nodes" of the "Bio::Tree::Tree" > class. > In > particular I didn't understand which type of arguments it accepts > and in which > format an argument should be given. > > Thank you in advance! > > Emilio -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From maj at fortinbras.us Thu Nov 26 12:40:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 26 Nov 2009 12:40:45 -0500 Subject: [Bioperl-l] Bio::Assembly::IO::sam is alpha Message-ID: <599F8BABCD2848EFA98FB24A4419674E@NewLife> in bioperl-live/trunk with plenty pod; bravehearts can (please!) test on .bam files cheers, MAJ From mauricio at open-bio.org Thu Nov 26 16:45:43 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 26 Nov 2009 15:45:43 -0600 Subject: [Bioperl-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <4B0EF707.6080202@open-bio.org> Hi Jonathan, Any chance it can be webcasted? I'm sure it would attract a lot of remote attendees ;) Regards, Mauricio. Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here > at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010. If > you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last years (1st > day for beginners, 2nd for both beginners and advanced users, 3rd day > for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what > you would like to talk about. > > Thanks > > Jonathan. 
> > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > > > > > > > > From robert.bradbury at gmail.com Thu Nov 26 21:06:40 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 26 Nov 2009 21:06:40 -0500 Subject: [Bioperl-l] BioPerl "guts" question regarding forked processes Message-ID: I'm currently running near my process limit and running sequence fetches from swissprot (I've also had this happen with getting gi's from NCBI) and am running out of processes about halfway through the set I'm trying to fetch [1]. Now, is there someplace in the bioperl documentation that says where one is supposed to wait() for defunct processes after each sequence fetch? I'm encountering the problem both when the sequence fetches succeed and when they fail. Thanks in advance. Robert 1. This is due to a bug in chromium's use of flash that involves it leaving many defunct processes that are uncollected and therefore counting towards one's "process limit". From kanzure at gmail.com Thu Nov 26 21:12:46 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Thu, 26 Nov 2009 20:12:46 -0600 Subject: [Bioperl-l] BioPerl "guts" question regarding forked processes In-Reply-To: References: Message-ID: <55ad6af70911261812q583277d5l71df0d66e756f617@mail.gmail.com> On Thu, Nov 26, 2009 at 8:06 PM, Robert Bradbury wrote: > I'm currently running near my process limit and running sequence fetches > from swissprot (I've also had this happen with getting gi's from NCBI) and > am running out of processes about halfway through the set I'm trying to > fetch [1]. Hey Robert, sorry for the off-topic question, but I was wondering if you're the same Robert Bradbury from the extropy-chat list. Hi?
- Bryan http://heybryan.org/ 1 512 203 0507 From paolo.pavan at gmail.com Fri Nov 27 06:35:03 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 27 Nov 2009 12:35:03 +0100 Subject: [Bioperl-l] More general Bio::Assembly::Contig question (was Bio::Tools::Run::Cap3 usage question) Message-ID: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> Dear Florent, Thank you for your kind answer and for the effort you have spent on this module. Since you are working on these topics, I would like to seize the opportunity to ask you some questions about doubts I have, if you don't mind, of course :-) Some time ago I tried to work with bioperl, loading the data from an ACE file produced by Newbler; I needed to extract part of a contig as an alignment of reads, and I thought of doing it with a slice() method, since I saw that Bio::Assembly::Contig implements the Bio::AlignI interface. Unfortunately I realized that this interface is inherited but not implemented. I tried to hack it by adding a slice method which would act on a Bio::Alignment created from the array of LocatableSeqs representing the reads. This is the question: if I'm not wrong (please correct me if I am), the Bio::Assembly::Contig class stores read information in Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{KEY}, where KEY is one of _align_clipping:READ_NAME, _aligned_coord:READ_NAME or _quality_clipping:READ_NAME. Each of these 3 features (_align_clipping, _aligned_coord, _quality_clipping) contains a Bio::SeqFeature::Generic; which of them is most suitable for the purpose described above, the slice method? And a second question, which follows from the first, if you will forgive me for going on: the purpose of these 3 features per read is not entirely clear to me; can you explain it? Thank you very much for your time. Bye bye, Paolo 2009/11/24 Florent Angly > Hi Paolo, > > It turns out that there is no standard for what is to be passed to the > Bio::Tools::Run wrappers and returned by them.
I noticed the inconsistency > between the assembly wrappers recently while implementing support for new > wrappers. I implemented initial support for additional de novo assembly > programs in BioPerl (454 Newbler and Minimo) a couple of weeks ago and Mark > Jensen added support for Maq, a program that assembles reads against a > reference. In the process, all the assembly wrappers were changed to take > the same type of input data (a FASTA sequence or an array reference of > sequence objects) and return one of the following: > * a Bio::Assembly::Scaffold object (the default), or > * a Bio::Assembly::IO object, or > * the name of a file for the output of the assembler > Use the out_type method to set up which output you want, e.g.: > $factory->out_type('Bio::Assembly::IO'); > or > $factory->out_type('cap3_results.ace'); > You'll have to use the code in the bioperl-run subversion if you want to > use these new features. > > Cheers, > > Florent > > > > > Paolo Pavan wrote: > >> Dear, >> I'm confused about the proper usage of the module Bio::Tools::Run::Cap3. >> As documented in the pod, the run(@seqs) method returns the cap3 report >> file >> while I expect to return a Bio::Assembly object, consistently with other >> Bio::Tools::Run classes. >> However, I went around this by getting from the factory object the >> location >> and the names of the temp output files (actually accessing a private >> property, although) and reading them via the Assembly::IO system. >> I was just wondering what is the proper designed way to do this job. >> >> Thank you for enlighten the way!
>> Paolo From jw12 at sanger.ac.uk Thu Nov 26 09:57:35 2009 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Thu, 26 Nov 2009 14:57:35 +0000 Subject: [Bioperl-l] DAS workshop 7th-9th April 2010 Message-ID: We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK subject to decent demand. The workshop will be held from Wednesday 7th-Friday 9th April 2010. If you would be interested in attending either to present or just take part then please email me jw12 at sanger.ac.uk The format of the workshop is likely to be similar to last year's (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: http://www.dasregistry.org/course.jsp If you would like to present then please send a short summary of what you would like to talk about. Thanks Jonathan. Jonathan Warren Senior Developer and DAS coordinator jw12 at sanger.ac.uk From timbourine81 at googlemail.com Thu Nov 26 11:02:30 2009 From: timbourine81 at googlemail.com (Tim Koehler) Date: Thu, 26 Nov 2009 17:02:30 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <4B0EA44D.2050507@gmail.com> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> Message-ID: Oops, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially with where to put your code.
Here is the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this error appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5. } use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position?
errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name. Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > Hey Mark, > > thanks for the answer > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. 
Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sound like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query. > >> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file.
> >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automize > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... > >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > > > > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 From rtbio.2009 at gmail.com Sat Nov 28 02:53:43 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Sat, 28 Nov 2009 08:53:43 +0100 Subject: [Bioperl-l] Linking of two cgi scripts Message-ID: hello everyone, I have a small question.
I would like to link two cgi scripts, i.e., I have an input sequence being entered in a text area ex:->gi|at442323|... ATGCCCCCTTGGAACCAAAAAAA.... So I would like to compare this with the query sequences. These query sequences would be from a BLAST script in the module blast.pm. So once I enter the input sequence and request a BLAST search using the submit button, my request should go to a program which performs the BLAST search. After this, the sequences obtained from BLAST have to be returned to a program Roopa.pm which compares the input sequence and the sequences obtained from BLAST. But I am unable to provide this link between the cgi scripts (i.e., one script to use BLAST, the other script to compare the sequences and send the results to the browser). Could anyone help me in this regard? Regards, Roopa. From s.denaxas at gmail.com Sat Nov 28 05:56:15 2009 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Sat, 28 Nov 2009 10:56:15 +0000 Subject: [Bioperl-l] Linking of two cgi scripts In-Reply-To: References: Message-ID: Hello, Why do they both have to be CGI scripts? Can't all the processing happen server side, i.e. both BLAST and comparison of returned results? If that is strictly a requirement, you could: a) get input from user on script A, i.e. the input sequence b) do an HTTP request from the CGI to the other script B using LWP::UserAgent c) get results from script B, pass on to comparison module d) return results to user As I said, this will be clunky, so either do everything in one go or consider AJAX. hope this helps Spiros On Sat, Nov 28, 2009 at 7:53 AM, Roopa Raghuveer wrote: > hello everyone, > > I have a small question. > > I would like to link two cgi scripts i.e., > > I have an input sequence being entered in a text area > > ex:->gi|at442323|... > ATGCCCCCTTGGAACCAAAAAAA....
> > So I would like to compare this with the query sequences.These query > sequences would be from a BLAST script in the module blast.pm > So once I enter the input sequence and request for BLAST using submit > button,my request should go to a program which performs BLAST search.After > this, the sequences obtained from BLAST have to be returned to a program > Roopa.pm which compares the input sequence and the sequences obtained from > blast. > > But I am unable to provide this link between the cgi scripts.(i.e.,one > script to use BLAST,the other script to compare the sequences and send the > results to the browser) > > Could any one help me in this regard? > > Regards, > Roopa. > From maj at fortinbras.us Sat Nov 28 11:23:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 28 Nov 2009 11:23:53 -0500 Subject: [Bioperl-l] Run wrappers for BWA and Samtools Message-ID: <7F56A6EEEB0E4EE291D5340F27DF7D3A@NewLife> Hi All, Run wrappers for the bwa assembler and the samtools suite are now available as beta in the bioperl-run/trunk. The bwa wrapper allows you to run a canned assembly pipeline, or to execute individual bwa components. The assembly pipeline can return a Bio::Assembly::Scaffold object via the new Bio::Assembly::IO::sam module in bioperl-live/trunk (this requires lstein's Bio::DB::Sam, from CPAN). Details at http://www.bioperl.org/wiki/HOWTO:Short-read_assemblies_with_BWA and, of course, in the pod. Cheers, MAJ From maj at fortinbras.us Sat Nov 28 21:55:42 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Sat, 28 Nov 2009 21:55:42 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file In-Reply-To: References: <4B0D6C24.2080308@gmail.com><53DE480F205E42CE8D2B9421592AAF0E@NewLife><815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife><4B0EA44D.2050507@gmail.com> Message-ID: <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> Hi Tim-- There's a bug in my code; should be for my $hit ($result->hits) { ... } and you're right about the comma. My bad. But I don't think you need this-- you're already looping over your query sequences and doing blastn on each one. So in the middle of your loop, you can simply write the blast result that you got: my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format=>"blast" ); $blio->write_result($result); and forget about the foreach my $qid loop entirely. The files should show up in the directory from which you're running the script. cheers, MAJ ----- Original Message ----- From: "Tim Koehler" To: Sent: Thursday, November 26, 2009 11:02 AM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file Oops, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially with where to put in your code. Here is the code I have so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13.
push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this error appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5. } use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Do I need to use a module? Are the commands here at the right position? errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? What are they named?
Sorry, but I cannot figure that out :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > Hey Mark, > > thanks for the answer > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sounds like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query.
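A minimal, self-contained sketch of the hash-of-arrays grouping idea discussed above, written in plain Perl so it runs without BioPerl or BLAST. The hits are hypothetical hashrefs with `query` and `name` fields standing in for Bio::Search::Hit objects; with real SearchIO results the query name would presumably come from `$result->query_name`, since `$hit->name` is the name of the hit (subject) sequence. Note the comma after the array dereference in the `push` call, which is what the "Scalar found where operator expected" error was complaining about.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for hit objects: plain hashrefs recording the
# query each hit belongs to and the hit's own name.
my @hits = (
    { query => 'seqA', name => 'gi|111' },
    { query => 'seqB', name => 'gi|222' },
    { query => 'seqA', name => 'gi|333' },
);

# Group hits by query: each key maps to an array reference.
# The comma between @{ ... } and $hit is required by push.
my %hits_by_query;
for my $hit (@hits) {
    push @{ $hits_by_query{ $hit->{query} } }, $hit;
}

# Report the hit names collected for each query.
for my $qid (sort keys %hits_by_query) {
    my @names = map { $_->{name} } @{ $hits_by_query{$qid} };
    print "$qid: @names\n";
}
```

Autovivification creates each array reference the first time a query key is seen, so no explicit initialization is needed.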
> >>> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. > >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automize > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... 
> >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sat Nov 28 22:32:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 28 Nov 2009 22:32:42 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki Message-ID: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> The HOWTOs appear to have a more restrictive copyright than FDL-- in particular, the blurb at the bottom of the HOWTO page asks users to use the documents for personal use only. I'm for this; I think we should therefore have some explicit license for these that specifies this kind of restriction, and then express that on each howto and in BioPerl:Copyright. Any thoughts on the right license and whether this is a good plan?
MAJ From florent.angly at gmail.com Sat Nov 28 22:47:45 2009 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 28 Nov 2009 19:47:45 -0800 Subject: [Bioperl-l] More general Bio::Assembly::Contig question (was Bio::Tools::Run::Cap3 usage question) In-Reply-To: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> References: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> Message-ID: <4B11EEE1.8070907@gmail.com> Hi Paolo, The aligned reads of a contig are stored in Bio::Assembly::Contig->{_elem}{READ_NAME}{_seq}. To implement a slice() method, you could retrieve the reads using get_seq_ids(), get_seq_by_name() or get_seq_by_pos(). To retrieve the position of an aligned read in the contig, use get_seq_coord(), which returns a Bio::SeqFeature::Generic object (from Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{_aligned_coord:READ_NAME}) on which you can call the start() and end() methods. I'm not entirely sure what Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{_align_clipping:READ_NAME} and {_quality_clipping:READ_NAME} are. I believe that they represent the clear range of the read/contig. Hope it helps, Florent Paolo Pavan wrote: > Dear Florent, > Thank you for your kind answer and for the effort you have put into this module. > Since you are working on these topics I would like to seize the day > and ask you some questions about some doubts I have in mind, if you > agree, of course :-) > Some time ago I tried to work with bioperl, loading the data from an > ACE file originated by Newbler; my need was to extract part of the > contig like an alignment of reads and I thought to do it with a slice() > method, since I saw Bio::Assembly::Contig implements Bio::AlignI > interface. Unfortunately I realized that this interface is inherited > but not implemented. > I tried to hack it by adding a slice method which would act on a > Bio::Alignment created from the array of LocatableSeqs representing > the reads.
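To make Florent's suggestion concrete, here is a rough plain-Perl sketch of what a slice over contig columns could look like once each read's contig start coordinate and aligned string are known. It uses hypothetical hashref reads rather than Bio::Assembly::Contig objects; in BioPerl the start would presumably come from the `start()` of the feature returned by `get_seq_coord()`, as described above. It also assumes the aligned read string maps column-for-column onto contig coordinates (padded coordinates), which would need checking against real ACE data.

```perl
use strict;
use warnings;

# Hypothetical reads: 'start' is the 1-based contig column of the first
# aligned base; 'seq' is the aligned (possibly gapped) read string.
my @reads = (
    { name => 'read1', start => 1,  seq => 'ACGTACGT' },
    { name => 'read2', start => 5,  seq => 'ACGTTT'   },
    { name => 'read3', start => 12, seq => 'GG'       },
);

# Return a hashref of read name => the part of that read covering contig
# columns $from..$to (inclusive). Reads with no overlap are left out.
sub slice_reads {
    my ($reads, $from, $to) = @_;
    my %slice;
    for my $read (@$reads) {
        my $start = $read->{start};
        my $end   = $start + length($read->{seq}) - 1;
        next if $end < $from || $start > $to;    # no overlap with window
        my $lo = $from > $start ? $from : $start;
        my $hi = $to   < $end   ? $to   : $end;
        $slice{ $read->{name} } = substr($read->{seq}, $lo - $start, $hi - $lo + 1);
    }
    return \%slice;
}

my $window = slice_reads(\@reads, 4, 7);
print "$_\t$window->{$_}\n" for sort keys %$window;
```

Reads that only partially cover the window come back shorter than the window; padding them (for example with '-') would be one way to get equal-length rows if the result is to be loaded into an alignment object.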
> > This is the question: > If I'm not wrong (please correct me if I am), the Bio::Assembly::Contig > class stores read information in: > Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{ > _align_clipping:READ_NAME} > _aligned_coord:READ_NAME} > _quality_clipping:READ_NAME} > > Each of these 3 features, _align_clipping, _aligned_coord and > _quality_clipping, contains a Bio::SeqFeature::Generic; which of them > is more suitable for the purpose expressed before, the slice method? > And, if you'll excuse me for going on, a follow-up to the > previous question: I am not entirely clear on the purpose of these 3 > features per read; can you explain it? > > Thank you very much for your time. > Bye bye, > Paolo From bimber at wisc.edu Sun Nov 29 00:31:25 2009 From: bimber at wisc.edu (Ben Bimber) Date: Sat, 28 Nov 2009 23:31:25 -0600 Subject: [Bioperl-l] using bioperl to compare sequences Message-ID: <9f985cdc0911282131l350bc525gd9ad4717c101ac63@mail.gmail.com> Hello, I have a couple of years of programming experience, but am reasonably new to perl and extremely new to bioperl. I have been reading through the bioperl documentation and am trying to understand the best way to approach a particular problem. I'm hoping someone could offer some tips and point me in the right direction. If someone has solved this sort of problem before, I'd prefer not to reinvent things. Here's what I'm trying to do: Our lab generates mRNA sequence data, consisting of alleles of a given gene or genes. I want to compare each of these sequences against a reference using BLAST or clustalw (will need the ability to choose at run time). Take the result of this alignment, then record positions of difference between the experimental sequence and reference sequence (SNPs). Translate the corresponding AA change(s) associated with each SNP. There can be overlapping ORFs. I see that bioperl has modules for BLAST and clustal. I've also been looking at the modules under variation.
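For the SNP step described in this question, a minimal plain-Perl sketch is shown below. It does not use the Bio::Variation modules. It assumes the BLAST or clustalw step has already produced two equal-length, gap-free aligned strings, and that each ORF is described by a hypothetical 0-based start offset; overlapping ORFs would then be handled by calling `find_snps` once per ORF start. The codon table is the standard genetic code (NCBI translation table 1).

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1). The amino-acid string is ordered
# with the first codon base varying slowest and the third fastest, each
# over T, C, A, G.
my %codon_table;
{
    my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my @bases = qw(T C A G);
    my $i     = 0;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $codon_table{"$b1$b2$b3"} = substr($aas, $i++, 1);
            }
        }
    }
}

# Translate one codon; anything not in the table (ambiguity codes, gaps)
# comes back as 'X'.
sub translate_codon {
    my ($codon) = @_;
    return $codon_table{ uc($codon) } // 'X';
}

# Compare two aligned, equal-length, ungapped sequences. For each mismatch,
# record its 1-based position and, if it falls in a complete codon of the
# ORF starting at 0-based offset $orf_start, the reference and sample
# amino acids.
sub find_snps {
    my ($ref, $sample, $orf_start) = @_;
    die "sequences must be the same length\n"
        unless length($ref) == length($sample);
    my @found;
    for my $pos (0 .. length($ref) - 1) {
        my ($r, $s) = (substr($ref, $pos, 1), substr($sample, $pos, 1));
        next if uc($r) eq uc($s);
        my %snp = (position => $pos + 1, ref_base => $r, sample_base => $s);
        if ($pos >= $orf_start) {
            my $codon_start = $pos - (($pos - $orf_start) % 3);
            if ($codon_start + 3 <= length($ref)) {
                $snp{ref_aa}    = translate_codon(substr($ref,    $codon_start, 3));
                $snp{sample_aa} = translate_codon(substr($sample, $codon_start, 3));
            }
        }
        push @found, \%snp;
    }
    return @found;
}

# Example: one synonymous (AAA>AAG) and one non-synonymous (CCC>TCC) change.
my @snps = find_snps('ATGAAACCC', 'ATGAAGTCC', 0);
for my $snp (@snps) {
    printf "pos %d %s>%s %s>%s\n", @{$snp}{qw(position ref_base sample_base)},
        $snp->{ref_aa} // '-', $snp->{sample_aa} // '-';
}
```

With real alignments, gap columns would shift reference coordinates, so a mapping from alignment column to reference position would be needed before reporting positions.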
I haven't fully wrapped my head around them, but they look to be what I'd use for SNP detection. Has anyone written code to perform similar things and, if so, would you be willing to share specific examples? Anything concrete to see exactly how these modules operate would be extremely helpful. Thanks in advance for any tips or help. From jason at bioperl.org Sun Nov 29 10:54:53 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 29 Nov 2009 07:54:53 -0800 Subject: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file In-Reply-To: <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> References: <4B0D6C24.2080308@gmail.com><53DE480F205E42CE8D2B9421592AAF0E@NewLife><815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife><4B0EA44D.2050507@gmail.com> <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> Message-ID: <897A8DB4-AF29-4601-A1E5-9A04D9D8C151@bioperl.org> or while( my $hit = $result->next_hit ) { } On Nov 28, 2009, at 6:55 PM, Mark A. Jensen wrote: > Hi Tim-- > There's a bug in my code; should be > for my $hit ($result->hits) { > ... > } > and you're right about the comma. My bad. -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From suzi at berkeleybop.org Sun Nov 29 23:03:09 2009 From: suzi at berkeleybop.org (Suzanna Lewis) Date: Sun, 29 Nov 2009 20:03:09 -0800 Subject: [Bioperl-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <3AD3C819-4BAA-4D90-B141-9611F48C5CAD@berkeleybop.org> I/we (Gregg) would be interested in attending. We'd present an update on the collaborative, web-based version of Apollo. We will be working with Ian Holmes and Mitch Skinner using JBrowse for basic display. -S On Nov 26, 2009, at 6:57 AM, Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010.
If you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last year's (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what you would like to talk about. > > Thanks > > Jonathan. > > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > From maj at fortinbras.us Mon Nov 30 09:31:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 30 Nov 2009 09:31:27 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> Message-ID: <513F1C824EF84974993A76F0CC719CDF@NewLife> Well, it has a history, Jason's point. So the question could be: "is this still a valid issue?" A while back, a user on the wiki, with natural and good intentions, removed the authorship and revision info from a couple of the HOWTOs; it is more wiki-like, after all. But Chris had some objections to that, which I seconded, mainly on the basis of the special status that seems implied by the copyright note on the HOWTO page.
I also think that the nature of the howto is somewhat different from other info on the site -- that developers themselves put a lot of time in to explaining how to use their modules, and that in this world where devs get paid by recognition, it is a reasonable thing to allow this extra horn-tooting. Now, that is a policy that could be completely separable from the issue of copyright. However, devs may also get paid by using their materials in teaching seminars. The dilemma would be that people who like to use the wiki are people who like to share, and so it feels unnatural to withhold from the community the materials they develop, but people who like to share also like to eat and wear shoes... so I'm interested in everyone's thoughts about it. ----- Original Message ----- From: "Brian Osborne" To: "Mark A. Jensen" Cc: "Chris Fields" ; "Jason Stajich" ; "bioperl List" Sent: Monday, November 30, 2009 9:16 AM Subject: Re: [Bioperl-l] HOWTO copyright policy vs FDL on wiki > Mark, > > Let me ask you a question, and don't take this question as an implicit > criticism of your suggestion, it is not. Why would you want this more > restrictive copyright? > > Brian O. > > On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: > >> The HOWTOs appear to have a more restrictive copyright >> than FDL-- in particular, the blurb at the bottom of the >> HOWTO page asks users to use the documents for personal >> use only. I'm for this; I think we should therefore have some >> explicit license for these that specifies this kind of restriction, >> and then express that on each howto and in BioPerl:Copyright. >> Any thoughts on the right license and whether this is a good plan? 
>> MAJ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From bosborne11 at verizon.net Mon Nov 30 10:15:32 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 30 Nov 2009 10:15:32 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <513F1C824EF84974993A76F0CC719CDF@NewLife> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> <513F1C824EF84974993A76F0CC719CDF@NewLife> Message-ID: <54671455-A02C-4139-8C39-AC17B50D5CE6@verizon.net> Mark, I have no objection to a more restrictive copyright, and I also have no objection to using FDL, or things like it. Brian O. On Nov 30, 2009, at 9:31 AM, Mark A. Jensen wrote: > Well, it has a history, Jason's point. So the question could > be: "is this still a valid issue"? A while back, a user on the wiki, > with natural and good intentions, removed the authorship and revision > info from a couple of the HOWTOs; it is more wiki-like, > after all. But Chris had some objections to that, which I > seconded, mainly on the basis of the special status that > seems implied by the copyright note on the HOWTO > page. I also think that the nature of the howto is somewhat > different from other info on the site -- that developers themselves > put a lot of time in to explaining how to use their modules, and > that in this world where devs get paid by recognition, it is a > reasonable > thing to allow this extra horn-tooting. Now, that is a policy > that could be completely separable from the issue of copyright. > However, devs may also get paid by using their materials in teaching > seminars. The dilemma would be that people who like to use the > wiki are people who like to share, and so it feels unnatural to > withhold from the community the materials they develop, but > people who like to share also like to eat and wear shoes... 
> so I'm interested in everyone's thoughts about it. > ----- Original Message ----- From: "Brian Osborne" > > To: "Mark A. Jensen" > Cc: "Chris Fields" ; "Jason Stajich" >; "bioperl List" > Sent: Monday, November 30, 2009 9:16 AM > Subject: Re: [Bioperl-l] HOWTO copyright policy vs FDL on wiki > > >> Mark, >> >> Let me ask you a question, and don't take this question as an >> implicit criticism of your suggestion, it is not. Why would you >> want this more restrictive copyright? >> >> Brian O. >> >> On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: >> >>> The HOWTOs appear to have a more restrictive copyright >>> than FDL-- in particular, the blurb at the bottom of the >>> HOWTO page asks users to use the documents for personal >>> use only. I'm for this; I think we should therefore have some >>> explicit license for these that specifies this kind of restriction, >>> and then express that on each howto and in BioPerl:Copyright. >>> Any thoughts on the right license and whether this is a good plan? >>> MAJ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From bosborne11 at verizon.net Mon Nov 30 09:16:07 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 30 Nov 2009 09:16:07 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> Message-ID: <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> Mark, Let me ask you a question, and don't take this question as an implicit criticism of your suggestion, it is not. Why would you want this more restrictive copyright? Brian O. On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: > The HOWTOs appear to have a more restrictive copyright > than FDL-- in particular, the blurb at the bottom of the > HOWTO page asks users to use the documents for personal > use only. 
I'm for this; I think we should therefore have some > explicit license for these that specifies this kind of restriction, > and then express that on each howto and in BioPerl:Copyright. > Any thoughts on the right license and whether this is a good plan? > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Nov 30 12:41:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 30 Nov 2009 12:41:44 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32B630E6C53@exchsth.agresearch.co.nz> <52D67F20A9CB4953B86FF794ADE0BE96@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> Message-ID: <8C288FEF9CEB4055B0CDD19267FBA26C@NewLife> thanks Tim! corrected (I hope) in r16432... MAJ ----- Original Message ----- From: Tim Koehler To: Smithies, Russell Cc: Mark A. Jensen ; bioperl-l at lists.open-bio.org Sent: Monday, November 30, 2009 12:23 PM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file Hello everybody, thanks a lot for the overwhelming answers! All these codes are different flavors and worked all. For me the added code works the best. But I think I found a bug in ...Bio/SearchIO/blast.pm. There the DEFAULT_BLAST_... variable is set to Bio::Search::Writer::HitTableWriter instead of Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to HTMLResultWriter and others. So again: THANKS for the support! 
Cheers, Tim #!/usr/bin/perl -w use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; ### add here the writer you want use Bio::SearchIO::Writer::HitTableWriter; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; # just write the result we got for this query into a #new blast-formatted file...named after the id of the query seq... my $result = $blast_report->next_result; my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!; $blio->write_result($result); # below, just looking at the current blast result ###this does not appear in the output files while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell wrote: Changed it to a generic result and added a writer and it seems to work: foreach my $qid ( keys %hits_by_query ) { warn "qid = $qid\n"; my $res = Bio::Search::Result::GenericResult->new(-algorithm => "blastn") or die $!; # print Dumper $res; foreach my $h ( @{ $hits_by_query{$qid} } ){ warn "adding hit ", $h->name, "\n";
$res->add_hit($h) if defined($h); } my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new(); my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file => ">$qid\.bls\.html", -format => "blast" ) or die $!; $blio->write_result($res); } From: Mark A. Jensen [mailto:maj at fortinbras.us] Sent: Monday, 30 November 2009 10:19 a.m. To: Smithies, Russell; 'Tim Koehler' Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file My thought here was that since Tim's already going one at a time thru his queries, my scrap was not really necessary: use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/data/databases/flatfile/illuminati_blastdata/nt", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; # just write the result we got for this query into a #new blast-formatted file...named after the id of the query seq... 
my $result = $blast_report->next_result; my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!; $blio->write_result($result); # below, just looking at the current blast result while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } ----- Original Message ----- From: Smithies, Russell To: 'Tim Koehler' ; 'maj at fortinbras.us' Sent: Sunday, November 29, 2009 3:58 PM Subject: RE: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file Hi Tim With various people writing the 'howtos' and other docs, the examples are bound to have differing names for the variables used but as long as you're consistent, it should all fit together. I think I've almost got your code working, just getting errors from Bio::Search::Result::BlastResult which I'm not entirely sure how to use. Perhaps Mark can get this bit going?
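Two details in this thread's grouping code are easy to trip over. Mark's first version of the `push` line lacks the comma after `}}`, which causes the syntax error Tim reports. Both that version and the script below also key `%hits_by_query` on `$hit->name`, which groups hits by subject sequence rather than by query. Below is a minimal plain-Perl sketch of grouping by query. It assumes that in a real report the key would come from `$result->query_name`; the `[query, hit]` pairs are illustrative stand-ins for BioPerl objects.

```perl
use strict;
use warnings;

# Group hits under the query that produced them. The input is a hypothetical
# list of [query name, hit name] pairs standing in for $result->query_name
# and $hit->name collected while looping over a report.
sub group_hits_by_query {
    my @pairs = @_;
    my %hits_by_query;
    for my $pair (@pairs) {
        my ($query, $hit) = @$pair;
        # push needs a comma between the array and the value being added
        push @{ $hits_by_query{$query} }, $hit;
    }
    return \%hits_by_query;
}

my $grouped = group_hits_by_query(
    [ 'seq1', 'hitA' ], [ 'seq1', 'hitB' ], [ 'seq2', 'hitC' ],
);
for my $query (sort keys %$grouped) {
    print "${query}: @{ $grouped->{$query} }\n";   # seq1: hitA hitB / seq2: hitC
}
```

Each key then maps to an array reference holding that query's hits, which can be passed to `add_hit` one at a time, as the scripts here do.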
--Russell =============================== use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/data/databases/flatfile/illuminati_blastdata/nt", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; my %hits_by_query; while ( my $result = $blast_report->next_result ) { foreach my $hit ( $result->hits ) { warn "Pushed a hit for ",$hit->name, "\n"; push( @{ $hits_by_query{ $hit->name } }, $hit ); } } foreach my $qid ( keys %hits_by_query ) { warn "qid = $qid\n"; my $res = Bio::Search::Result::BlastResult->new() or die $!; print Dumper $res; foreach my $h ( @{ $hits_by_query{$qid} } ){ warn "adding hit ", $h->name, "\n"; $res->add_hit($h) if defined($h); } my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format => "blast" ) or die $!; $blio->write_result($res); } while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } =============================== From: Tim Koehler [mailto:timbourine81 at googlemail.com] Sent: Friday, 27 November 2009 10:24 p.m. 
To: Smithies, Russell; maj at fortinbras.us Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file Hey guys, please, do not get me wrong that I wanted to put the workload on you. So far I only found the HowTo's but in there in some way the language changed with time (e.g. $in to $Seq_in) or some things I simply could not find. Now I got a tip where else to search: the scrapbook and deobfuscator. I immediately will have a look at that. This is the first time for me touching linux / perl commands; that's why I thought after several days of trial and many errors ;) asking the mailinglist. I was very happy about your fast answers! Cheers and a nice weekend, Tim On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler wrote: oops, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially where to put in your code. Here is the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this error appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
} use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position? errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name. Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: Hey Mark, thanks for the answer On 25.11.2009 20:21, Mark A. 
Jensen wrote: > whoops: change the following line: > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > to > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > (I always forget that...) > MAJ > > ----- Original Message ----- From: "Mark A. Jensen" > To: "Tim" ; > Sent: Wednesday, November 25, 2009 1:20 PM > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > >> hey Tim-- >> >> Sounds like you need to go about collecting your queries inside out: >> >> my %hits_by_query; >> for ($result->hits) { >> push @{$hits_by_query{$hit->name}} $hit; >> } >> >> I believe now each hash element, keyed by the query name, will contain >> an arrayref to the set of hits assoc with that query. >> From here, I believe >> >> use Bio::Search::Result::BlastResult; >> use Bio::SearchIO; >> >> foreach my $qid ( keys %hits_by_query ) { >> my $result = Bio::Search::Result::BlastResult->new(); >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); >> $blio->write_result($result); >> } >> >> will do what you want. >> >> hope this helps - >> Mark >> >> ----- Original Message ----- From: "Tim" >> To: >> Sent: Wednesday, November 25, 2009 12:40 PM >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each >> query innew file >> >> >>> Dear bioperl users, >>> >>> I am a real newbie and have - maybe a very trivial - question. >>> >>> I searched the mailing list archive and many howtos but I have not found >>> a concrete answer to my problem. So hopefully you can help me :) >>> >>> Background: I use the latest Bioperl version (installed it two weeks >>> before). >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file >>> including different sequences, I get a BLAST output with many queries >>> each having several hits / sbjcts.
>>> >>> My problem is how to parse *all* hits of *one* query into a single new >>> file. And this for all the queries I have in my BLAST output file. >>> >>> Or is it better the other way round; first to make fasta files with only >>> single sequences inside and BLAST each file? But how can I automate that >>> using Bioperl? >>> >>> I tried Bio::SearchIO but can only parse all queries and their >>> respective hits in only one file... >>> I think iteration is also necessary here, but I do not really know how >>> to include that into Bio::SearchIO. >>> Or do I have to use Module:Bio::Index::Blast? >>> >>> I can index a file (see below), but I have no idea what comes next... >>> >>> ###How I index a file... >>> >>> #!/usr/bin/perl -w >>> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >>> >>> use Bio::Index::Fasta; >>> >>> >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; >>> $id = "48882"; >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >>> -write_flag => 1); >>> $inx->make_index($file_name); >>> >>> >>> Hopefully, you can give me at least hints what to look for. >>> >>> A big THANKS in advance! >>> >>> Cheers, >>> >>> Tim >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 -------------------------------------------------------------------------- Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. -------------------------------------------------------------------------- From timbourine81 at googlemail.com Mon Nov 30 12:23:58 2009 From: timbourine81 at googlemail.com (Tim Koehler) Date: Mon, 30 Nov 2009 18:23:58 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32B630E6C53@exchsth.agresearch.co.nz> <52D67F20A9CB4953B86FF794ADE0BE96@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> Message-ID: Hello everybody, thanks a lot for the overwhelming answers! All these codes are different flavors and worked all. For me the added code works the best. But I think I found a bug in ...Bio/SearchIO/blast.pm. There the DEFAULT_BLAST_... variable is set to Bio::Search::Writer::HitTableWriter instead of Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to HTMLResultWriter and others. So again: THANKS for the support! 
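In the script in this message, `my $result = $blast_report->next_result;` takes the result for the current query. The later `while ( my $result = $blast_report->next_result )` loop therefore typically starts on an exhausted stream and never runs, because with one query per `blastall()` call the report normally holds just one result. Below is a minimal plain-Perl sketch of the effect, using a closure as a stand-in for the report object. `make_stream` is hypothetical.

```perl
use strict;
use warnings;

# Stand-in for a result stream such as $blast_report: each call returns the
# next item, then undef once the stream is exhausted.
sub make_stream {
    my @items = @_;
    return sub { return shift @items };
}

my $stream = make_stream('result for seq1');   # one query per blastall() call
my $first  = $stream->();   # like: my $result = $blast_report->next_result;

my $seen = 0;
while (my $result = $stream->()) {   # like the later while loop in the script
    $seen++;
}
print "results seen by the second loop: $seen\n";   # prints 0

# The fix: keep working with the result already in hand.
print "filtering: $first\n" if defined $first;
```

In the BioPerl script, the fix would be to run the HSP filter on `$result` itself instead of calling `next_result` again. If `write_result` has already walked the hits, `$result->rewind` (a `Bio::Search::Result::GenericResult` method) should reset the hit iterator first.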
Cheers, Tim #!/usr/bin/perl -w use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; ### add here the writer you want use Bio::SearchIO::Writer::HitTableWriter; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; # just write the result we got for this query into a #new blast-formatted file...named after the id of the query seq... my $result = $blast_report->next_result; my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!; $blio->write_result($result); # below, just looking at the current blast result ###this does not appear in the output files while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > Changed it to a generic result and added a writer and it seems to work: > > > > foreach my $qid ( keys %hits_by_query ) { > > warn "qid = $qid\n"; > > my $res = Bio::Search::Result::GenericResult->new(-algorithm => > "blastn") or die $!; > > # print Dumper $res; > > foreach my $h (
@{ $hits_by_query{$qid} } ){ > > warn "adding hit ", $h->name, "\n"; > > $res->add_hit($h) if defined($h); > > } > > my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file => > ">$qid\.bls\.html", -format => "blast" ) or die $!; > > $blio->write_result($res); > > } > > > > > > *From:* Mark A. Jensen [mailto:maj at fortinbras.us] > *Sent:* Monday, 30 November 2009 10:19 a.m. > *To:* Smithies, Russell; 'Tim Koehler' > > *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > My thought here was that since Tim's already going one at a time thru > > his queries, my scrap was not really necessary: > > > > use strict; > > use Bio::Tools::Run::StandAloneBlast; > > use Bio::SeqIO; > > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > > > use Data::Dumper; > > > > my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", > > -format => "fasta" ); > > > > while ( my $query = $Seq_in->next_seq() ) { > > warn "Processing ",$query->id, "\n"; > > my $factory = > > Bio::Tools::Run::StandAloneBlast->new( > > program => "blastn", > > database => > "/data/databases/flatfile/illuminati_blastdata/nt", > > _READMETHOD => "Blast" > > ); > > > > my $blast_report = $factory->blastall($query); > > sleep 5; > > > > # just write the result we got for this query into a > > #new blast-formatted file...named after the id of the query seq... 
> > my $result = $blast_report->next_result; > > my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => > "blast" ) or die $!; > > $blio->write_result($result); > > > > # below, just looking at the current blast result > > while ( my $result = $blast_report->next_result ) { > > ## $result is a Bio::Search::Result::ResultI compliant object > > while ( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > while ( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > > if ( $hsp->length('total') > 50 ) { > > if ( $hsp->percent_identity >= 75 ) { > > print "Query= ", $result->query_name, > > "Hit= ", $hit->name, > > "Length= ", $hsp->length('total'), > > "Percent_id= ", $hsp->percent_identity, > > "Subject=", $hsp->hit_string, "\n"; > > } > > } > > } > > } > > } > > } > > ----- Original Message ----- > > *From:* Smithies, Russell > > *To:* 'Tim Koehler' ; 'maj at fortinbras.us' > > *Sent:* Sunday, November 29, 2009 3:58 PM > > *Subject:* RE: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > Hi Tim > > With various people writing the 'howtos' and other docs, the examples are > bound to have differing names for the variables used but as long as you're > consistent, it should all fit together. > > > > I think I've almost got your code working, just getting errors from > Bio::Search::Result::BlastResult which I'm not entirely sure how to use. > Perhaps Mark can get this bit going?
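Tim later asks where the per-query files end up. With `-file => ">$qid\.bls"` the name is a relative path, so each file is created in the script's current working directory and named after the hash key. The leading `>` is what opens the file for writing. Without it, as in Mark's first version, the name is treated as a file to read.

Below is a plain-Perl sketch of the same layout. It assumes a temporary directory and uses plain text in place of `write_result` output. `query_file_name` and `write_groups` are hypothetical helpers. The character cleanup is an extra assumption for IDs such as `gi|123|seq1`, whose `|` is not allowed in Windows file names.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: build a file name from a query ID, replacing anything
# other than word characters, '.' and '-' with '_'.
sub query_file_name {
    my ($qid, $suffix) = @_;
    (my $safe = $qid) =~ s/[^\w.-]+/_/g;
    return $safe . $suffix;
}

# Write each query's hits to <dir>/<query>.bls, opening with '>' (write mode),
# the same mode the leading '>' in Bio::SearchIO's -file argument selects.
# Plain text lines stand in for what write_result would produce.
sub write_groups {
    my ($dir, $hits_by_query) = @_;
    my @written;
    for my $qid (sort keys %$hits_by_query) {
        my $path = File::Spec->catfile($dir, query_file_name($qid, '.bls'));
        open my $fh, '>', $path or die "Cannot write $path: $!";
        print {$fh} "$_\n" for @{ $hits_by_query->{$qid} };
        close $fh or die "Cannot close $path: $!";
        push @written, $path;
    }
    return @written;
}

my $dir   = tempdir(CLEANUP => 1);
my @files = write_groups($dir, { 'gi|123|seq1' => [ 'hitA', 'hitB' ], seq2 => [ 'hitC' ] });
print "$_\n" for @files;
```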
> > > > --Russell > > =============================== > > > > use strict; > > use Bio::Tools::Run::StandAloneBlast; > > use Bio::SeqIO; > > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > > > use Data::Dumper; > > > > my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", > > -format => "fasta" ); > > > > while ( my $query = $Seq_in->next_seq() ) { > > warn "Processing ",$query->id, "\n"; > > my $factory = > > Bio::Tools::Run::StandAloneBlast->new( > > program => "blastn", > > database => > "/data/databases/flatfile/illuminati_blastdata/nt", > > _READMETHOD => "Blast" > > ); > > > > my $blast_report = $factory->blastall($query); > > sleep 5; > > > > > > my %hits_by_query; > > > > while ( my $result = $blast_report->next_result ) { > > foreach my $hit ( $result->hits ) { > > warn "Pushed a hit for ",$hit->name, "\n"; > > push( @{ $hits_by_query{ $hit->name } }, $hit ); > > } > > } > > > > foreach my $qid ( keys %hits_by_query ) { > > warn "qid = $qid\n"; > > my $res = Bio::Search::Result::BlastResult->new() or die $!; > > print Dumper $res; > > foreach my $h ( @{ $hits_by_query{$qid} } ){ > > warn "adding hit ", $h->name, "\n"; > > $res->add_hit($h) if defined($h); > > } > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format => > "blast" ) or die $!; > > $blio->write_result($res); > > } > > > > while ( my $result = $blast_report->next_result ) { > > ## $result is a Bio::Search::Result::ResultI compliant object > > while ( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > while ( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > > if ( $hsp->length('total') > 50 ) { > > if ( $hsp->percent_identity >= 75 ) { > > print "Query= ", $result->query_name, > > "Hit= ", $hit->name, > > "Length= ", $hsp->length('total'), > > "Percent_id= ", $hsp->percent_identity, > > "Subject=", $hsp->hit_string, "\n"; > > } > > } > > } > > } > > } > > } > > 
=============================== > > > > *From:* Tim Koehler [mailto:timbourine81 at googlemail.com] > *Sent:* Friday, 27 November 2009 10:24 p.m. > *To:* Smithies, Russell; maj at fortinbras.us > *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > Hey guys, > > please, do not get me wrong that I wanted to put the workload on you. So > far I only found the HowTo's but in there in some way the language changed > with time (e.g. $in to $Seq_in) or some things I simply could not find. > Now I got a tip where else to search: the scrapbook and deobfuscator. > > I immediately will have a look at that. > > This is the first time for me touching linux / perl commands; that's why I > thought after several days of trial and many errors ;) asking the > mailinglist. > > I was very happy about your fast answers! > > Cheers and a nice weekend, > > Tim > > On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler > wrote: > > oops, sent too early... > > Hey Mark, > > thanks for the answer. But I am still struggling, especially where to put > in your code. > > Here is the code I have, so far: > > #!/usr/bin/perl -w > > ### should I put your code here as push is a perl command? > > > my %hits_by_query; > for ($result->hits) { > > ### I inserted a comma after name}}; if there is no comma, there was the > error: Scalar found where operator expected at > 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" > ### (Missing operator before $hit?) > ###Useless use of push with no values at > 12_BLAST_two_sequence_each_query_one_file.PL line 7. > ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, > near "} $hit" > ###BEGIN not safe after errors--compilation aborted at > 12_BLAST_two_sequence_each_query_one_file.PL line 13. > > > push @{$hits_by_query{$hit->name}}, $hit; > > ###here, every time this error appears: Name "main::result" used only > once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
> ###error: Can't call method "hits" on an undefined value at > 12_BLAST_two_sequence_each_query_one_file.PL line 5. > > > } > > > use strict; > use Bio::Tools::Run::StandAloneBlast; > use Bio::SeqIO; > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > my $Seq_in = Bio::SeqIO->new ( > -file => > "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", > -format => 'fasta' > ); > while (my $query = $Seq_in->next_seq()) { > > > my $factory = Bio::Tools::Run::StandAloneBlast->new( > > 'program' => 'blastn', > 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', > _READMETHOD => "Blast" > ); > > my $blast_report = $factory->blastall($query); > > ### Should I need to use a module? are the commands here at the right > position? errors, e.g., Global symbol "$hit" requires explicit package name > #my %hits_by_query; > #for ($result->hits) { > ### inserted comma after name}} > # push @{$hits_by_query{$hit->name}}, $hit; > #} > > > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > $blio->write_result($result); > } > > ###where are the files stored? what is their name. 
Sorry, but I cannot get > behind that :( > > while( my $result = $blast_report->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > > > while( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > > while( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > if( $hsp->length('total') > 50 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Query= ", $result->query_name, > "Hit= ", $hit->name, > "Length= ", $hsp->length('total'), > "Percent_id= ", $hsp->percent_identity, > "Subject=", $hsp->hit_string,"\n"; > } > } > } > } > } > } > > Again, a big thanks in advance :) > > All the best, > > Tim > > On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > > Hey Mark, > > thanks for the answer > > > > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sounds like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query.
> >> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. > >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automate > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file...
> >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler > MPI for Terrestrial Microbiology > Karl-von-Frisch-Straße > D-35043 Marburg / Germany > > Email: koehlerd at mpi-marburg.mpg.de > Phone: +49 6421 178-740 > Fax: +49 6421 178-999 > > > > > ------------------------------ > > *Attention: *The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities to > which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ------------------------------ > > > > From maj at fortinbras.us Sun Nov 1 23:47:15 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 1 Nov 2009 23:47:15 -0500 Subject: [Bioperl-l] annotations Message-ID: <5150801225E0484D95DC51B2D00AE519@NewLife> I'm cogitating on features and annotations.
For a RichSeq, one gets the set of annotations by $seq->annotation->get_Annotations while getting features by $seq->get_Features Is there a reason not to have a method in SeqI sub get_Annotations { shift->annotation->get_Annotations } to allow a user to do what seems natural from a user's perspective, viz. $seq->get_Annotations? I imagine this might save hundreds of hours of frustration, integrated over all newbies. MAJ From cjfields at illinois.edu Mon Nov 2 08:08:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 2 Nov 2009 07:08:54 -0600 Subject: [Bioperl-l] annotations In-Reply-To: <5150801225E0484D95DC51B2D00AE519@NewLife> References: <5150801225E0484D95DC51B2D00AE519@NewLife> Message-ID: <6920A9E1-D221-4CF8-9866-0ADBDB254C19@illinois.edu> On Nov 1, 2009, at 10:47 PM, Mark A. Jensen wrote: > I'm cogitating on features and annotations. For a RichSeq, one gets > the set of annotations by > > $seq->annotation->get_Annotations > > while getting features by > > $seq->get_Features > > Is there a reason not to have a method in SeqI > > sub get_Annotations { shift->annotation->get_Annotations } > > to allow a user to do what seems natural from a user's perspective, > viz. $seq->get_Annotations? I imagine this might save hundreds of > hours of frustration, integrated over all newbies. > MAJ One could add the methods to delegate to annotation() (that's essentially what I'm planning on doing for Biome). 
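Chris's suggestion amounts to adding a delegating method to the sequence class. Below is a minimal sketch using hypothetical stand-in packages (`My::Seq` and `My::AnnotationCollection`, not BioPerl classes) that mimic `annotation()`, `add_Annotation` and `get_Annotations`. One detail is worth noting: the one-liner in Mark's message, `shift->annotation->get_Annotations`, passes no arguments along, so the keys in `$seq->get_Annotations('comment')` would be ignored. Forwarding `@_` keeps them.

```perl
use strict;
use warnings;

# Hypothetical stand-ins (not BioPerl classes) for an annotation collection
# and a sequence object that holds one.
package My::AnnotationCollection;

sub new { my ($class) = @_; return bless { by_key => {} }, $class }

sub add_Annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{by_key}{$key} }, $value;
    return;
}

# With keys, return the annotations stored under those keys; without, all of them.
sub get_Annotations {
    my ($self, @keys) = @_;
    @keys = sort keys %{ $self->{by_key} } unless @keys;
    return map { my $list = $self->{by_key}{$_} || []; @$list } @keys;
}

package My::Seq;

sub new { my ($class) = @_; return bless { annotation => My::AnnotationCollection->new }, $class }
sub annotation { return $_[0]{annotation} }

# The proposed shortcut: forward the call, including any keys, to annotation().
sub get_Annotations { my $self = shift; return $self->annotation->get_Annotations(@_) }

package main;

my $seq = My::Seq->new;
$seq->annotation->add_Annotation(comment   => 'first comment');
$seq->annotation->add_Annotation(reference => 'ref-1');
print "$_\n" for $seq->get_Annotations('comment');
```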
chris From kiekyon.huang at gmail.com Tue Nov 3 10:14:39 2009 From: kiekyon.huang at gmail.com (Kie Kyon Huang) Date: Tue, 3 Nov 2009 23:14:39 +0800 Subject: [Bioperl-l] render_blast problem Message-ID: Hi, I was trying to follow the HOWTO:Graphics at http://www.bioperl.org/wiki/HOWTO:Graphics When running the command line in cygwin $ perl render_blast1.pl data1.txt | display - I get the following error line, bash: display: command not found I also tried $ perl render_blast1.pl data1.txt > data1.png however, I was unable to open the data1.png file using Microsoft Office Picture Manager or windows Photo Gallery Thanks Huang From biopython at maubp.freeserve.co.uk Tue Nov 3 10:45:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Nov 2009 15:45:37 +0000 Subject: [Bioperl-l] render_blast problem In-Reply-To: References: Message-ID: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com> On Tue, Nov 3, 2009 at 3:14 PM, Kie Kyon Huang wrote: > Hi, > > I was trying to follow the HOWTO:Graphics at > http://www.bioperl.org/wiki/HOWTO:Graphics > > When running the command line in cygwin > > $ perl render_blast1.pl data1.txt | display - > > I get the following error line, > > bash: display: command not found That makes sense on Windows, since display is a Unix command line tool. > I also tried > > $ perl render_blast1.pl data1.txt > data1.png Based on the wiki, I think that ought to have worked. > however, I was unable to open the data1.png file using Microsoft > Office Picture Manager or windows Photo Gallery Did you do this step?: >> Important! If you are on a Windows platform, you need to put >> STDOUT into binary mode so that the PNG file does not go >> through Window's carriage return/linefeed transformations. >> Before the final print statement, put the statement >> binmode(STDOUT). This advice also applies to certain older >> versions of RedHat, which ship with a patched (and possibly >> broken) version of Perl. 
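The wiki advice quoted above comes down to a single `binmode` call before the image data is printed. In this minimal sketch, `print_image` is a hypothetical helper, not part of the HOWTO script; it receives the PNG data, for example the string returned by `$panel->png`. `binmode` turns off the newline translation that Windows applies to text-mode handles and changes nothing on Unix, so it is safe to call unconditionally. A platform test such as `$^O eq 'MSWin32'` would also skip Cygwin, the shell this report came from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print binary image data (e.g. the PNG string from $panel->png) to a
# filehandle. binmode() disables newline translation, so "\n" bytes inside
# the image are written unchanged; on Unix it is a no-op.
sub print_image {
    my ($fh, $data) = @_;
    binmode($fh) or die "Cannot set binmode: $!";
    print {$fh} $data or die "Cannot write image: $!";
    return 1;
}

# In render_blast1.pl the final statement would then read:
#   print_image(\*STDOUT, $panel->png);
```

The 8-byte PNG signature (`\x89PNG\r\n\x1a\n`) deliberately contains both `\r\n` and a bare `\n` so that this kind of translation can be detected. A file damaged by translation therefore fails the signature check, which is why image viewers refuse to open it.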
(BioPerl devs - couldn't that be added to the default render_blast1.pl script with an if statement checking for Windows?) Peter From biopython at maubp.freeserve.co.uk Tue Nov 3 11:04:59 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Nov 2009 16:04:59 +0000 Subject: [Bioperl-l] render_blast problem In-Reply-To: References: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com> Message-ID: <320fb6e00911030804r62e50da6w373bbb61e9823f28@mail.gmail.com> Mailing list CC'd - solved :) On Tue, Nov 3, 2009 at 3:55 PM, Kie Kyon Huang wrote: > > ok, that fix it > i forget sometimes what platform am i on. > thanks Great. Peter From amackey at virginia.edu Tue Nov 3 12:09:00 2009 From: amackey at virginia.edu (Aaron Mackey) Date: Tue, 3 Nov 2009 12:09:00 -0500 Subject: [Bioperl-l] svn errors? Message-ID: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com> [ajm6q at lc4 bioperl-live]$ svn update svn: Decompression of svndiff data failed I'll admit to not having svn updated in awhile; A clean, anonymous svn co failed with the same message: [...] A bioperl-live/Bio/Structure/StructureI.pm A bioperl-live/Bio/Structure/IO svn: Decompression of svndiff data failed -Aaron P.S. I used this command: svn co svn:// code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live From cjfields at illinois.edu Tue Nov 3 12:17:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 3 Nov 2009 11:17:10 -0600 Subject: [Bioperl-l] svn errors? In-Reply-To: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com> References: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com> Message-ID: <8C5FC42D-F957-45AC-9AAC-876ACC9D77E0@illinois.edu> Aaron, Yep, this was reported to support (a couple of users on #bioperl reported the same problem). Chris D. is looking into it. I'm wondering if it's worth setting up a second mirror to github for this purpose. 
chris On Nov 3, 2009, at 11:09 AM, Aaron Mackey wrote: > [ajm6q at lc4 bioperl-live]$ svn update > svn: Decompression of svndiff data failed > > > I'll admit to not having svn updated in awhile; A clean, anonymous > svn co > failed with the same message: > > [...] > A bioperl-live/Bio/Structure/StructureI.pm > A bioperl-live/Bio/Structure/IO > svn: Decompression of svndiff data failed > > -Aaron > > P.S. I used this command: svn co svn:// > code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 3 12:19:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 3 Nov 2009 11:19:56 -0600 Subject: [Bioperl-l] render_blast problem In-Reply-To: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com> References: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com> Message-ID: <8336341C-C7B4-4740-A7C3-E2DE5FDAF651@illinois.edu> On Nov 3, 2009, at 9:45 AM, Peter wrote: > ... > Did you do this step?: >>> Important! If you are on a Windows platform, you need to put >>> STDOUT into binary mode so that the PNG file does not go >>> through Window's carriage return/linefeed transformations. >>> Before the final print statement, put the statement >>> binmode(STDOUT). This advice also applies to certain older >>> versions of RedHat, which ship with a patched (and possibly >>> broken) version of Perl. > > (BioPerl devs - couldn't that be added to the default > render_blast1.pl script with an if statement checking for > Windows?) > > Peter Yes, that should be added. I'll work on it. chris From mauricio at open-bio.org Tue Nov 3 12:20:52 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Tue, 03 Nov 2009 11:20:52 -0600 Subject: [Bioperl-l] svn errors? 
In-Reply-To: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com> References: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com> Message-ID: <4AF06674.30506@open-bio.org> Hi Aaron, This was reported a few days ago. Chris Dagdigian is working today on a fix for it. Mauricio. Aaron Mackey wrote: > [ajm6q at lc4 bioperl-live]$ svn update > svn: Decompression of svndiff data failed > > > I'll admit to not having svn updated in awhile; A clean, anonymous svn co > failed with the same message: > > [...] > A bioperl-live/Bio/Structure/StructureI.pm > A bioperl-live/Bio/Structure/IO > svn: Decompression of svndiff data failed > > -Aaron > > P.S. I used this command: svn co svn:// > code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rachitasharma at gmail.com Tue Nov 3 17:12:11 2009 From: rachitasharma at gmail.com (Rachita Sharma) Date: Tue, 3 Nov 2009 14:12:11 -0800 Subject: [Bioperl-l] Trouble parsing PSI-BLAST Message-ID: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com> I am having trouble parsing PSI-BLAST results. Please help. 
The code is: my $in = new Bio::SearchIO( -format => 'blast', -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt"); while( my $result = $in->next_result ) { while( my $hit = $result->next_hit ) { $sth->execute($result->query_name, $hit->name, $hit->significance); print "Query executed!\n"; } } The error is: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline ***** No hits found ****** STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359 STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813 STACK: BSubVCpsiRblast.pl:92 ----------------------------------------------------------- From cjfields at illinois.edu Tue Nov 3 22:42:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 3 Nov 2009 21:42:55 -0600 Subject: [Bioperl-l] Trouble parsing PSI-BLAST In-Reply-To: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com> References: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com> Message-ID: Rachita, You'll have to give us more to go on than this. The best thing to do is file a bug report and attach an example PSI-BLAST report and code that causes the problem. The $sth->execute(...) is a bit odd, but that shouldn't cause the error in question. Also, make sure to stipulate the OS, version of BioPerl, and perl version. chris On Nov 3, 2009, at 4:12 PM, Rachita Sharma wrote: > I am having trouble parsing PSI-BLAST results. Please help. 
> > The code is: > my $in = new Bio::SearchIO( -format => 'blast', > -file => > "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt"); > > > while( my $result = $in->next_result ) { > while( my $hit = $result->next_hit ) { > > $sth->execute($result->query_name, $hit->name, $hit->significance); > print "Query executed!\n"; > > } > } > > The error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline ***** No hits found ****** > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::SearchIO::blast::next_result > /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813 > STACK: BSubVCpsiRblast.pl:92 > ----------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From alexl at users.sourceforge.net Wed Nov 4 02:30:21 2009 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Wed, 04 Nov 2009 02:30:21 -0500 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict? Message-ID: Does the version of ExtUtils::Manifest really need to be strictly greater than or equal to 1.52? Currently this blocks me updating the Fedora package of BioPerl to 1.6.1, because the version of perl that Fedora ships is on 1.51 and hence the build fails with: Checking prerequisites... - ERROR: ExtUtils::Manifest (1.51_01) is installed, but we need version >= 1.52 Full logs are here: http://koji.fedoraproject.org/koji/taskinfo?taskID=1787483 http://koji.fedoraproject.org/koji/getfile?taskID=1787483&name=build.log This is true even with the version of Perl in rawhide/F-12 etc. (ExtUtils::Manifest is in the base perl package). 
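To check whether a given perl already meets the `>= 1.52` requirement reported by the build, you can load the installed module and compare its version. This is a small sketch: `module_satisfies` is a hypothetical helper name, and it relies on the core `UNIVERSAL::VERSION` method, which dies when the installed version is lower than the one requested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if $module can be loaded and its $VERSION is at least $min, else 0.
sub module_satisfies {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 0 unless eval { require $file; 1 };
    # UNIVERSAL::VERSION dies if the installed version is lower than $min.
    return eval { $module->VERSION($min); 1 } ? 1 : 0;
}
```

For example, `module_satisfies('ExtUtils::Manifest', '1.52')` would return 0 on the 1.51_01 perl in the build log. From the shell, `perl -MExtUtils::Manifest -le 'print ExtUtils::Manifest->VERSION'` prints the installed version directly.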
If it really is necessary, I would like to be armed with a good argument why it needs to be updated, since the Perl package maintainer would have to update the entire Perl package simply to get a more recent version of one small subpackage. Regards, Alex From jluis.lavin at unavarra.es Wed Nov 4 03:43:35 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Wed, 4 Nov 2009 09:43:35 +0100 (CET) Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query Message-ID: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> Hello all, I'm a newbie who is having terrible trouble trying to retrieve a list of multiple sequences from the NCBI and write them to a single file in Fasta format. The code I've written seems to read my list and retrieve the sequences, but it kinda overwrites them so that I only get the last sequence on the list. I've been told to ask the people on this mailing list for help, since you may have come across this problem also or at least will know how to solve it... Here is my code, which basically consists of an STDIN for the list to be read into an array and a loop to read each sequence (stopping when the list ends) and retrieve a sequence each time the loop is launched, writing that sequence to a fasta file. I only get a sequence back although it seems to perform the retrieving process with each of the sequences of the list...
#!/usr/bin/perl -w use strict; use Bio::DB::GenPept; use Bio::DB::GenBank; use Bio::SeqIO; print "Enter your list name:"; my $archivo=<STDIN>; chomp $archivo; die ("Can't open input\n") unless (open(INFILE, $archivo)); my @lista = <INFILE>; foreach my $seq (@lista) { if ($seq eq '') { die ("empty list") } else { my $db = new Bio::DB::GenPept("-format" => "Fasta"); my $seqobj = $db->get_Seq_by_acc($seq); my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format => 'fasta'); $out->write_seq($seqobj); } } exit; An example list of sequences can be this one: YP_003107578.1 YP_003106103.1 YP_003106552.1 YP_003106560.1 YP_003107053.1 YP_003107450.1 YP_003108000.1 YP_003105023.1 YP_003105264.1 Thanks in advance for your help ;) -- José Luis Lavín Trueba, PhD Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From e.osimo at gmail.com Wed Nov 4 04:54:52 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Wed, 4 Nov 2009 10:54:52 +0100 Subject: [Bioperl-l] Bio::Graphics and picture format Message-ID: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com> Hello everyone, do you know if it is possible to generate an image with Bio::Graphics in a vector format? Is there a list of available formats? Thanks Emanuele From David.Messina at sbc.su.se Wed Nov 4 04:52:53 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 4 Nov 2009 10:52:53 +0100 Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query In-Reply-To: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> Message-ID: <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> > > The code I've written seems to read my list and retrieve the sequences, but > it kinda overwrites them so that I only get the last sequence on the list.
> With this line my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format => 'fasta'); you are opening the filehandle for the output file inside your loop, so each time it is writing over the previous file with an empty file. Then, you write a single sequence to that file with this line $out->write_seq($seqobj); So when you are done, you just have the last sequence in the output file. If you move the opening of the output filehandle outside the loop (it needs to be done only once), then it should work as you expect. Also, I notice the newline characters are not being removed from your sequence IDs (actually I'm a little surprised that the sequences are being retrieved). Just to be safe, you may want to add the line chomp @lista; after my @lista = <INFILE>; Dave From jluis.lavin at unavarra.es Wed Nov 4 05:14:40 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Wed, 4 Nov 2009 11:14:40 +0100 (CET) Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query In-Reply-To: <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> Message-ID: <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es> Thank you very very much Dave, I've had a really frustrating time trying to find out what I was doing wrong; it has been so frustrating that I was about to quit Bioperl. Now I can try to focus on BLAST parsing for my comparative genomic analysis. You're great on this mailing list, because you give fast and neat advice to all the questions asked here by newbies like me ;) On Wed, 4 November 2009, 10:52, Dave Messina wrote: >> >> The code I've written seems to read my list and retrieve the sequences, >> but >> it kinda overwrites them so that I only get the last sequence on the >> list.
>> > > With this line > > my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format => > 'fasta'); > > > you are opening the filehandle for the output file inside your loop, so > each > time it is writing over the previous file with an empty file. Then, you > write a single sequence to that file with this line > > $out->write_seq($seqobj); > > > So when you are done, you just have the last sequence in the output file. > > If you move the opening of the output filehandle outside the loop (it > needs > to be done only once), then it should work as you expect. > > Also, I notice the newline characters are not being removed from your > sequence IDs (actually I'm a little surprised that the sequences are > being > retrieved). Just to be safe, you may want to add the line > > chomp @lista; > > > after > > my @lista = <INFILE>; > > > > > Dave > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From hrh at fmi.ch Wed Nov 4 05:05:17 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Wed, 04 Nov 2009 11:05:17 +0100 Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query In-Reply-To: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> Message-ID: Hi try my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", ^ this way you no longer overwrite your existing file, but append the next sequence. Regards, Hans On 11/4/09 9:43 AM, "jluis.lavin at unavarra.es" wrote: > > Hello all, > > I'm a newbie who is having terrible trouble trying to retrieve a list > of multiple sequences from the NCBI and write them to a single file in Fasta > format. > The code I've written seems to read my list and retrieve the sequences, but > it kinda overwrites them so that I only get the last sequence on the list.
> I've been told to ask the people on this mailing list for help, since you > may have come across this problem also or at least will know how to solve > it... > > Here is my code, which basically consists of an STDIN for the list to be > read into an array and a loop to read each sequence (stopping when the > list ends) and retrieve a sequence each time the loop is launched, > writing that sequence to a fasta file. I only get a sequence back > although it seems to perform the retrieving process with each of the > sequences of the list... > > > #!/usr/bin/perl -w > use strict; > use Bio::DB::GenPept; > use Bio::DB::GenBank; > use Bio::SeqIO; > print "Enter your list name:"; > my $archivo=<STDIN>; > chomp $archivo; > die ("Can't open input\n") unless (open(INFILE, $archivo)); > my @lista = <INFILE>; > foreach my $seq (@lista) { > if ($seq eq '') { > die ("empty list") > } > else { > my $db = new Bio::DB::GenPept("-format" => "Fasta"); > my $seqobj = $db->get_Seq_by_acc($seq); > my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", > -format => 'fasta'); > $out->write_seq($seqobj); > } > } > exit; > > > An example list of sequences can be this one: > > YP_003107578.1 > YP_003106103.1 > YP_003106552.1 > YP_003106560.1 > YP_003107053.1 > YP_003107450.1 > YP_003108000.1 > YP_003105023.1 > YP_003105264.1 > > Thanks in advance for your help ;) From jluis.lavin at unavarra.es Wed Nov 4 05:25:38 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Wed, 4 Nov 2009 11:25:38 +0100 (CET) Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in asingle list query In-Reply-To: References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> Message-ID: <1834.130.206.164.153.1257330338.squirrel@webmail.unavarra.es> Thank you very much for your answer Hans!!! It works perfectly, also a neat and fast solution, like Dave's.
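Dave's and Hans's suggestions can be combined into a single sketch. The database handle and the output stream are created once, outside the loop, and the IDs are cleaned before use. The sketch assumes the same modules as the original script (`Bio::DB::GenPept`, `Bio::SeqIO`), loads them at run time so that the ID-reading part can be tried without BioPerl, and uses hypothetical helper names (`read_ids`, `fetch_all`). Opening with `>` once truncates the file at the start of each run. Hans's `>>` also works, but it keeps appending to whatever an earlier run left in the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one accession per line, dropping line endings (including a stray
# "\r" from files edited on Windows) and skipping blank lines.
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;    # stronger than chomp: also strips "\r" and trailing blanks
        push @ids, $line if length $line;
    }
    return @ids;
}

# Fetch every accession from GenPept and write them all to one FASTA file.
sub fetch_all {
    my ($ids, $outfile) = @_;
    require Bio::DB::GenPept;
    require Bio::SeqIO;

    # Create the remote handle and the output stream once, before the loop;
    # re-opening ">$outfile" inside the loop truncates it on every pass.
    my $db  = Bio::DB::GenPept->new;
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');
    for my $acc (@$ids) {
        my $seq = $db->get_Seq_by_acc($acc);
        $out->write_seq($seq);
    }
    $out->close;
    return scalar @$ids;
}

# Usage (hypothetical script name): perl fetch_list.pl ids.txt extracted_seqs.fasta
if (@ARGV == 2) {
    my ($list, $outfile) = @ARGV;
    open(my $in, '<', $list) or die "Can't open $list: $!";
    fetch_all([ read_ids($in) ], $outfile);
}
```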
Blessings to you all ;) On Wed, 4 November 2009, 11:05, Hotz, Hans-Rudolf wrote: > Hi > > try > > my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", > ^ > > this way you no longer overwrite your existing file, but append the next > sequence. > > Regards, Hans > > > > On 11/4/09 9:43 AM, "jluis.lavin at unavarra.es" > wrote: > >> >> Hello all, >> >> I'm a newbie who is having terrible trouble trying to retrieve a list >> of multiple sequences from the NCBI and write them to a single file in >> Fasta >> format. >> The code I've written seems to read my list and retrieve the sequences, >> but >> it kinda overwrites them so that I only get the last sequence on the >> list. >> I've been told to ask the people on this mailing list for help, since >> you >> may have come across this problem also or at least will know how to solve >> it... >> >> Here is my code, which basically consists of an STDIN for the list to be >> read into an array and a loop to read each sequence (stopping when the >> list ends) and retrieve a sequence each time the loop is launched, >> writing that sequence to a fasta file. I only get a sequence back >> although it seems to perform the retrieving process with each of the >> sequences of the list...
>> >> >> #!/usr/bin/perl -w >> use strict; >> use Bio::DB::GenPept; >> use Bio::DB::GenBank; >> use Bio::SeqIO; >> print "Enter your list name:"; >> my $archivo=<STDIN>; >> chomp $archivo; >> die ("Can't open input\n") unless (open(INFILE, $archivo)); >> my @lista = <INFILE>; >> foreach my $seq (@lista) { >> if ($seq eq '') { >> die ("empty list") >> } >> else { >> my $db = new Bio::DB::GenPept("-format" => "Fasta"); >> my $seqobj = $db->get_Seq_by_acc($seq); >> my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", >> -format => 'fasta'); >> $out->write_seq($seqobj); >> } >> } >> exit; >> >> >> An example list of sequences can be this one: >> >> YP_003107578.1 >> YP_003106103.1 >> YP_003106552.1 >> YP_003106560.1 >> YP_003107053.1 >> YP_003107450.1 >> YP_003108000.1 >> YP_003105023.1 >> YP_003105264.1 >> >> Thanks in advance for your help ;) > > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From scott at scottcain.net Wed Nov 4 08:26:02 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 4 Nov 2009 08:26:02 -0500 Subject: Re: [Bioperl-l] Bio::Graphics and picture format In-Reply-To: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com> References: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com> Message-ID: <0FB17FBC-16BE-4A9F-AC75-983D3B4ECE7D@scottcain.net> Hi Emanuele, It is possible to use GD::SVG instead of GD to generate SVG graphics. To use it, you provide an argument of "-image_class GD::SVG" to the constructor of Bio::Graphics::Panel. See the perldoc of Bio::Graphics::Panel for more info. Scott On Nov 4, 2009, at 4:54 AM, Emanuele Osimo wrote: > Hello everyone, > do you know if it is possible to generate an image with > Bio::Graphics in a > vector format? Is there a list of available formats?
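Following Scott's pointer, here is a sketch of a panel built with `-image_class => 'GD::SVG'`. It assumes that Bio::Graphics and GD::SVG are installed (both are loaded at run time here) and that, with this image class, the object returned by `$panel->gd` provides an `svg` method in place of the usual `png`. `svg_panel` is a hypothetical helper name, and the single full-length arrow track exists only to give the panel something to draw.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a one-track Bio::Graphics panel and return it as an SVG string.
sub svg_panel {
    my ($length) = @_;
    require Bio::Graphics;
    require Bio::SeqFeature::Generic;

    my $panel = Bio::Graphics::Panel->new(
        -length      => $length,
        -width       => 600,
        -pad_left    => 10,
        -pad_right   => 10,
        -image_class => 'GD::SVG',    # vector output instead of a GD bitmap
    );
    my $full = Bio::SeqFeature::Generic->new(-start => 1, -end => $length);
    $panel->add_track($full, -glyph => 'arrow', -tick => 2, -fgcolor => 'black');
    return $panel->gd->svg;
}
```

A caller would then do something like `print svg_panel(1000);` and redirect the output to a `.svg` file. SVG is text, so this path avoids the binmode issue that applies to PNG.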
> Thanks > Emanuele > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From b3sn7 at UNB.ca Tue Nov 3 12:30:24 2009 From: b3sn7 at UNB.ca (Sharma, Rachita) Date: Tue, 3 Nov 2009 13:30:24 -0400 Subject: [Bioperl-l] Trouble parsing PSI-BLAST Message-ID: <1257269424.4af068b045434@webmail.unb.ca> I am having trouble parsing PSI-BLAST results. Please help. The code is: my $in = new Bio::SearchIO( -format => 'blast', -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt"); while( my $result = $in->next_result ) { while( my $hit = $result->next_hit ) { $sth->execute($result->query_name, $hit->name, $hit->significance); print "Query executed!\n"; } } The error is: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline ***** No hits found ****** STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359 STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813 STACK: BSubVCpsiRblast.pl:92 ----------------------------------------------------------- ******************************* Rachita Sharma Research Assistant (PhD Student) University of New Brunswick, NB, CANADA email: Rachita.Sharma at unb.ca Phone no: 503-895-3619 ******************************* From cjfields at illinois.edu Wed Nov 4 08:53:35 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 4 Nov 2009 07:53:35 -0600 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict? In-Reply-To: References: Message-ID: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate perl package alone. 
It is part of perl core but it's also available on CPAN separately from perl itself: http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm This is the commit message for that BTW. This allows spaces in file names for the MANIFEST. v1.52 is a bug fix and is required. http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 chris On Nov 4, 2009, at 1:30 AM, Alex Lancaster wrote: > Does the version of ExtUtils::Manifest really need to be strictly > greater than or equal to 1.52? > > Currently this blocks me updating the Fedora package of BioPerl to > 1.6.1, because the version of perl that Fedora ships is on 1.51 and > hence the build fails with: > > Checking prerequisites... > - ERROR: ExtUtils::Manifest (1.51_01) is installed, but we need > version >= 1.52 > > Full logs are here: > http://koji.fedoraproject.org/koji/taskinfo?taskID=1787483 > http://koji.fedoraproject.org/koji/getfile?taskID=1787483&name=build.log > > This is true even with the version of Perl in rawhide/F-12 etc. > (ExtUtils::Manifest is in the base perl package). > > If it really is necessary, I would like to be armed with a good > argument why this ca > why it needs to be updated, since the Perl package maintainer would > have > to update the entire Perl package simply to get a more recent > version of > one small subpackage. > > Regards, > Alex > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 4 08:55:34 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 4 Nov 2009 07:55:34 -0600 Subject: [Bioperl-l] Trouble parsing PSI-BLAST In-Reply-To: <1257269424.4af068b045434@webmail.unb.ca> References: <1257269424.4af068b045434@webmail.unb.ca> Message-ID: <70E34111-4E70-463D-86EE-06926EA57073@illinois.edu> Rachita, Asked and answered yesterday. Please submit as a bug. 
chris On Nov 3, 2009, at 11:30 AM, Sharma, Rachita wrote: > > I am having trouble parsing PSI-BLAST results. Please help. > > The code is: > my $in = new Bio::SearchIO( -format => 'blast', > -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt"); > > > while( my $result = $in->next_result ) { > while( my $hit = $result->next_hit ) { > > $sth->execute($result->query_name, $hit->name, $hit->significance); > print "Query executed!\n"; > > } > } > > The error is: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: no data for midline ***** No hits found ****** > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/ > Root/Root.pm:359 > STACK: Bio::SearchIO::blast::next_result > /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813 > STACK: BSubVCpsiRblast.pl:92 > ----------------------------------------------------------- > > > > > ******************************* > Rachita Sharma > Research Assistant (PhD Student) > University of New Brunswick, NB, CANADA > email: Rachita.Sharma at unb.ca > Phone no: 503-895-3619 > ******************************* > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed Nov 4 09:11:43 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 4 Nov 2009 15:11:43 +0100 Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query In-Reply-To: <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es> References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es> Message-ID: <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com> Aw shucks, José, glad I could be of help.
There are plenty of people who answer questions around here, but my timezone sometimes gives me an advantage for the European ones. :) Dave From daniel.gaston at gmail.com Wed Nov 4 09:45:04 2009 From: daniel.gaston at gmail.com (Daniel Gaston) Date: Wed, 4 Nov 2009 10:45:04 -0400 Subject: [Bioperl-l] SwissProt and Subcellular localization information Message-ID: <50c615ba0911040645j1b28e727p5d7bf47a04db160b@mail.gmail.com> Hi Everyone, I have recently been playing around with SwissProt format flatfiles and want to extract sequences based on subcellular localization. I notice in going through the code for swiss.pm and swissdriver.pm that in both (more so in swissdriver.pm) there are several steps where organelle information based on the OG line could be extracted and added to data structure but isn't. It seems that in both cases the OG line is being added in to the generic lumping of data from the OC, OS, and OX lines in order to extract species names and taxonomy information but getting rid of everything else. Is there a particular reason for this or just a simple oversight? On the surface at least it looks like a relatively simple modification to make although I admit that I am not terribly adept at manipulating these SeqIO datastructures. Thanks for your time, Dan From daniel.gaston at gmail.com Wed Nov 4 12:12:10 2009 From: daniel.gaston at gmail.com (Daniel Gaston) Date: Wed, 4 Nov 2009 13:12:10 -0400 Subject: [Bioperl-l] SwissProt and Subcellular localization information Message-ID: <50c615ba0911040912pfd2483fwe44cd098beed73c7@mail.gmail.com> Sorry folks, it appears I was just being a bonehead and didn't look close enough into Bio:Annotations and Bio:Species objects that store all of this data. 
Dan On Wed, Nov 4, 2009 at 1:00 PM, wrote: > Send Bioperl-l mailing list submissions to > bioperl-l at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/bioperl-l > or, via email, send a message with subject or body 'help' to > bioperl-l-request at lists.open-bio.org > > You can reach the person managing the list at > bioperl-l-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Bioperl-l digest..." > > Today's Topics: > > 1. SwissProt and Subcellular localization information > (Daniel Gaston) > > > ---------- Forwarded message ---------- > From: Daniel Gaston > To: bioperl-l at lists.open-bio.org > Date: Wed, 4 Nov 2009 10:45:04 -0400 > Subject: [Bioperl-l] SwissProt and Subcellular localization information > Hi Everyone, > > I have recently been playing around with SwissProt format flatfiles and > want > to extract sequences based on subcellular localization. I notice in going > through the code for swiss.pm and swissdriver.pm that in both (more so in > swissdriver.pm) there are several steps where organelle information based > on > the OG line could be extracted and added to data structure but isn't. It > seems that in both cases the OG line is being added in to the generic > lumping of data from the OC, OS, and OX lines in order to extract species > names and taxonomy information but getting rid of everything else. Is there > a particular reason for this or just a simple oversight? On the surface at > least it looks like a relatively simple modification to make although I > admit that I am not terribly adept at manipulating these SeqIO > datastructures. 
> > Thanks for your time, > > Dan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jluis.lavin at unavarra.es Thu Nov 5 10:28:23 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 16:28:23 +0100 (CET) Subject: [Bioperl-l] A question about iBio::Index: and its correct use Message-ID: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Hello to all, I'm trying to write a script to retrieve a list of sequences from a local FASTA file (for example a fasta archive where all the protein models of an organism are stored). This file would be used by me as some kind of "local database" (sorry if I mistake a few concepts...) I've been reading the BioPerl HOWTOs and I came across the Bio::Index::Fasta tool. If I didn't misunderstand what I read (which can easily happen given my low level of programming), this indexing tool should do the job. I wrote a couple of scripts based on the documentation I read about this tool, but I don't seem to be able to create the index file to be used later (to retrieve the sequences from). -First of all, I want to ask the people in this forum if Bio::Index::Fasta is the right one to choose for this task. -Then I'll beg you to take a look at my scripts, because I don't seem to catch the bug... Best wishes to you all and thanks in advance ;) -- José Luis Lavín Trueba, PhD Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Thu Nov 5 10:39:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 5 Nov 2009 10:39:05 -0500 Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Message-ID: José
It looks like this is a good solution to your problem. Please send your
script so we can look at it.

cheers Mark

From jluis.lavin at unavarra.es Thu Nov 5 10:46:36 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Thu, 5 Nov 2009 16:46:36 +0100 (CET)
Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use]
Message-ID: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es>

---------------------------- Original message ----------------------------
Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use
From: jluis.lavin at unavarra.es
Date: Thu, 5 November 2009, 16:46
To: "Mark A. Jensen"
--------------------------------------------------------------------------

Hi Mark,

I've actually got two scripts: the first one creates the index and the
second one retrieves the sequence list from the indexed file.

1) Here is the index creation script:

#!/c:/Perl -w
use strict;
use Bio::Index::Fasta;
use strict;

print "Enter file for indexing: \n";
my $Index_File_Name = ;
my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx",
                                 -write_flag => 1);
$inx->make_index(my $File_Name);

2) And here is the sequence retrieval script:

#!/c:/Perl -w
use Bio::Index::Fasta;
use strict;
#PC9.fasta is my genomic file
my $Index_File_Name ="PC9.fasta";
my $inx = Bio::Index::Fasta->new($Index_File_Name);
#LCS.txt is my sequences list
@ARGV = ;
foreach my $id (@ARGV) {
    if ($id eq ''){
        die ("empty list")
    }
    else {
        my $seqobj = $inx->fetch($id);
        my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
                                  -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;
}

I hope this code is not total scum...

Thanks in advance ;)

On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote:
> José, it looks like this is a good solution to your problem. Please send
> your script so we can look at it.
> cheers Mark

From maj at fortinbras.us Thu Nov 5 10:37:53 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 5 Nov 2009 10:37:53 -0500
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es> <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com>
Message-ID: <49075FDFF6764EE48E932D95EB994221@NewLife>

True, Dave, you compete only with crazed east coast core developers who're
doing "just one more thing" at 2am....

----- Original Message -----
From: "Dave Messina"
To:
Cc:
Sent: Wednesday, November 04, 2009 9:11 AM
Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query

> Aw shucks, José, glad I could be of help. There are plenty of people who
> answer questions around here, but my timezone sometimes gives me an
> advantage for the European ones. :)
>
> Dave

From hrh at fmi.ch Thu Nov 5 11:02:48 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Thu, 05 Nov 2009 17:02:48 +0100
Subject: [Bioperl-l] A question about iBio::Index: and its correct use
In-Reply-To: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es>
Message-ID:

Jluis

> -Then I'll beg you to take a look at my scripts, because I don't seem to
> catch the bug...

you haven't attached/included any scripts, have you?
Anyway, have you considered using BLAST indices (created by formatdb with the
additional flag "-o") together with the tool 'fastacmd' (which is also
included in the NCBI BLAST binaries) as a simple (and very fast) alternative
for fetching sequences?

Regards, Hans

From maj at fortinbras.us Thu Nov 5 11:02:09 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 5 Nov 2009 11:02:09 -0500
Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use]
In-Reply-To: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es>
References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es>
Message-ID: <1984ED07F36C446284B25F617964B6C6@NewLife>

Hey José,

The first thing that jumps out is the index file name. It looks like you
create it as

  PC9.fasta.idx

but you read it as

  PC9.fasta

Not an unusual mistake. Do

  my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');

and see if it works.

MAJ

From jluis.lavin at unavarra.es Thu Nov 5 11:21:57 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Thu, 5 Nov 2009 17:21:57 +0100 (CET)
Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use]
In-Reply-To: <1984ED07F36C446284B25F617964B6C6@NewLife>
References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> <1984ED07F36C446284B25F617964B6C6@NewLife>
Message-ID: <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es>

Thank you very much Mark, that's a good point :$
I guess your correction refers to the second script, doesn't it?
If so, there is still a problem with the first script: it doesn't create the
PC9.fasta.idx file. Instead, it creates two files named

- PC9.fasta.idx.pag
- PC9.fasta.idx.dir

These seem clearly related to some kind of indexing process... but unless the
PC9.fasta.idx file is only virtual or hidden, I can't find it anywhere...
Forgive me if I'm talking nonsense...

Thank you very much again for your help ;)

From maj at fortinbras.us Thu Nov 5 11:39:09 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 5 Nov 2009 11:39:09 -0500
Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use]
In-Reply-To: <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es>
References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> <1984ED07F36C446284B25F617964B6C6@NewLife> <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es>
Message-ID:

Yes, these are files created by SDBM, Perl's internal DBM manager. You
should be able to open the index simply with

  $inx = Bio::Index::Fasta->new('PC9.fasta.idx');

and the DBM will know what to do.

cheers
MAJ
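Mark's explanation can be checked without BioPerl. An SDBM database lives in a `.dir`/`.pag` pair of files, and you open it by the base name without either suffix. The sketch below uses Perl's core `SDBM_File` module directly. It assumes, as Mark says and as the file names suggest, that the index was written with SDBM. It does not call `Bio::Index::Fasta`, and the key and value it stores are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                      # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

# Assumption: the index is an SDBM database, as described in the reply above.
# Work in a throwaway directory so the demo leaves nothing behind.
my $dir  = tempdir(CLEANUP => 1);
my $base = File::Spec->catfile($dir, 'PC9.fasta.idx');  # base name given to SDBM

# Write: tying a hash to $base makes SDBM create "$base.dir" and "$base.pag".
my %index;
tie(%index, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot create $base: $!";
$index{'seq1'} = 'illustrative value';   # hypothetical key/value
untie %index;

# List what actually landed on disk (skipping "." and "..").
opendir(my $dh, $dir) or die "Cannot read $dir: $!";
my @created = sort grep { !/^\.\.?$/ } readdir($dh);
closedir($dh);
print "files on disk: @created\n";
print "plain PC9.fasta.idx exists? ", (-e $base ? 'yes' : 'no'), "\n";

# Read: reopen with the SAME base name; SDBM locates the .dir/.pag pair itself.
my %reopened;
tie(%reopened, 'SDBM_File', $base, O_RDONLY, 0666)
    or die "Cannot open $base: $!";
my $value = $reopened{'seq1'};
untie %reopened;
print "seq1 => $value\n";
```

The run should list only the two suffixed files and report that no plain `PC9.fasta.idx` exists. It still reads the value back through the base name, which is consistent with Mark's advice to pass `'PC9.fasta.idx'` when reopening the index.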
From jluis.lavin at unavarra.es Thu Nov 5 12:48:12 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Thu, 5 Nov 2009 18:48:12 +0100 (CET)
Subject: [Bioperl-l] A question about iBio::Index: and its correct use
In-Reply-To:
References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es>
Message-ID: <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es>

Thanks a lot for your help, Hans.

The awesome information you've just given me is a little too hard for me to
understand and turn into a script... I hope I can use it in the near future
anyway ;)

The issue here is that the sequences I'm indexing are neither generated by
the NCBI nor stored there... although I believe you're just referring to the
tool itself and not to a retrieval from the NCBI.

Thanks again, you're all great at giving advice to newbies like me ;)

Best wishes to you all

From florent.angly at gmail.com Thu Nov 5 13:00:19 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 05 Nov 2009 10:00:19 -0800
Subject: [Bioperl-l] A question about iBio::Index: and its correct use
In-Reply-To: <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es>
References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es>
Message-ID: <4AF312B3.9060009@gmail.com>

Hans-Rudolf was talking about a way to retrieve sequences from a BLAST
database. If you use BLAST locally, then your database is local too.
More info here:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html
Florent

From valiente at lsi.upc.edu Fri Nov 6 03:06:48 2009
From: valiente at lsi.upc.edu (valiente at lsi.upc.edu)
Date: Fri, 6 Nov 2009 09:06:48 +0100 (CET)
Subject: [Bioperl-l] Bio::SeqIO::genbank.pm
Message-ID: <45737.147.83.59.225.1257494808.squirrel@webmail.lsi.upc.edu>

There is a line in Bio::SeqIO::genbank.pm to convert data in classification
lines into a classification array by splitting only on ';' or '.' so that a
classification that is 2 or more words will still get matched,

my @class = map { s/^\s+//; s/\s+$//; s/\s{2,}/ /g; $_; } split /(?
References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> <C718B5B8.5561%hrh@fmi.ch> <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> <4AF312B3.9060009@gmail.com>
Message-ID: <1222.130.206.164.153.1257497085.squirrel@webmail.unavarra.es>

Thank you for the info, Florent!

I'll try to read all the information at the link you provided and figure out
how to make it work and whether it is worthwhile for me. I work with several
sequence files that come from multiple databases (JGI, BROAD, Genolevures or
NCBI), and the protein IDs from each of those databases differ from NCBI's.

Maybe it would be easier to write a script that lets me enter a FASTA file
with all the protein models of a single organism, parse it, and then extract
the sequences of a given list (using the "ID style" of the particular
database) than to create a BLAST index for each organism I need to work
with... Did I explain the issue correctly?

Anyway, since I don't know anything about this tool you and Hans told me
about, I could easily be wrong... Thank you for showing me the local BLAST
index tool; I'll read the documentation carefully and study all its
possibilities.

Best wishes
JL
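José's fallback idea needs no index at all when the file fits comfortably in memory: load one organism's FASTA file, then pull out the sequences named in a list. Below is a minimal pure-Perl sketch of that idea.

- `read_fasta` and `read_id_list` are hypothetical helper names, not BioPerl APIs.
- Taking the ID as the first whitespace-delimited word after `>` is an assumption. It may need adjusting for JGI, Broad or Genolevures headers.
- Note the `chomp` on the ID list. A line read with `<$fh>` keeps its trailing newline, so an un-chomped ID never matches a key. The same applies to the list read into `@ARGV` in the retrieval script earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): read FASTA into { id => sequence }.
# Assumption: the ID is the first whitespace-delimited word after '>'.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if ($line =~ /^>\s*(\S+)/) {
            $id = $1;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            $seq{$id} .= $line;      # join wrapped sequence lines
        }
    }
    return \%seq;
}

# Hypothetical helper: one ID per line; strip newlines/whitespace, skip blanks.
sub read_id_list {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;     # also drops a stray \r from Windows files
        push @ids, $line if length $line;
    }
    return @ids;
}

# Demo data in in-memory filehandles (made-up sequences and IDs).
my $fasta_text = ">prot1 some description\nMKV\nLLA\n>prot2\nMSS\n>prot3\nMGG\n";
my $list_text  = "prot3\n\nprot1\nmissing_id\n";
open(my $fasta_fh, '<', \$fasta_text) or die $!;
open(my $list_fh,  '<', \$list_text)  or die $!;

my $seqs = read_fasta($fasta_fh);
my @ids  = read_id_list($list_fh);

# Collect the requested sequences in list order; remember IDs not found.
my $out = '';
my @not_found;
for my $id (@ids) {
    if (exists $seqs->{$id}) {
        $out .= ">$id\n$seqs->{$id}\n";
    }
    else {
        push @not_found, $id;
    }
}
print $out;
print STDERR "not found: @not_found\n" if @not_found;
```

The demo reads from in-memory strings. For real files, open the FASTA file and the ID list with `open(my $fh, '<', $path)` instead. This approach holds every sequence in a hash, so an on-disk index such as Bio::Index::Fasta is still the better fit for very large files.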
From maj at fortinbras.us Fri Nov 6 07:45:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 6 Nov 2009 07:45:01 -0500
Subject: [Bioperl-l] Bioperl
In-Reply-To: <16842715.26316.1257510446095.JavaMail.root@durga.amrita.ac.in>
References: <16842715.26316.1257510446095.JavaMail.root@durga.amrita.ac.in>
Message-ID:

Hi Resmi-

You should look at http://bioperl.org/ under "Installation" for information
on getting and installing BioPerl.
An introduction to working with trees in BioPerl is at this link:
http://www.bioperl.org/wiki/HOWTO:Trees

cheers,
Mark

----- Original Message -----
From: Resmi S.
To: maj at fortinbras.us
Sent: Friday, November 06, 2009 7:27 AM
Subject: Bioperl

Respected Sir,

I am Resmi S, studying II MSc Bioinformatics. I am now doing my project on
Phylogenetic Tree Construction using BioPerl. I am not very familiar with
BioPerl modules, so could you please send me the names of the BioPerl modules
needed for my project? I also need to know where I can get these modules. If
they are on CPAN, please send me the location or link. I kindly request you
to send me the details soon.

Yours Sincerely,
Resmi S,
II MSc Bioinformatics,
School of Biotechnology,
Amrita Vishwa Vidyapeetham,
Email : amm08bi019 at students.amrita.ac.in

From robert.bradbury at gmail.com Fri Nov 6 12:35:22 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 6 Nov 2009 12:35:22 -0500
Subject: [Bioperl-l] Function that determines serious mutations
Message-ID:

Is there a function in the library (or has someone written one) that can
take a GenBank entry and determine which mutations are harmful? It would be
used to produce a summary table of:

GENE # SNP # BadSNP

You can sort of get this from NCBI. If you look up a gene name in the "GENE"
db and then go to the "GeneView" page in dbSNP, it has the information I
want. However, it is largely in a graphical format, and I simply want numbers
I can dump into a spreadsheet.

I don't think it would be hard: fetch the gene, run through the features for
the SNP database, figure out whether they are good or bad SNPs, accumulate
the statistics and dump it.
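Robert's outline maps onto a small tally loop with a pluggable scoring callback: collect each gene's SNPs, classify them, accumulate counts, dump a table. The sketch below covers only the accumulate-and-dump step, under these assumptions:

- Fetching the GenBank entry and extracting SNP features is not shown.
- The record format `{ gene, ref_aa, alt_aa }` and the name `tally_snps` are hypothetical.
- The example callback is purely illustrative, not a validated predictor of harmful mutations. It flags nonsense changes and any amino-acid change involving Cys or Met.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tally SNPs per gene, counting as "bad" those the
# caller-supplied callback flags. Each record is a hashref
# { gene, ref_aa, alt_aa } (hypothetical format, for illustration only).
sub tally_snps {
    my ($records, $is_bad) = @_;
    my %count;
    for my $snp (@$records) {
        my $c = $count{ $snp->{gene} } ||= { snps => 0, bad => 0 };
        $c->{snps}++;
        $c->{bad}++ if $is_bad->($snp);
    }
    return \%count;
}

# Illustrative scoring callback, not a real harm predictor: "bad" if it
# introduces a stop ('*'), or changes the residue and involves Cys or Met.
my $crude_score = sub {
    my ($snp) = @_;
    my ($ref, $alt) = @{$snp}{qw(ref_aa alt_aa)};
    return 0 if $ref eq $alt;                        # synonymous
    return 1 if $alt eq '*';                         # nonsense
    return ($ref =~ /^[CM]$/ || $alt =~ /^[CM]$/) ? 1 : 0;
};

# Made-up example records.
my @records = (
    { gene => 'GENE1', ref_aa => 'C', alt_aa => 'S' },
    { gene => 'GENE1', ref_aa => 'L', alt_aa => 'L' },
    { gene => 'GENE2', ref_aa => 'R', alt_aa => '*' },
    { gene => 'GENE2', ref_aa => 'A', alt_aa => 'V' },
);

my $counts = tally_snps(\@records, $crude_score);

# Tab-separated table, ready to paste into a spreadsheet.
my $table = join("\t", 'GENE', '#SNP', '#BadSNP') . "\n";
for my $gene (sort keys %$counts) {
    $table .= join("\t", $gene, @{ $counts->{$gene} }{qw(snps bad)}) . "\n";
}
print $table;
```

To use a different definition of "bad", pass a different code reference as the second argument to `tally_snps`.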
I think the available functions are flexible enough to do it, but I can't
believe nobody has already done it. It could be a bit more complex: one could
check whether the mutations fall in a conserved domain, or whether they
affect codons for cysteine or methionine (or other potentially "critical"
amino acids). Since "critical" is in the eye of the beholder, though, there
would have to be some kind of callback to a scoring function.

Thanks,
Robert

From nevoband at igb.uiuc.edu Fri Nov 6 15:58:05 2009
From: nevoband at igb.uiuc.edu (kleenix)
Date: Fri, 6 Nov 2009 12:58:05 -0800 (PST)
Subject: [Bioperl-l] StandAloneBlast Unallowed parameter
Message-ID: <26230896.post@talk.nabble.com>

I'm not sure whether I'm doing this wrong. I am trying to use the -m
parameter of blastall through the StandAloneBlast BioPerl class. When I add
'm'=>0 to @params I get an "Unallowed parameter:" error. Am I adding the
parameter wrong? I'm using StandAloneBlast version 1.51.

Thanks
-Nevo

From veronica.xiaoyu at gmail.com Fri Nov 6 17:25:04 2009
From: veronica.xiaoyu at gmail.com (Xiaoyu Liang)
Date: Fri, 6 Nov 2009 17:25:04 -0500
Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change the description's name of each hit?
Message-ID:

Hi,

I'm using Bio::SearchIO::Writer::HTMLResultWriter to convert a BLAST output
file into HTML.

Does anybody know how to parse and change the description of each hit?

I can get each hit's description with $hit->description, but it doesn't
seem to let me modify it.

Thank you very much,
Xiaoyu

From maj at fortinbras.us Fri Nov 6 19:40:17 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 6 Nov 2009 19:40:17 -0500
Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change the description's name of each hit?
In-Reply-To: References: Message-ID: <11592B31D9924FA7A8638D90AE4A3F4A@NewLife> Xiaoyu- That method should work to change the description; are you doing $hit->description('This is my new description'); This method returns the old description when you change the value: $hit->description('old'); $str = $hit->description('new'); # $str eq 'old' $str = $hit->description; # $str eq 'new' MAJ ----- Original Message ----- From: "Xiaoyu Liang" To: Sent: Friday, November 06, 2009 5:25 PM Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change thedescription's name of each hit? > Hi, > > I'm using Bio::SearchIO::Writer HTMLResultWriter help me parse BLAST out > file into HTML. > > Anybody knows how to parse and change the description name of each hit? > > By using hit->description can call hits' description, but it is not allowed > to be modified. > > Thank you very much, > Xiaoyu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Daniel.Lang at biologie.uni-freiburg.de Sun Nov 8 09:50:48 2009 From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang) Date: Sun, 08 Nov 2009 15:50:48 +0100 Subject: [Bioperl-l] arguments to call back functions in GBrowse2 Message-ID: <4AF6DAC8.8070204@biologie.uni-freiburg.de> Hi Lincoln, a while back (May 29, 2009; 09:08pm) you replied to an even older thread ("Re: Access the parent of a Bio::DB::SeqFeature within a gbrowse config callback function"). I missed your reply and did not follow it up back then, sorry! I'm currently facing the same issue again with gbrowse2. I have a callback function for "balloon click". Following your last reply I expected 5 arguments, but I am getting only three: $feature, $panel, $track. In principle, I am using the latest releases/checkouts... Which modules do I need to look at/update for this functionality?
Furthermore, is there a possibility to share global variables between gbrowse2 and slaves? Should this work via init_code? Should modules initialized in a conf be in the scope of a slave? If not, can I introduce modules via the slave config files, or do I need to alter the slave scripts? Thanks, again! Cheers, Daniel PS: gbrowse2 rocks! From maj at fortinbras.us Sun Nov 8 11:09:43 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 8 Nov 2009 11:09:43 -0500 Subject: [Bioperl-l] GuessSeqFormat: fastq? Message-ID: Hi All- Any plans in the works for a _possibly_fastq sequence guesser? MAJ From maj at fortinbras.us Sun Nov 8 11:20:55 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 8 Nov 2009 11:20:55 -0500 Subject: [Bioperl-l] GuessSeqFormat: fastq? In-Reply-To: References: Message-ID: Never mind; got it covered-- MAJ ----- Original Message ----- From: "Mark A. Jensen" To: "bioperl-l" Sent: Sunday, November 08, 2009 11:09 AM Subject: [Bioperl-l] GuessSeqFormat: fastq? > Hi All- > Any plans in the works for a _possibly_fastq sequence guesser? > MAJ From saikari78 at gmail.com Mon Nov 9 10:47:10 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 15:47:10 +0000 Subject: [Bioperl-l] Retrieving link to protein from PubChem Message-ID: Hi, I'm using BioPerl to retrieve records from PubChem. I'm trying, so far unsuccessfully, to find a way to retrieve from a compound record the reference to the protein(s) that can synthesize the compound. Thanks very much.
saikari From saikari78 at gmail.com Mon Nov 9 11:05:57 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 16:05:57 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem Message-ID: Hi, I'm using BioPerl to retrieve records from PubChem. I'm trying, so far unsuccessfully, to find a way to retrieve from a compound record the reference to the protein(s) that can synthesize the compound. Thanks very much. saikari From cjfields at illinois.edu Mon Nov 9 11:27:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 10:27:10 -0600 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: Message-ID: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: > Hi, > > I'm using Bioperl to retrieve records from PubChem. > I'm trying to find a way-but have been unsuccessful- to retrieve > from a > compound record, the reference to the protein(s) that can synthesize > the > compound. > Thanks very much. > > saikari The BioPerl script below returns the GIs of proteins that correspond to the substance passed on the command line; invoke it using 'perl pc_substance.pl substance_requested'. It probably needs more fiddling to catch everything, but it should get you started.
For other bits and pieces (such as how to retrieve the raw sequence files), please see the EUtilities HOWTO: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook chris ---------------------------------------- #!/usr/bin/perl -w use 5.010; use strict; use warnings; use Bio::DB::EUtilities; my $substance = shift; my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', -db => 'pcsubstance', -term => $substance, -usehistory => 'y'); my $hist = $eutil->next_History || die; $eutil->reset_parameters(-eutil => 'elink', -history => $hist, -db => 'protein', -dbfrom => 'pcsubstance', -retmax => 1000); say join(',',$eutil->get_ids); From saikari78 at gmail.com Mon Nov 9 11:41:20 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 16:41:20 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: Fabulous! Huge help. saikari On Mon, Nov 9, 2009 at 4:27 PM, Chris Fields wrote: > On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: > > Hi, >> >> I'm using Bioperl to retrieve records from PubChem. >> I'm trying to find a way-but have been unsuccessful- to retrieve from a >> compound record, the reference to the protein(s) that can synthesize the >> compound. >> Thanks very much. >> >> saikari >> > > The below bioperl script returns the GI for proteins that correspond to the > substance passed on the command line; invoke using 'perl pc_substance.pl substance_requested'. It probably needs more fiddling to catch everything > but it should get you started.
> > For other bits and pieces (such as how to retrieve the raw sequence files), > please see the EUtilities HOWTO: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > chris > > ---------------------------------------- > > #!/usr/bin/perl -w > > use 5.010; > use strict; > use warnings; > use Bio::DB::EUtilities; > > my $substance = shift; > > my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', > -db => 'pcsubstance', > -term => $substance, > -usehistory => 'y'); > > my $hist = $eutil->next_History || die; > > $eutil->reset_parameters(-eutil => 'elink', > -history => $hist, > -db => 'protein', > -dbfrom => 'pcsubstance', > -retmax => 1000); > > say join(',',$eutil->get_ids); > From gc11song at gmail.com Mon Nov 9 13:08:48 2009 From: gc11song at gmail.com (Guangchun Song) Date: Mon, 9 Nov 2009 12:08:48 -0600 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? Message-ID: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Hello, I'm a new BioPerl user. I'm working on a project to determine the status of all putative SNPs (non-synonymous vs. synonymous) and to predict the translational effect of non-synonymous mutations as benign or malicious. I'm trying to use BioPerl to get the DNA sequence and translate it to protein sequence for the SNPs that are in a gene's coding region. Could someone tell me how to do it? Thanks, -Guangchun Song From robert.bradbury at gmail.com Mon Nov 9 16:15:33 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 9 Nov 2009 16:15:33 -0500 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song wrote: > > I'm new bioperl user. I' working on a project: To determine the > status of all tutative SNPs such as non-synonymous vs.
synonymous, and > predict the tranlational effect of non-synonymous mutations as benign > or malicious. I'm trying to use bioperl to get the DNA sequence and > translate to protein sequence for the SNPs that are in gene's coding > region. Could someone tell me how to do it? > > I too would like to know if this information is available. I've recently been working with the dbSNP results from NCBI, but they display the results in a graphical format rather than as data that one can play with and ask questions of, like "What is the most disease-causing gene in the Human Genome?" or "What are the critical proteins damaged by gene defects in the Human Genome?" ... "In terms of premature deaths, extended health care requirements, loss of quality of life, etc.?" The same types of questions can be applied to the dog and cat genomes, where there is emotional value, or the cow, horse, pig, etc. genomes, where there is economic value. The value of BioPerl would increase significantly if there were functionality that would allow easy access to "these mutations may have negative/positive impact" (which means you need a function that qualifies mutations by degree) and allow for impact to be subjectively determined (implying there must be some callback function to provide a user quality/impact rating). For example: $/@differences = protein_compare($mygene, $refseq_gene, @critical_aa, @critical_domain, $callback) where $callback could "rate" differences in the protein by position and by "type of interest" (e.g. metal-binding amino acids, structure-changing amino acids, critical catalytic amino acids, etc.). A default callback could be based on some evolving definition of "critical" changes, for example those which result in human disease. This is a "required" capability for determining things like the "adaptability" of a species: those with the fewest critical mutation points may adapt better to circumstances that increase mutation.
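The protein_compare() call above can be sketched in plain Perl roughly as follows. This is a hypothetical helper, not an existing BioPerl function: it assumes the two protein sequences are already aligned, ungapped and of equal length, and it takes the residue and domain lists as array references, because Perl would otherwise flatten @critical_aa and @critical_domain into one argument list. The scoring callback shown is only an example weighting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical protein_compare(): NOT an existing BioPerl function.
# $query and $ref are ungapped protein strings of equal length;
# $critical_aa is an array ref of one-letter residue codes;
# $critical_domains is an array ref of [start, end] pairs (1-based, inclusive);
# $callback receives a hash ref describing one substitution and returns a score.
sub protein_compare {
    my ($query, $ref, $critical_aa, $critical_domains, $callback) = @_;
    die "sequences must have equal length (align them first)\n"
        unless length($query) == length($ref);
    my %critical = map { uc($_) => 1 } @$critical_aa;
    my @differences;
    for my $i (0 .. length($ref) - 1) {
        my $r = uc(substr($ref,   $i, 1));
        my $q = uc(substr($query, $i, 1));
        next if $r eq $q;
        my $pos  = $i + 1;    # 1-based residue number
        my %diff = (
            pos       => $pos,
            ref       => $r,
            alt       => $q,
            critical  => ($critical{$r} || $critical{$q}) ? 1 : 0,
            in_domain => (grep { $pos >= $_->[0] && $pos <= $_->[1] } @$critical_domains) ? 1 : 0,
        );
        $diff{score} = $callback->(\%diff);
        push @differences, \%diff;
    }
    return @differences;
}

# Example callback: a change touching a "critical" residue counts double,
# and a change inside a critical domain adds one more point.
my $score = sub {
    my $d = shift;
    return 2 * $d->{critical} + $d->{in_domain};
};

my @diffs = protein_compare('MKCLV', 'MKRLC', ['C', 'M'], [[4, 5]], $score);
printf "%d %s>%s score=%d\n", @{$_}{qw(pos ref alt score)} for @diffs;
```

With these toy inputs the sketch prints `3 R>C score=2` and `5 C>V score=3`. Any real "default callback" based on disease associations would have to replace the example weighting.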
Please pardon any errors in perl syntax/usage; it's been a while since I've written perl and I'd really rather be coding in C. Robert From maj at fortinbras.us Mon Nov 9 16:56:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 9 Nov 2009 16:56:24 -0500 Subject: [Bioperl-l] how to get the protein sequences from DNA sequencesaround novel SNPs? In-Reply-To: References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: <3ED3D387B5DE4248A218D42882369925@NewLife> I agree that BioPerl would significantly increase in value with such a module; in fact, the BioTeam would probably buy us out. My opinion is that the entire GWAS enterprise is the search for such a callback function, for humans anyway. For those engaged in this quest, if BioPerl doesn't provide a Maserati, it at least provides good Italian-made (among others) parts. MAJ ----- Original Message ----- From: "Robert Bradbury" To: "Guangchun Song" Cc: Sent: Monday, November 09, 2009 4:15 PM Subject: Re: [Bioperl-l] how to get the protein sequences from DNA sequencesaround novel SNPs? > On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song wrote: >> >> I'm new bioperl user. I' working on a project: To determine the >> status of all tutative SNPs such as non-synonymous vs. synonymous, and >> predict the tranlational effect of non-synonymous mutations as benign >> or malicious. I'm trying to use bioperl to get the DNA sequence and >> translate to protein sequence for the SNPs that are in gene's coding >> region. Could someone tell me how to do it? >> >> > I too would like to know if this information is available. I've recently > been working with the dbSNP results from NCBI but they display the results > in a graphical format rather than data that one can play with and ask > questions of like "What is the most disease causing gene in the Human > Genome?" or "What are the critical proteins damaged by gene defects in the > Human Genome?" ...
"In terms of premature deaths, extended health care > requirements, loss of quality of life, etc.?" > > The same types of questions can be applied to the dog and cat genomes where > there is emotional value or the cow, horse, pig, etc. genomes where there is > economic value? > > The value of BioPerl would increase significantly if there were > functionality that would allow easy access to "these mutations may have > negative/positive impact" (which means you need a function that qualifies > mutations by degree) and allow for impact to be subjectively determined > (implying there must be some callback function to provide a user > quality/impact rating). > > For example: > $/@differences = protein_compare($mygene, $refseq_gene, @critical_aa, > @critical_domain, $callback) > Where $callback could "rate" differences about the protein and position and > the "type of interest" (e.g. metal binding amino acids, structural changing > amino acids, critical catalysis amino acids, etc.). > > A default callback would be based on some evolving definition of "critical" > changes which result in human disease for example. > > This is a "required" capability to be able to determine things like the > "adaptability" of a species -- those with fewest critical mutation points > may have better adaptability to mutation increasing circumstances. > > Please pardon any errors in perl syntax/usage its been a while since I've > written perl and I'd really rather be coding in C. > > Robert > From alexl at users.sourceforge.net Mon Nov 9 18:44:07 2009 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Mon, 09 Nov 2009 18:44:07 -0500 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict?
In-Reply-To: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> (Chris Fields's message of "Wed, 4 Nov 2009 07:53:35 -0600") References: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Message-ID: >>>>> Chris Fields writes: > Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate > perl package alone. It is part of perl core but it's also available > on CPAN separately from perl itself: > http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm Hi Chris, Yes, in principle it would be possible to have this split out as a separate package (currently it's a "subpackage" under the main perl package), unfortunately that's just not the way it's currently done in Fedora (probably because it's part of the core set and they like to update all relevant packages in one step) and I have little control over that. As I suspected, the perl maintainer is not at all enthusiastic for updating the whole of perl just for that package (except for rawhide which would mean that bioperl 1.6.1 would not be available until F-13, about 6 months from now). See: http://bugzilla.redhat.com/show_bug.cgi?id=533562#c1 Obviously I am not happy with this situation either, because it will freeze bioperl on Fedora at 1.6.0 for about 6 months, so can you recommend any temporary workarounds in the meantime? > This is the commit message for that BTW. This allows spaces in file > names for the MANIFEST. v1.52 is a bug fix and is required. > http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 Perhaps I could create a patch that renamed files with spaces in them to ones with no spaces and then rename them again upon installation. Can you point me to which files are the problematic ones that triggered the dependency for 1.52? Perhaps I can figure a workaround. Meanwhile I will press the maintainer of perl in Fedora to perhaps reconsider his position (e.g. 
if another update for perl is going out for another reason, like a security update, perhaps he could roll in the 1.52 update at the same time). Cheers, Alex From cjfields at illinois.edu Mon Nov 9 19:50:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 18:50:00 -0600 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict? In-Reply-To: References: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Message-ID: <29EA2398-F60B-48F2-AFE7-39A44011C451@illinois.edu> On Nov 9, 2009, at 5:44 PM, Alex Lancaster wrote: >>>>>> Chris Fields writes: > >> Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate >> perl package alone. It is part of perl core but it's also available >> on CPAN separately from perl itself: > >> http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm > > Hi Chris, > > Yes, in principle it would be possible to have this split out as a > separate package (currently it's a "subpackage" under the main perl > package), unfortunately that's just not the way it's currently done in > Fedora (probably because it's part of the core set and they like to > update all relevant packages in one step) and I have little control > over > that. > > As I suspected, the perl maintainer is not at all enthusiastic for > updating the whole of perl just for that package (except for rawhide > which would mean that bioperl 1.6.1 would not be available until F-13, > about 6 months from now). See: > > http://bugzilla.redhat.com/show_bug.cgi?id=533562#c1 > > Obviously I am not happy with this situation either, because it will > freeze bioperl on Fedora at 1.6.0 for about 6 months, so can you > recommend any temporary workarounds in the meantime? Well, if you don't absolutely require the MANIFEST for the final package you can forego the requirement. The file in question that triggered the requirement is a data file used only for testing: t/data/test 2.txt >> This is the commit message for that BTW. 
This allows spaces in file >> names for the MANIFEST. v1.52 is a bug fix and is required. > >> http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 > > Perhaps I could create a patch that renamed files with spaces in > them to > ones with no spaces and then rename them again upon installation. > > Can you point me to which files are the problematic ones that > triggered > the dependency for 1.52? Perhaps I can figure a workaround. > > Meanwhile I will press the maintainer of perl in Fedora to perhaps > reconsider his position (e.g. if another update for perl is going out > for another reason, like a security update, perhaps he could roll in > the > 1.52 update at the same time). > > Cheers, > Alex I would point out that this is a fairly significant bug fix for ExtUtils::Manifest. A newer point release of perl (5.10.1) is now available that contains this fix, as well as a fix for a performance regression that popped up in 5.10.0. chris From jay at jays.net Mon Nov 9 19:05:51 2009 From: jay at jays.net (Jay Hannah) Date: Mon, 9 Nov 2009 18:05:51 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? Message-ID: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> Many thanks to Ewan Birney et al. for Bio::Index::* I can throw away my awful grep-based index-by-accession stuff. :) Any chance someone has also written an organism-based index mechanism? Something like... while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { print $seq->display_id . "\n"; } Thanks, j From cjfields at illinois.edu Mon Nov 9 22:55:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 21:55:01 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> Message-ID: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote: > Many thanks to Ewan Birney et. al.
for Bio::Index::* > > I can throw away my awful grep based index-by-accession stuff. :) > > Any chance someone has also written an organism based index > mechanism? Something like... > > while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { > print $seq->display_id . "\n"; > } > > Thanks, > > j It should work via id_parser(); from Bio::Index::GenBank: $inx->id_parser(\&get_id); # make the index $inx->make_index($file_name); # here is where the retrieval key is specified sub get_id { my $line = shift; $line =~ /clone="(\S+)"/; $1; } Change the code ref to deal with the line you want and parse the name out. Caveat: this may not be absolutely perfect (it only passes in one line at a time, and some species lines will wrap). I'm also not sure how this would work in cases where multiple sequences from the same species are present. The other option is to preparse everything and tie a hash to store a species->UID map, then use that along with your Bio::Index index to grab what you need. chris From cjfields at illinois.edu Mon Nov 9 23:58:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 22:58:32 -0600 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: <435BA1A8-2CCB-4D7A-8909-84F8135C439F@illinois.edu> On Nov 9, 2009, at 3:15 PM, Robert Bradbury wrote: > On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song > wrote: >> >> I'm new bioperl user. I' working on a project: To determine the >> status of all tutative SNPs such as non-synonymous vs. synonymous, >> and >> predict the tranlational effect of non-synonymous mutations as benign >> or malicious. I'm trying to use bioperl to get the DNA sequence and >> translate to protein sequence for the SNPs that are in gene's coding >> region. Could someone tell me how to do it? >> >> > I too would like to know if this information is available.
I've > recently > been working with the dbSNP results from NCBI but they display the > results > in a graphical format rather than data that one can play with and ask > questions of like "What is the most disease causing gene in the Human > Genome?" or "What are the critical proteins damaged by gene defects > in the > Human Genome?" ... "In terms of premature deaths, extended health care > requirements, loss of quality of life, etc.?" > > The same types of questions can be applied to the dog and cat > genomes where > there is emotional value or the cow, horse, pig, etc. genomes where > there is > economic value? > > The value of BioPerl would increase significantly if there were > functionality that would allow easy access to "these mutations may > have > negative/positive impact" (which means you need a function that > qualifies > mutations by degree) and allow for impact to be subjectively > determined > (implying there must be some callback function to provide a user > quality/impact rating). > > For example: > $/@differences = protein_compare($mygene, $refseq_gene, > @critical_aa, > @critical_domain, $callback) > Where $callback could "rate" differences about the protein and > position and > the "type of interest" (e.g. metal binding amino acids, structural > changing > amino acids, critical catalysis amino acids, etc.). > > A default callback would be based on some evolving definition of > "critical" > changes which result in human disease for example. > > This is a "required" capability to be able to determine things like > the > "adaptability" of a species -- those with fewest critical mutation > points > may have better adaptability to mutation increasing circumstances. > > Please pardon any errors in perl syntax/usage its been a while since > I've > written perl and I'd really rather be coding in C. 
> > Robert I will say that most of the information from the SNP database is available in various formats (see the following link, under 'Retrieval Types'): http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html You can access this information, as well as the full XML, using something like the following script. chris ------------------------------------------------ #!/usr/bin/perl -w use 5.010; use strict; use warnings; use Bio::DB::EUtilities; my $term = shift; my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', -db => 'snp', -term => $term, -usehistory => 'y', -retmax => 100); my $hist = $eutil->next_History || die "No history returned"; # for SNP XML, change retmode to 'xml' $eutil->set_parameters(-eutil => 'efetch', -history => $hist, -retmode => 'text', -rettype => 'flt'); # dumps to STDOUT say $eutil->get_Response->content; From jluis.lavin at unavarra.es Tue Nov 10 05:43:40 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Tue, 10 Nov 2009 11:43:40 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and itscorrect use] In-Reply-To: References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> Message-ID: <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Hello again, I tried what Mark suggested, modifying the code line he pointed out, but there's still a problem that I believe must be due to the sequence names.
My sequence headers in the FASTA file have this format: >PleosPC9_1_103820|fgenesh1_pg.3_#_1 The part to the right of the pipe changes depending on the program used to create the gene model, for example: >PleosPC9_1_103820|fgenesh1_pg.3_#_1 >PleosPC9_1_123413|genemark.2731_g >PleosPC9_1_52065|e_gw1.3.64.1 So I guess I need to parse my IDs somehow, so that the program uses only the first part of the FASTA header (the "protein name") and is not confused by the text on the other side of the pipe... This is the corrected code I wrote following Mark's suggestions, but I still don't have any idea how to handle the parsing issue... #!/c:/Perl -w use Bio::Index::Fasta; use strict; #PC9.fasta is my genomic file my $Index_File_Name ="PC9.fasta"; my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); #LCS.txt is my sequences list @ARGV = ; foreach my $id (@ARGV) { if ($id eq ''){ die ("empty list") } else { my $seqobj = $inx->fetch($id); my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", -format => 'fasta'); $out->write_seq($seqobj); } } exit; } Thanks in advance PS. Might there be a faster way of extracting those sequences using plain Perl? On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote: > Yes, these are files created by the SDBM, Perl's internal db manager. You > should > be able to > open the index by simply > $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > and the dbm will know what to do-- > cheers MAJ > ----- Original Message ----- > From: > To: "Mark A. Jensen" > Cc: ; > Sent: Thursday, November 05, 2009 11:21 AM > Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its > correct > use] > > >> Thank you very much Mark, that's a good point :$ >> I guess your correction is referred to the second script, isn't it?
>> >> If it is so, there is still a problem with the first script, it doesn't >> create the PC9.fasta.idx file, instead it creates two files named: >> -PC9.fasta.idx.pag >> -PC9.fasta.idx.dir >> >> which seem to be clearly related with some kind of indexing >> process...but, >> unless the PC9.fasta.idx file is only virtual or remains hidden, I can't >> find it anywhere... >> Forgive me if I'm talking nosense... >> >> Thank you very much again for your help ;) >> >> >> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>> Hey José, >>> The first thing that jumps out it the index file name. Looks >>> like you create it as >>> PC9.fasta.idx >>> But you read it as >>> PC9.fasta >>> Not an unusual mistake. Do >>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>> and see if it works. >>> MAJ >>> ----- Original Message ----- >>> From: >>> To: >>> Sent: Thursday, November 05, 2009 10:46 AM >>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its >>> correct >>> use] >>> >>> >>> >>> >>> ---------------------------- Original message >>> ---------------------------- >>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct >>> use >>> From: jluis.lavin at unavarra.es >>> Date: Thu, 5 November 2009, 16:46 >>> To: "Mark A. Jensen" >>> -------------------------------------------------------------------------- >>> >>> Hi Mark, >>> >>> I've actually got two scripts, the first one is to create the index and >>> the second one is to retrieve the sequence lis from the indexed file.
>>> >>> 1)Here is the Index creation script: >>> >>> #!/c:/Perl -w >>> use strict; >>> use Bio::Index::Fasta; >>> use strict; >>> >>> print "Enter file for indexing: \n"; >>> my $Index_File_Name = <STDIN>; >>> my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx", >>> -write_flag => 1); >>> $inx->make_index(my $File_Name); >>> >>> 2)And here is the sequence retrieval script: >>> >>> #!/c:/Perl -w >>> use Bio::Index::Fasta; >>> use strict; >>> #PC9.fasta is my genomic file >>> my $Index_File_Name ="PC9.fasta"; >>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>> #LCS.txt is my sequences list >>> @ARGV = ; >>> foreach my $id (@ARGV) { >>> if ($id eq ''){ >>> die ("empty list") >>> } >>> else { >>> my $seqobj = $inx->fetch($id); >>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>> -format => 'fasta'); >>> $out->write_seq($seqobj); >>> } >>> } >>> exit; >>> } >>> >>> I hope this code is not a total scum... >>> >>> Thanks in advance ;) >>> >>> >>> >>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote: >>>> José -- It looks like this is a good solution to your problem. Please >>>> send >>>> you >>>> script so we can look at it- >>>> cheers Mark >>>> ----- Original Message ----- >>>> From: >>>> To: >>>> Sent: Thursday, November 05, 2009 10:28 AM >>>> Subject: [Bioperl-l] A question about iBio::Index: and its correct use >>>> >>>> >>>> >>>> Hello to all, >>>> >>>> I'm trying to write a script to retrieve a list of sequences from a >>>> local >>>> FASTA file (for example a fasta archive where all the protein models >>>> of >>>> an >>>> organism are stored). This file would be used by me as some kind >>>> "local >>>> database" (sorry if I mistake a few concepts...) >>>> I've been reading the BioPerl HOWTOs and I came across the >>>> Bio::Index::Fasta tool. >>>> If I didn't misunderstood what I read (which can be easy because my >>>> low >>>> level on programming) this Indexing tool should do the job.
>>>> I wrote a couple of scripts based on the documentation i read about >>>> this >>>> tool, but I don't seem to be able to create the index file to be used >>>> later (to retrieve the sequences from). >>>> -First of all, I want to ask the people in this forum if the >>>> Bio::Index::Fasta is the right one to chose for this tasks. >>>> -Then I'll beg you to take a look at my scripts, because I don't seem >>>> to >>>> catch the bug... >>>> >>>> Best wishes to you all and thanks in advance ;) >>>> >>>> -- >>>> José Luis Lavín Trueba, PhD >>>> >>>> Dpto. de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>> >> > -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From saikari78 at gmail.com Tue Nov 10 06:41:11 2009 From: saikari78 at gmail.com (saikari keitele) Date: Tue, 10 Nov 2009 11:41:11 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: Thanks again very much for your help and the script. I've been trying it; however, I fail to find any protein record linked to a record in the pcsubstance database. Do you think that it is because no links have been defined between the 2 databases, or that I am just unlucky and that no link exists for the particular records I'm testing? Thanks again saikari On Mon, Nov 9, 2009 at 4:41 PM, saikari keitele wrote: > Fabulous!. Huge help. > saikari > > On Mon, Nov 9, 2009 at 4:27 PM, Chris Fields wrote: > >> On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: >> >> Hi, >>> >>> I'm using Bioperl to retrieve records from PubChem. >>> I'm trying to find a way-but have been unsuccessful- to retrieve from a >>> compound record, the reference to the protein(s) that can synthesize the >>> compound. >>> Thanks very much. >>> >>> saikari >>> >> >> The below bioperl script returns the GI for proteins that correspond to >> the substance passed on the command line; invoke using 'perl >> pc_substance.pl substance_requested'. It probably needs more fiddling to >> catch everything but it should get you started.
>> >> For other bits and pieces (such as how to retrieve the raw sequence >> files), please see the EUtilities HOWTO: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook >> >> chris >> >> ---------------------------------------- >> >> #!/usr/bin/perl -w >> >> use 5.010; >> use strict; >> use warnings; >> use Bio::DB::EUtilities; >> >> my $substance = shift; >> >> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -db => 'pcsubstance', >> -term => $substance, >> -usehistory => 'y'); >> >> my $hist = $eutil->next_History || die; >> >> $eutil->reset_parameters(-eutil => 'elink', >> -history => $hist, >> -db => 'protein', >> -dbfrom => 'pcsubstance', >> -retmax => 1000); >> >> say join(',',$eutil->get_ids); >> > > From heyne at informatik.uni-freiburg.de Tue Nov 10 07:55:06 2009 From: heyne at informatik.uni-freiburg.de (Steffen Heyne) Date: Tue, 10 Nov 2009 13:55:06 +0100 Subject: [Bioperl-l] problem with alignments and sequence locations Message-ID: <4AF962AA.7060908@informatik.uni-freiburg.de> Hi, I'm using Bioperl for my research and it is very useful! Thank you! Currently I have a problem with the location tags of sequences. I read in seed alignments of Rfam (in stockholm format, but I think it is similar to other formats). If the location is like: AB194432.1/908-846 the start/end values are changed to $seq->start = 846 $seq->end = 908 and therefore the new location (e.g. $seq->get_nse) is: AB194432.1/846-908 The $seq->strand tag is correctly set to -1 in this case, but if the alignment is written out again (clustal, stockholm,...) this strand info is lost and the sequences have this "wrong" location. But this information is important with respect to the sequence accession number. Is there a way to set the location back to the original one or is this behavior desired? Any attempt to set it manually with $seq->start($val) failed due to automatic checking. I'm using bioperl 1.6.1 Thanks! steffen -- --- Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik Institut für Informatik Albert-Ludwigs-Universität Freiburg Georges-Köhler-Allee 106 79110 Freiburg, Germany Tel: (+49) 761 203 8239 Fax: (+49) 761 203 7462 Mail: heyne at informatik.uni-freiburg.de From cjfields at illinois.edu Tue Nov 10 08:58:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Nov 2009 07:58:52 -0600 Subject: [Bioperl-l] problem with alignments and sequence locations In-Reply-To: <4AF962AA.7060908@informatik.uni-freiburg.de> References: <4AF962AA.7060908@informatik.uni-freiburg.de> Message-ID: On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: > Hi, > > I'm using Bioperl for my research and it is very useful! Thank you! > > Currently I have a problem with the location tags of sequences. I read > in seed alignments of Rfam (in stockholm format, but I think it is > similar to other formats). > > If the location is like: > > AB194432.1/908-846 > > the start/end values are changed to > > $seq->start = 846 > $seq->end = 908 > > and therefore the new location (e.g. $seq->get_nse) is: > > AB194432.1/846-908 > > The $seq->strand tag is correctly set to -1 in this case, but if the > alignment is written out again (clustal, stockholm,...) this strand > info is lost and the sequences have this "wrong" location. But this > information is important with respect to the sequence accession number. > > Is there a way to set the location back to the original one or is > this behavior desired? Any attempt to set it manually with $seq->start($val) > failed due to automatic checking. > > I'm using bioperl 1.6.1 > > Thanks! > > steffen This is a definite bug. We recently discussed amending the NSE format due to this (the subject came up over the last few months or so); it's fallen through the cracks. Fortunately it is very easy to fix (the relevant method is in LocatableSeq). Does anyone have a problem with me adding this in?
It will change output for only those instances where the strand is -1, so AB194432.1/908-846 would be start = 846, end = 908, strand = -1; AB194432.1/846-908 would be start = 846, end = 908, strand = 1. chris From cjfields at illinois.edu Tue Nov 10 09:05:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Nov 2009 08:05:51 -0600 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: <738F6320-B87A-4541-B9FA-20273ABA96B9@illinois.edu> On Nov 10, 2009, at 5:41 AM, saikari keitele wrote: > Thanks again very much for your help and the script. > I've been trying it; however, I fail to find any protein record > linked to a > record in the pcsubstance database. > Do you think that it is because no links have been defined between > the 2 > databases, or that I am just unlucky and that no link exists for the > particular records I'm testing? > Thanks again > > saikari It's probably that no links have been defined. I have found similar problems in the past with pubchem, in that not all substances have proteins associated with them. Most proteins linked to are those with a deposited structure. There are a few other databases to check out; KEGG and the BioCyc dbs (like EcoCyc) come to mind. I don't think we have a generic remote query engine set up for any of those, unfortunately (unless there is one I'm unaware of), but I know BioCyc comes with its own set of tools (including perl- and java-based query tools) and can be set up locally, which is likely much faster and more in line with what you need. chris ...
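The strand-aware NSE convention Chris proposes (a descending start-end range in the name/start-end string means strand -1) can be illustrated with a small core-Perl sketch. The helpers format_nse and parse_nse are hypothetical names, not LocatableSeq methods and not the fix that went into BioPerl; the sketch only shows the round trip described in the strand thread, keeping start <= end internally and letting the strand decide the written order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers (not BioPerl API) illustrating the proposed
# strand-aware name/start-end (NSE) string.

# Coordinates are stored with start <= end; strand -1 writes them reversed.
sub format_nse {
    my ($name, $start, $end, $strand) = @_;
    if (defined $strand && $strand == -1) {
        return "$name/$end-$start";
    }
    return "$name/$start-$end";
}

# Parse "name/a-b"; a descending range (a > b) is read as strand -1.
# The greedy (.+) splits on the last '/', so names containing '/' survive.
sub parse_nse {
    my ($nse) = @_;
    my ($name, $first, $second) = $nse =~ m{^(.+)/(\d+)-(\d+)$}
        or die "Not a name/start-end string: $nse\n";
    return $first > $second
        ? ($name, $second, $first, -1)
        : ($name, $first, $second, 1);
}

print format_nse('AB194432.1', 846, 908, -1), "\n";      # AB194432.1/908-846
print join(',', parse_nse('AB194432.1/908-846')), "\n";  # AB194432.1,846,908,-1
```

A single-base range such as AB194432.1/846-846 cannot carry strand information in this notation, so the sketch reads it as strand 1.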
From vebaev at gmail.com Tue Nov 10 12:38:54 2009 From: vebaev at gmail.com (Vesselin Baev) Date: Tue, 10 Nov 2009 09:38:54 -0800 (PST) Subject: [Bioperl-l] Invitation to connect on LinkedIn Message-ID: <1983273212.597925.1257874734811.JavaMail.app@ech3-cdn07.prod> Vesselin Baev requested to add you as a connection on LinkedIn: Bolotin,, I'd like to add you to my professional network on LinkedIn. - Vesselin From jason at bioperl.org Tue Nov 10 13:47:02 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Nov 2009 10:47:02 -0800 Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and itscorrect use] In-Reply-To: <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1 984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Message-ID: Page 44 has the custom ID info or look at documentation for Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if you read the perldoc for the module.
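José also asks, in the message quoted below, whether plain Perl could do this faster. A minimal core-Perl sketch follows, assuming one ID per line in the list file and that each ID is the header text before the first "|" (as in the PleosPC9_... headers he shows). According to the Bio::Index::Fasta documentation, its default ID parser keys each record on the first non-space token after ">", which for these headers is the whole "PleosPC9_1_103820|fgenesh1_pg.3_#_1" string; and IDs read from a list without chomp keep their trailing newline. Either would make lookups by the bare protein name fail. The sketch reads the IDs into a hash with newlines and carriage returns stripped, streams the FASTA file once, and opens the output once, outside any loop. The script and the helpers header_id, read_id_list and extract_records are hypothetical illustrations, not BioPerl API; with Bio::Index::Fasta, equivalent header parsing would go into a subroutine registered via $inx->id_parser(...) before make_index.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain-Perl sketch (no BioPerl): copy FASTA records whose ID is listed,
# taking the ID as the header text before the first '|', e.g.
# ">PleosPC9_1_103820|fgenesh1_pg.3_#_1" -> "PleosPC9_1_103820".

sub header_id {
    my ($header) = @_;
    my ($token) = $header =~ /^>\s*(\S+)/ or return;
    return (split /\|/, $token)[0];
}

# One ID per line; strip the newline, any Windows CR and stray spaces,
# since an unstripped "ID\n" never matches a header.
sub read_id_list {
    my ($list_fh) = @_;
    my %wanted;
    while (my $id = <$list_fh>) {
        $id =~ s/^\s+|\s+$//g;
        $wanted{$id} = 1 if length $id;
    }
    return \%wanted;
}

# Stream the FASTA once; print every line of a record whose header ID is
# wanted. Returns the number of records written.
sub extract_records {
    my ($fasta_fh, $wanted, $out_fh) = @_;
    my ($keep, $count) = (0, 0);
    while (my $line = <$fasta_fh>) {
        if ($line =~ /^>/) {
            my $id = header_id($line);
            $keep = defined $id && exists $wanted->{$id};
            $count++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $count;
}

# Usage: perl extract_ids.pl ID_LIST FASTA OUTPUT
if (@ARGV == 3) {
    my ($list_file, $fasta_file, $out_file) = @ARGV;
    open my $list_fh,  '<', $list_file  or die "Cannot read $list_file: $!";
    open my $fasta_fh, '<', $fasta_file or die "Cannot read $fasta_file: $!";
    # The output is opened once, outside the loop.
    open my $out_fh,   '>', $out_file   or die "Cannot write $out_file: $!";
    my $n = extract_records($fasta_fh, read_id_list($list_fh), $out_fh);
    close $out_fh or die "Cannot close $out_file: $!";
    print STDERR "Wrote $n record(s) to $out_file\n";
}
```

Run it as, for example, perl extract_ids.pl LCS.txt PC9.fasta extracted_seqs.fasta (file names from the thread). A hash lookup per header keeps this to a single pass over the FASTA file; for many repeated lookups against a large file, an index such as Bio::Index::Fasta avoids rescanning it each time.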
http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf Don't re-open SeqIO each time; just do it once at the beginning, outside of the loop, and then call write_seq within the loop. One nuance of doing OO programming vs. procedural is that there is some outside state information that can persist in an object; conceptually, you want to open a filehandle once and just keep writing to it. -jason On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote: > Hello again, > > I tried what Mark told me modifying the code line he told me but > there's > still a problem that I believe must be due to the sequences name. > My secuences header on the Fasta file have this format: > >> PleosPC9_1_103820|fgenesh1_pg.3_#_1 > > Th part on the right of the pipe changes depending on the program > used to > create the gene model, for example: > >> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >> PleosPC9_1_123413|genemark.2731_g >> PleosPC9_1_52065|e_gw1.3.64.1 > > So I guess I need to parse my ids somehow for thr program to detect > only > the first part of the fasta header (the "protein name") and not to get > messed with the other side of the pipe... > > This is the corrected code I wrote following Mark's indications, but I > still don't have any idea about the parsing issue... > > #!/c:/Perl -w > use Bio::Index::Fasta; > use strict; > #PC9.fasta is my genomic file > my $Index_File_Name ="PC9.fasta"; > my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > #LCS.txt is my sequences list > @ARGV = ; > foreach my $id (@ARGV) { > if ($id eq ''){ > die ("empty list") > } > else { > my $seqobj = $inx->fetch($id); > my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", > -format => 'fasta'); > $out->write_seq($seqobj); > } > } > exit; > } > > Thanks in advance > > P.S. May it be a faster way of extracting those sequences using plain > PERL? > > > > > On Thu, 5 November 2009, 17:39, Mark A.
Jensen wrote: >> Yes, these are files created by the SDBM, Perl's internal db >> manager. You >> should >> be able to >> open the index by simply >> $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >> and the dbm will know what to do-- >> cheers MAJ >> ----- Original Message ----- >> From: >> To: "Mark A. Jensen" >> Cc: ; >> Sent: Thursday, November 05, 2009 11:21 AM >> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: >> and its >> correct >> use] >> >> >>> Thank you very much Mark, that's a good point :$ >>> I guess your correction is referred to the second script, isn't it? >>> >>> If it is so, there is still a problem with the first script, it >>> doesn't >>> create the PC9.fasta.idx file, instead it creates two files named: >>> -PC9.fasta.idx.pag >>> -PC9.fasta.idx.dir >>> >>> which seem to be clearly related with some kind of indexing >>> process...but, >>> unless the PC9.fasta.idx file is only virtual or remains hidden, I >>> can't >>> find it anywhere... >>> Forgive me if I'm talking nosense... >>> >>> Thank you very much again for your help ;) >>> >>> >>> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>>> Hey José, >>>> The first thing that jumps out it the index file name. Looks >>>> like you create it as >>>> PC9.fasta.idx >>>> But you read it as >>>> PC9.fasta >>>> Not an unusual mistake. Do >>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>>> and see if it works. >>>> MAJ >>>> ----- Original Message ----- >>>> From: >>>> To: >>>> Sent: Thursday, November 05, 2009 10:46 AM >>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and >>>> its >>>> correct >>>> use] >>>> >>>> >>>> >>>> >>>> ---------------------------- Original Message >>>> ---------------------------- >>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its >>>> correct >>>> use >>>> From: jluis.lavin at unavarra.es >>>> Date: Thu, 5 November 2009, 16:46 >>>> To: "Mark A.
Jensen" >>>> -------------------------------------------------------------------------- >>>> >>>> Hi Mark, >>>> >>>> I've actually got two scripts, the first one is to create the >>>> index and >>>> the second one is to retrieve the sequence lis from the indexed >>>> file. >>>> >>>> 1)Here is the Index creation script: >>>> >>>> #!/c:/Perl -w >>>> use strict; >>>> use Bio::Index::Fasta; >>>> use strict; >>>> >>>> print "Enter file for indexing: \n"; >>>> my $Index_File_Name = ; >>>> my $inx = Bio::Index::Fasta->new(-filename => >>>> $Index_File_Name.".idx", >>>> -write_flag => 1); >>>> $inx->make_index(my $File_Name); >>>> >>>> 2)And here is the sequence retrieval script: >>>> >>>> #!/c:/Perl -w >>>> use Bio::Index::Fasta; >>>> use strict; >>>> #PC9.fasta is my genomic file >>>> my $Index_File_Name ="PC9.fasta"; >>>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>>> #LCS.txt is my sequences list >>>> @ARGV = ; >>>> foreach my $id (@ARGV) { >>>> if ($id eq ''){ >>>> die ("empty list") >>>> } >>>> else { >>>> my $seqobj = $inx->fetch($id); >>>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>>> -format => 'fasta'); >>>> $out->write_seq($seqobj); >>>> } >>>> } >>>> exit; >>>> } >>>> >>>> I hope this code is not a total scum... >>>> >>>> Thanks in advance ;) >>>> >>>> >>>> >>>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote: >>>>> José -- It looks like this is a good solution to your problem. >>>>> Please >>>>> send >>>>> you >>>>> script so we can look at it- >>>>> cheers Mark >>>>> ----- Original Message ----- >>>>> From: >>>>> To: >>>>> Sent: Thursday, November 05, 2009 10:28 AM >>>>> Subject: [Bioperl-l] A question about iBio::Index: and its >>>>> correct use >>>>> >>>>> >>>>> >>>>> Hello to all, >>>>> >>>>> I'm trying to write a script to retrieve a list of sequences >>>>> from a >>>>> local >>>>> FASTA file (for example a fasta archive where all the protein >>>>> models >>>>> of >>>>> an >>>>> organism are stored). This file would be used by me as some kind
This file would be used by me as some kind >>>>> "local >>>>> database" (sorry if I mistake a few concepts...) >>>>> I?ve been reading the BioPerl HOWTOs and I came across the >>>>> Bio::Index::Fasta tool. >>>>> If I didn?t misunderstood what I read (which can be easy because >>>>> my >>>>> low >>>>> level on programming) this Indexing tool should do the job. >>>>> I wrote a couple of scripts based on the documentation i read >>>>> about >>>>> this >>>>> tool, but I don?t seem to be able to create the index file to be >>>>> used >>>>> later (to retrieve the sequences from). >>>>> -First of all, I want to ask the people in this forum if the >>>>> Bio::Index::Fasta is the right one to chose for this tasks. >>>>> -Then I?ll beg you to take a look at my scripts, because I don?t >>>>> seem >>>>> to >>>>> catch the bug... >>>>> >>>>> Best wishes to you all and thanks in advance ;) >>>>> >>>>> -- >>>>> Jos? Luis Lav?n Trueba, PhD >>>>> >>>>> Dpto. de Producci?n Agraria >>>>> Grupo de Gen?tica y Microbiolog?a >>>>> Universidad P?blica de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Dr. Jos? Luis Lav?n Trueba >>>> >>>> Dpto. de Producci?n Agraria >>>> Grupo de Gen?tica y Microbiolog?a >>>> Universidad P?blica de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> >>>> -- >>>> Dr. Jos? Luis Lav?n Trueba >>>> >>>> Dpto. de Producci?n Agraria >>>> Grupo de Gen?tica y Microbiolog?a >>>> Universidad P?blica de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> >>> >>> -- >>> Dr. Jos? Luis Lav?n Trueba >>> >>> Dpto. 
>>> de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- > Dr. José Luis Lavín Trueba > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Tue Nov 10 13:50:00 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Nov 2009 10:50:00 -0800 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> Message-ID: <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> You might also look at what mygenbank does: http://homepage.mac.com/iankorf/mygenbank.html On Nov 9, 2009, at 7:55 PM, Chris Fields wrote: > On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote: > >> Many thanks to Ewan Birney et. al. for Bio::Index::* >> >> I can throw away my awful grep based index-by-accession stuff. :) >> >> Any chance someone has also written an organism based index >> mechanism? Something like... >> >> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { >> print $seq->display_id .
"\n"; >> } >> >> Thanks, >> >> j > > It should work via id_parser(); from Bio::Index::GenBank: > > $inx->id_parser(\&get_id); > # make the index > $inx->make_index($file_name); > > # here is where the retrieval key is specified > sub get_id { > my $line = shift; > $line =~ /clone="(\S+)"/; > $1; > } > > Change the code ref deal with the line you want and parse the name > out. Caveat: this may not be absolutely perfect (it only passes in > a line at a time, and some species lines will wrap). Also not sure > how this would work in cases where multiple sequences from the same > species are present. > > The other option is to preparse everything and tie a hash to store a > species->UID map, then use that along with your Bio::Index index to > grab what you need. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jluis.lavin at unavarra.es Wed Nov 11 10:01:18 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Wed, 11 Nov 2009 16:01:18 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: anditscorrect use] In-Reply-To: References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1 984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.sq uirrel@webmail.unavarra.es><3471. 130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Message-ID: <2979.130.206.164.153.1257951678.squirrel@webmail.unavarra.es> Hi once again, I have modified the script following the instructions Jason gave me (at least what I understood of them; remember it is my first time trying to learn a programming language... and I'm not the smartest guy in the class, hehe), but it seems I didn't fix the problem...
Here's the new code I wrote: #!/c:/Perl -w use strict; use Bio::Index::Fasta; use Bio::DB::Fasta; use Bio::SeqIO; use IO::File; # assign files to scalars my $index_file = 'PC91.fasta'; my $id_list = 'LCS2.txt'; # open index file my $db = Bio::DB::Fasta->new($index_file) or die; # open the id list my $in = IO::File->new($id_list) or die; # open FASTA to write my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", -format => 'fasta'); # retrieve ids loop foreach my $id ($in) { if ($id eq ''){ die ("empty list") } else { my $seqobj = my $inx->fetch($id); $out->write_seq($seqobj); } } # parse fasta headers sub my_makeid { my $id = shift; if ( $id =~ /^>[^:]+:(\S+)/ ) { return $1; } elsif ($id =~ /^>(\S+)/) { return $1; } else { warn("cannot parse ID for $id\n"); } } exit; Could anyone please take a look at it? Thanks in advance ;) On Tue, 10 November 2009, 19:47, Jason Stajich wrote: > Page 44 has the custom ID info or look at documentation for > Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if > you read the perldoc for the module. > > http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf > > Don't re-open SeqIO each time; just do it once at the beginning, > outside of the loop, and then call write_seq within the loop. > > One nuance of doing OO programming vs. procedural is that there > is some outside state information that can persist in an object; > conceptually, you want to open a filehandle once and just keep writing > to it. > > -jason > On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote: > >> Hello again, >> >> I tried what Mark told me modifying the code line he told me but >> there's >> still a problem that I believe must be due to the sequences name.
>> My secuences header on the Fasta file have this format: >> >>> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >> >> Th part on the right of the pipe changes depending on the program >> used to >> create the gene model, for example: >> >>> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >>> PleosPC9_1_123413|genemark.2731_g >>> PleosPC9_1_52065|e_gw1.3.64.1 >> >> So I guess I need to parse my ids somehow for thr program to detect >> only >> the first part of the fasta header (the "protein name") and not to get >> messed with the other side of the pipe... >> >> This is the corrected code I wrote following Mark's indications, but I >> still don't have any idea about the parsing issue... >> >> #!/c:/Perl -w >> use Bio::Index::Fasta; >> use strict; >> #PC9.fasta is my genomic file >> my $Index_File_Name ="PC9.fasta"; >> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >> #LCS.txt is my sequences list >> @ARGV = ; >> foreach my $id (@ARGV) { >> if ($id eq ''){ >> die ("empty list") >> } >> else { >> my $seqobj = $inx->fetch($id); >> my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", >> -format => 'fasta'); >> $out->write_seq($seqobj); >> } >> } >> exit; >> } >> >> Thanks in advance >> >> P.S. May it be a faster way of extracting those sequences using plain >> PERL? >> >> >> >> >> On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote: >>> Yes, these are files created by the SDBM, Perl's internal db >>> manager. You >>> should >>> be able to >>> open the index by simply >>> $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>> and the dbm will know what to do-- >>> cheers MAJ >>> ----- Original Message ----- >>> From: >>> To: "Mark A. Jensen" >>> Cc: ; >>> Sent: Thursday, November 05, 2009 11:21 AM >>> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: >>> and its >>> correct >>> use] >>> >>> >>>> Thank you very much Mark, that's a good point :$ >>>> I guess your correction is referred to the second script, isn't it?
>>>> >>>> If it is so, there is still a problem with the first script, it >>>> doesn't >>>> create the PC9.fasta.idx file, instead it creates two files named: >>>> -PC9.fasta.idx.pag >>>> -PC9.fasta.idx.dir >>>> >>>> which seem to be clearly related with some kind of indexing >>>> process...but, >>>> unless the PC9.fasta.idx file is only virtual or remains hidden, I >>>> can't >>>> find it anywhere... >>>> Forgive me if I'm talking nosense... >>>> >>>> Thank you very much again for your help ;) >>>> >>>> >>>> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>>>> Hey José, >>>>> The first thing that jumps out it the index file name. Looks >>>>> like you create it as >>>>> PC9.fasta.idx >>>>> But you read it as >>>>> PC9.fasta >>>>> Not an unusual mistake. Do >>>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>>>> and see if it works. >>>>> MAJ >>>>> ----- Original Message ----- >>>>> From: >>>>> To: >>>>> Sent: Thursday, November 05, 2009 10:46 AM >>>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and >>>>> its >>>>> correct >>>>> use] >>>>> >>>>> >>>>> >>>>> >>>>> ---------------------------- Original Message >>>>> ---------------------------- >>>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its >>>>> correct >>>>> use >>>>> From: jluis.lavin at unavarra.es >>>>> Date: Thu, 5 November 2009, 16:46 >>>>> To: "Mark A. Jensen" >>>>> -------------------------------------------------------------------------- >>>>> >>>>> Hi Mark, >>>>> >>>>> I've actually got two scripts, the first one is to create the >>>>> index and >>>>> the second one is to retrieve the sequence lis from the indexed >>>>> file.
>>>>> >>>>> 1)Here is the Index creation script: >>>>> >>>>> #!/c:/Perl -w >>>>> use strict; >>>>> use Bio::Index::Fasta; >>>>> use strict; >>>>> >>>>> print "Enter file for indexing: \n"; >>>>> my $Index_File_Name = ; >>>>> my $inx = Bio::Index::Fasta->new(-filename => >>>>> $Index_File_Name.".idx", >>>>> -write_flag => 1); >>>>> $inx->make_index(my $File_Name); >>>>> >>>>> 2)And here is the sequence retrieval script: >>>>> >>>>> #!/c:/Perl -w >>>>> use Bio::Index::Fasta; >>>>> use strict; >>>>> #PC9.fasta is my genomic file >>>>> my $Index_File_Name ="PC9.fasta"; >>>>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>>>> #LCS.txt is my sequences list >>>>> @ARGV = ; >>>>> foreach my $id (@ARGV) { >>>>> if ($id eq ''){ >>>>> die ("empty list") >>>>> } >>>>> else { >>>>> my $seqobj = $inx->fetch($id); >>>>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>>>> -format => 'fasta'); >>>>> $out->write_seq($seqobj); >>>>> } >>>>> } >>>>> exit; >>>>> } >>>>> >>>>> I hope this code is not a total scum... >>>>> >>>>> Thanks in advance ;) >>>>> >>>>> >>>>> >>>>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote: >>>>>> José -- It looks like this is a good solution to your problem. >>>>>> Please >>>>>> send >>>>>> you >>>>>> script so we can look at it- >>>>>> cheers Mark >>>>>> ----- Original Message ----- >>>>>> From: >>>>>> To: >>>>>> Sent: Thursday, November 05, 2009 10:28 AM >>>>>> Subject: [Bioperl-l] A question about iBio::Index: and its >>>>>> correct use >>>>>> >>>>>> >>>>>> >>>>>> Hello to all, >>>>>> >>>>>> I'm trying to write a script to retrieve a list of sequences >>>>>> from a >>>>>> local >>>>>> FASTA file (for example a fasta archive where all the protein >>>>>> models >>>>>> of >>>>>> an >>>>>> organism are stored). This file would be used by me as some kind >>>>>> "local >>>>>> database" (sorry if I mistake a few concepts...) >>>>>> I've been reading the BioPerl HOWTOs and I came across the >>>>>> Bio::Index::Fasta tool.
>>>>>> If I didn't misunderstood what I read (which can be easy because >>>>>> my >>>>>> low >>>>>> level on programming) this Indexing tool should do the job. >>>>>> I wrote a couple of scripts based on the documentation i read >>>>>> about >>>>>> this >>>>>> tool, but I don't seem to be able to create the index file to be >>>>>> used >>>>>> later (to retrieve the sequences from). >>>>>> -First of all, I want to ask the people in this forum if the >>>>>> Bio::Index::Fasta is the right one to chose for this tasks. >>>>>> -Then I'll beg you to take a look at my scripts, because I don't >>>>>> seem >>>>>> to >>>>>> catch the bug... >>>>>> >>>>>> Best wishes to you all and thanks in advance ;) >>>>>> >>>>>> -- >>>>>> José Luis Lavín Trueba, PhD >>>>>> >>>>>> Dpto. de Producción Agraria >>>>>> Grupo de Genética y Microbiología >>>>>> Universidad Pública de Navarra >>>>>> 31006 Pamplona >>>>>> Navarra >>>>>> SPAIN >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Dr. José Luis Lavín Trueba >>>>> >>>>> Dpto. de Producción Agraria >>>>> Grupo de Genética y Microbiología >>>>> Universidad Pública de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> >>>>> -- >>>>> Dr. José Luis Lavín Trueba >>>>> >>>>> Dpto. de Producción Agraria >>>>> Grupo de Genética y Microbiología >>>>> Universidad Pública de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Dr. José Luis Lavín Trueba >>>> >>>> Dpto.
>>>> de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> -- >> Dr. José Luis Lavín Trueba >> >> Dpto. de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Wed Nov 11 18:48:33 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 11 Nov 2009 18:48:33 -0500 Subject: [Bioperl-l] Maq assembly wrapper ready for beta testing Message-ID: <4057E5A862B845EA8BB153888075590C@NewLife> Hi All- New modules are available in the core and in bioperl-run for working with Heng Li's short read assembler "maq" (http://maq.sourceforge.net/maq-man.shtml). Bio::Tools::Run::Maq allows a quick assembly call with a canned maq pipeline, and also allows individual maq commands to be called separately. It uses Bio::Assembly::IO::maq (a read-only module) to deliver a Bio::Assembly::Scaffold from maq output. If you're interested, see http://www.bioperl.org/wiki/HOWTO:Short-read_assemblies_with_maq and update your core and bioperl-run. The code inherits from Florent's excellent new Bio::Tools::Run::AssemblerBase -- kudos to him!!
tests are in bioperl-run/trunk/t/Maq.t; see them for myriad examples. Send me the bugs. MAJ From clarsen at vecna.com Thu Nov 12 12:22:26 2009 From: clarsen at vecna.com (Chris Larsen) Date: Thu, 12 Nov 2009 12:22:26 -0500 Subject: [Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses? In-Reply-To: <320fb6e00910271029m26f07564l727fb78adae81c11@mail.gmail.com> References: <320fb6e00910271029m26f07564l727fb78adae81c11@mail.gmail.com> Message-ID: <7BBAE077-4D76-46C2-BF66-363F5A017278@vecna.com> All, This is a short follow-up on the prior thread of discussion regarding computing mature peptide sequences for viruses. The topic has gone underwater for the time being as we solve some problems with source data. While the biopython effort and contributors on this board have given good guidance, and we now have scripts that function (thanks mostly to pcock), the source data on which everything relies is suspect: mat_peptide 15118..16914 <=== /product="nsp13" /note="helicase" I can tell you the virus community does not want to rely heavily on those position numbers. Furthermore, we have found fewer complete source genomes for viruses than for bacteria, and more virus-to-virus variation in the data fields annotated in the GBK file (Gene, CDS, ORF, Protein, Polyprotein, mat_peptide, db_xref). In fact, the community will have to come together significantly on how these molecules are defined in public repositories before a mature scripting effort becomes reliable, public and well received. Because of the variation in viruses, it's not even clear at this point what a 'gene' is. I will let you know how we proceed when more sequence data has been fully analyzed, and we can think about making any Perl-based solution a new viral protein module. Thanks, Chris -- Christopher Larsen, Ph.D. Sr.
Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From David.Messina at sbc.su.se Thu Nov 12 14:20:54 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 12 Nov 2009 20:20:54 +0100 Subject: [Bioperl-l] highest PAML version supported? Message-ID: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Hi everyone, What is the latest version of PAML (specifically codeml) that I can use with bioperl-live and bioperl-run? I looked around and couldn't find where (or if) this is documented. With PAML version 4.3a against the current trunk of both -live and -run I see this: ------------- EXCEPTION Bio::Root::NotImplemented ------------- MSG: Unknown format of PAML output did not see seqtype STACK Bio::Tools::Phylo::PAML::_parse_summary /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:461 STACK Bio::Tools::Phylo::PAML::next_result /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:270 STACK toplevel ../bin/cluster_kaks:251 --------------------------------------------------------------- ...which I suspect (but haven't confirmed) is due to a change in the file format. Dave From jason at bioperl.org Thu Nov 12 14:29:22 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 12 Nov 2009 11:29:22 -0800 Subject: [Bioperl-l] highest PAML version supported? In-Reply-To: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> References: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Message-ID: prolly 3.15 or so. it really needs a maintainer!!! On Nov 12, 2009, at 11:20 AM, Dave Messina wrote: > Hi everyone, > > What is the latest version of PAML (specifically codeml) that I can > use with > bioperl-live and bioperl-run? > > I looked around and couldn't find where (or if) this is documented. 
> > > With PAML version 4.3a against the current trunk of both -live and > -run I > see this: > ------------- EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Unknown format of PAML output did not see seqtype > STACK Bio::Tools::Phylo::PAML::_parse_summary > /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:461 > STACK Bio::Tools::Phylo::PAML::next_result > /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:270 > STACK toplevel ../bin/cluster_kaks:251 > --------------------------------------------------------------- > > ...which I suspect (but haven't confirmed) is due to a change in the > file > format. > > > Dave -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From scott at scottcain.net Fri Nov 13 09:48:43 2009 From: scott at scottcain.net (Scott Cain) Date: Fri, 13 Nov 2009 09:48:43 -0500 Subject: [Bioperl-l] January GMOD meeting announcement Message-ID: <4536f7700911130648j40eb2d82g2594adaccf476d73@mail.gmail.com> Hello, I am pleased to announce that the January GMOD meeting will be taking place on January 14 and 15 in San Diego at the Best Western Seven Seas (the same location as last year). Please see this page for registration information: http://gmod.org/wiki/January_2010_GMOD_Meeting When you go to that page, please take a moment to add suggestions for the agenda. There is no registration fee for this meeting; however, space is limited, so please register early. The proprietors of the Best Western have given us an excellent room rate and extended it to the previous week, so that people attending the GMOD meeting and the Plant and Animal Genome meeting before it may stay at the Best Western the entire time.
Please direct follow-up questions to the gmod-devel mailing list: https://lists.sourceforge.net/lists/listinfo/gmod-devel Thanks, and I look forward to seeing you in San Diego! Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From j.inoue at ucl.ac.uk Sat Nov 14 14:20:29 2009 From: j.inoue at ucl.ac.uk (Jun Inoue) Date: Sat, 14 Nov 2009 19:20:29 +0000 Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths Message-ID: Dear All, I have just started to learn BioPerl for phylogenetics. I usually use perl v5.10.0 on Mac OS X 10.5.8. I would like to ask for a hint on calculating the branch lengths from root to tip for all species in a Newick-format tree. Please see the following web site, where I explain what I want to do and show my simple (not yet complete) script: http://www.geocities.jp/ancientfishtree/BioPerl_BLRootTip.html Thank you for your help. Best, Jun Inoue http://www.geocities.jp/ancientfishtree/index_eng.html From maj at fortinbras.us Sat Nov 14 16:47:37 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 14 Nov 2009 16:47:37 -0500 Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths In-Reply-To: References: Message-ID: <3BC179984D5E49868C4F12D181D82B8D@NewLife> Hi Jun, Some hints: incorporate
@leaves = $tree->get_leaf_nodes;
and
use Bio::Tree::TreeFunctionsI;
$distance = $tree->distance( $node_a, $node_b );
cheers, Mark ----- Original Message ----- From: "Jun Inoue" To: Cc: "?? ?" Sent: Saturday, November 14, 2009 2:20 PM Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths > Dear All, > > I just started to learn BioPerl for phylogenetics. > Usually I am using perl v5.10.0 on my Mac OS 10.5.8. > I would like to ask you a hint to calculate the Branch lengths > from root to tip for all species in NEWICK TREE format. > > Please see the following web site.
> I am explaining what I want to do and > showing my easy script (not completed). > http://www.geocities.jp/ancientfishtree/BioPerl_BLRootTip.html > > Thank you for your help. > > Best, > Jun Inoue > http://www.geocities.jp/ancientfishtree/index_eng.html From jay at jays.net Sun Nov 15 20:23:38 2009 From: jay at jays.net (Jay Hannah) Date: Sun, 15 Nov 2009 19:23:38 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> Message-ID: On Nov 9, 2009, at 9:55 PM, Chris Fields wrote: > It should work via id_parser(); from Bio::Index::GenBank: > > $inx->id_parser(\&get_id); > # make the index > $inx->make_index($file_name); > > # here is where the retrieval key is specified > sub get_id { > my $line = shift; > $line =~ /clone="(\S+)"/; > $1; > } This worked great for me today (tackling a different problem than the original). Thanks!! j From veronica.xiaoyu at gmail.com Fri Nov 13 15:35:48 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Fri, 13 Nov 2009 15:35:48 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel question Message-ID: Hi, I'm using Bio::Graphics to parse the blast result and generate images. But sometimes, in the middle of the output image, the hit's color is white, even though I set it to other colors. I attached the picture here as an example. This doesn't happen all the time; usually it works well. I'm wondering whether I did something wrong, or whether it depends on the blast result? Thank you, Xiaoyu -------------- next part -------------- A non-text attachment was scrubbed...
Name: BLAST_problem.jpg Type: image/jpeg Size: 51888 bytes Desc: not available URL: From ryan_bogard at hms.harvard.edu Sun Nov 15 22:30:22 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Sun, 15 Nov 2009 19:30:22 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) Message-ID: <26366421.post@talk.nabble.com> In advance, any advice would be greatly appreciated! I have installed bioperl-588pm via fink but I am having difficulties calling the modules in a script. The following is added to .profile (bash): PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB If I change this to /sw/lib/perl5 then I get an @INC error, as Bio::Perl cannot be located. The environment variables are as follows: MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin INFOPATH=/sw/share/info:/sw/info:/usr/share/info This is the perl script I'm attempting to run:
#!/sw/bin/perl5.8.8
use strict;
use Bio::Perl;
my $seq_object = get_sequence('swiss', "ROA1_HUMAN");
write_sequence(">roa1.fasta", 'fasta', $seq_object);
Here is the error output: dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr Referenced from: /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle Expected in: dynamic lookup dyld: Symbol not found: _Perl_Tstack_sp_ptr Referenced from: /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle Expected in: dynamic lookup Trace/BPT trap I have looked through many forum postings and attempted the solutions offered in those instances, but none seem to work in my case. I'm not sure if it's because I have perl 5.10.0 installed while attempting to call bioperl 5.8.8; however, others seem to have it working just fine.
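The suspicion about mixing perl versions can be checked from inside perl itself. The following is a minimal sketch in plain Perl that uses only the core Config module; the helper mismatched_inc is hypothetical, written here for illustration, and is not part of BioPerl. It reports which interpreter is actually running, its version and architecture name, and flags any @INC directory (including those added through PERL5LIB) whose path names a different perl version.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper (not part of BioPerl): return the entries of @dirs
# whose path contains a perl version directory such as ".../5.8.8/..."
# that differs from the running version $running.
sub mismatched_inc {
    my ($running, @dirs) = @_;
    return grep { m{/(5\.\d+\.\d+)(?:/|\z)} && $1 ne $running } @dirs;
}

# Version of the interpreter executing this script, e.g. "5.10.0".
my $running = sprintf '%vd', $^V;

print "interpreter: $^X\n";
print "version:     $running\n";
print "archname:    $Config{archname}\n";
print "\@INC entry: $_\n" for @INC;

# Directories that belong to some other perl version are suspicious,
# because compiled modules (.bundle files) are tied to one perl build.
print "possible version mismatch: $_\n"
    for mismatched_inc($running, @INC);
```

Run it with the same shebang line and environment as the failing script. If the reported archname differs from the architecture directory named in the dyld error (darwin-thread-multi-2level), or version-numbered directories from another perl appear, that is consistent with the mixed-build explanation, although it does not prove it.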
Thank you, Ryan -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From e.osimo at gmail.com Mon Nov 16 02:04:40 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Mon, 16 Nov 2009 08:04:40 +0100 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26366421.post@talk.nabble.com> References: <26366421.post@talk.nabble.com> Message-ID: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Hello Ryan, unfortunately, if you upgraded to 10.6 without formatting the disk, I have to tell you that you'll be in big trouble with perl and with everything you installed from the command line, because in the upgrade process everything in the system folders (perl and bioperl among them) is erased without being uninstalled, so you'll find a lot of folders with the same names but no contents. I suggest, as I did, that you format your machine and reinstall 10.6 from scratch. Then you'll be able to install mysql (I had to install mysql-5.4.3-beta-osx10.5, the only version that worked on 10.6), and, working with the perl 5.10 that is already installed, you'll be able to install bioperl with no effort. Bye Emanuele On Mon, Nov 16, 2009 at 04:30, rbogard wrote: > > In advance, any advice would be grealy appreciated! I have installed > bioperl-588pm via fink but I am having difficulties calling the modules in > script. The following is added to .profile (bash): > PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB > > If I change this to /sw/lib/perl5 then I get an @INC error, as use > Bio::PERL > cannot be located.
> > The environment variables are as follows: > > > MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man > > PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 > > PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin > INFOPATH=/sw/share/info:/sw/info:/usr/share/info > > > This is the perl script I'm attempting to run: > #!/sw/bin/perl5.8.8 > use strict; > use Bio::Perl; > $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > write_sequence(">roa1.fasta",'fasta',$seq_object); > > Here is the error output: > > dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > dyld: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > Trace/BPT trap > > I have looked through many forum postings and attempted the solutions > offered in those instances, but none seem to work in my case. I'm not sure > if it's because I have perl 5.10.0 installed while attempting to call > bioperl 5.8.8; however, others seem to have it working just fine. > > Thank you, Ryan > -- > View this message in context: > http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From ryan_bogard at hms.harvard.edu Mon Nov 16 08:43:19 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 05:43:19 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Message-ID: <26372079.post@talk.nabble.com> Mac OS X 10.6 was a fresh install on a new MacBook Pro. I'm not sure if I will have the same issues, but it's worth a shot, as I have little on my computer and reinstalling to start over wouldn't be too difficult. What method did you use to install bioperl? I used fink and I am not sure the available stable version is the one I need. I will install from the command line this time around and let you know how it turns out. Thank you! Emanuele Osimo wrote: > > Hello Ryan, > unfortunately, if you upgraded to 10.6 without formatting, I have to tell > you that you'll be in big trouble with perl and with everything you > installed from the commandline... Because in the upgrade process > everything > in the system folders, perl and bioperl being some of these things, is > erased without being uninstalled, so you'll find a lot of folders with the > same name but no contents. > I suggest you, as I did, to format your pc and reinstall 10.6 from > scratch. > Then youl'll be able to install mysql (I had to install > mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with > perl > 5.10 that is already installed, you'll install bioperl with no effort. > Bye > Emanuele > > On Mon, Nov 16, 2009 at 04:30, rbogard > wrote: > >> >> In advance, any advice would be grealy appreciated!
I have installed >> bioperl-588pm via fink but I am having difficulties calling the modules >> in >> script. The following is added to .profile (bash): >> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >> >> If I change this to /sw/lib/perl5 then I get an @INC error, as use >> Bio::PERL >> cannot be located. >> >> The environment variables are as follows: >> >> >> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >> >> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >> >> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >> >> >> This is the perl script I'm attempting to run: >> #!/sw/bin/perl5.8.8 >> use strict; >> use Bio::Perl; >> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >> write_sequence(">roa1.fasta",'fasta',$seq_object); >> >> Here is the error output: >> >> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> dyld: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> Trace/BPT trap >> >> I have looked through many forum postings and attempted the solutions >> offered in those instances, but none seem to work in my case. I'm not >> sure >> if it's because I have perl 5.10.0 installed while attempting to call >> bioperl 5.8.8; however, others seem to have it working just fine. >> >> Thank you, Ryan >> -- >> View this message in context: >> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Mon Nov 16 08:48:17 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 16 Nov 2009 08:48:17 -0500 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26372079.post@talk.nabble.com> References: <26366421.post@talk.nabble.com><2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> Message-ID: <8D822081B13F49C2A37677D3A47F38B4@NewLife> Ryan, I'm not a mac person, but Koen has said (see http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) to use the unstable tree to get BioPerl 1.6.1, which is likely to be what you want. cheers Mark ----- Original Message ----- From: "rbogard" To: Sent: Monday, November 16, 2009 8:43 AM Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) > > The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if I > will have the same issues, but it's worth a shot as I have little on my > computer and reinstalling to start over wouldn't be too difficult. What > method did you use to install bioperl? I used fink and I am not sure the > available stable version is the one I need. I will install from the command > line this time around, and let you know how it turns out. > > Thank you! 
> > > > Emanuele Osimo wrote: >> >> Hello Ryan, >> unfortunately, if you upgraded to 10.6 without formatting, I have to tell >> you that you'll be in big trouble with perl and with everything you >> installed from the commandline... Because in the upgrade process >> everything >> in the system folders, perl and bioperl being some of these things, is >> erased without being uninstalled, so you'll find a lot of folders with the >> same name but no contents. >> I suggest you, as I did, to format your pc and reinstall 10.6 from >> scratch. >> Then youl'll be able to install mysql (I had to install >> mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with >> perl >> 5.10 that is already installed, you'll install bioperl with no effort. >> Bye >> Emanuele >> >> On Mon, Nov 16, 2009 at 04:30, rbogard >> wrote: >> >>> >>> In advance, any advice would be grealy appreciated! I have installed >>> bioperl-588pm via fink but I am having difficulties calling the modules >>> in >>> script. The following is added to .profile (bash): >>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>> >>> If I change this to /sw/lib/perl5 then I get an @INC error, as use >>> Bio::PERL >>> cannot be located. 
>>> >>> The environment variables are as follows: >>> >>> >>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>> >>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>> >>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>> >>> >>> This is the perl script I'm attempting to run: >>> #!/sw/bin/perl5.8.8 >>> use strict; >>> use Bio::Perl; >>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>> >>> Here is the error output: >>> >>> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >>> Referenced from: >>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>> Expected in: dynamic lookup >>> >>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>> Referenced from: >>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>> Expected in: dynamic lookup >>> >>> Trace/BPT trap >>> >>> I have looked through many forum postings and attempted the solutions >>> offered in those instances, but none seem to work in my case. I'm not >>> sure >>> if it's because I have perl 5.10.0 installed while attempting to call >>> bioperl 5.8.8; however, others seem to have it working just fine. >>> >>> Thank you, Ryan >>> -- >>> View this message in context: >>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: > http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Nov 16 10:00:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Nov 2009 09:00:09 -0600 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Message-ID: <49681E01-E95D-4FC6-AE42-6E57ED43AAA2@illinois.edu> On Nov 16, 2009, at 1:04 AM, Emanuele Osimo wrote: > Hello Ryan, > unfortunately, if you upgraded to 10.6 without formatting, I have to tell > you that you'll be in big trouble with perl and with everything you > installed from the commandline... Because in the upgrade process everything > in the system folders, perl and bioperl being some of these things, is > erased without being uninstalled, so you'll find a lot of folders with the > same name but no contents. > I suggest you, as I did, to format your pc and reinstall 10.6 from scratch. > Then youl'll be able to install mysql (I had to install > mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with perl > 5.10 that is already installed, you'll install bioperl with no effort. 
> Bye > Emanuele Just starting from scratch isn't always the best solution (though it is the cleanest). In this case I don't think anything you mention applies, as there are conflicting symbols being reported. My guess is conflicting perl builds, probably between your system 5.10.0 (Snow Leopard) and your fink-installed perl 5.8.8 (they are binary incompatible). Also, remember that Snow Leopard is primarily 64-bit, so it might be best to work out whether your fink is compiling 64- or 32-bit binaries. In this case, I would just uninstall the fink-based perl and either use the system one (Snow Leopard = 5.10.0), or roll your own and install 5.10.1 locally or in /usr/local. Do NOT replace the system one, as that will likely break your OS. In my experience, and not to bash on fink or MacPorts, I never had much luck with their perl installs. Unless I plan on only using fink or MacPorts for my OS (not likely in my case), I find they tend to cause problems in the long term unless one uses them to install packages with very few dependencies, and even then you need to make sure fink is configured to compile the correct binary. For instance, they're fairly good for gd, libxml2, etc., but beyond that one may get into issues with odd, version-specific dependencies with some packages, such as relying on perl 5.8.8 (but not perl 5.10.x), db42 (instead of db44), etc. I've ended up in the past with 2-3 different perl versions, Berkeley DB versions, etc. chris > On Mon, Nov 16, 2009 at 04:30, rbogard wrote: > >> >> In advance, any advice would be grealy appreciated! I have installed >> bioperl-588pm via fink but I am having difficulties calling the modules in >> script. The following is added to .profile (bash): >> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >> >> If I change this to /sw/lib/perl5 then I get an @INC error, as use >> Bio::PERL >> cannot be located.
>> >> The environment variables are as follows: >> >> >> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >> >> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >> >> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >> >> >> This is the perl script I'm attempting to run: >> #!/sw/bin/perl5.8.8 >> use strict; >> use Bio::Perl; >> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >> write_sequence(">roa1.fasta",'fasta',$seq_object); >> >> Here is the error output: >> >> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> dyld: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> Trace/BPT trap >> >> I have looked through many forum postings and attempted the solutions >> offered in those instances, but none seem to work in my case. I'm not sure >> if it's because I have perl 5.10.0 installed while attempting to call >> bioperl 5.8.8; however, others seem to have it working just fine. >> >> Thank you, Ryan >> -- >> View this message in context: >> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Nov 16 10:01:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Nov 2009 09:01:01 -0600 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <8D822081B13F49C2A37677D3A47F38B4@NewLife> References: <26366421.post@talk.nabble.com><2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> <8D822081B13F49C2A37677D3A47F38B4@NewLife> Message-ID: <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> Actually, why not just install via CPAN? Any particular reason? chris On Nov 16, 2009, at 7:48 AM, Mark A. Jensen wrote: > Ryan, > I'm not a mac person, but Koen has said (see http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) > to use the unstable tree to get BioPerl 1.6.1, which is likely to be what you want. > cheers > Mark > ----- Original Message ----- From: "rbogard" > To: > Sent: Monday, November 16, 2009 8:43 AM > Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) > > >> >> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if I >> will have the same issues, but it's worth a shot as I have little on my >> computer and reinstalling to start over wouldn't be too difficult. What >> method did you use to install bioperl? I used fink and I am not sure the >> available stable version is the one I need. I will install from the command >> line this time around, and let you know how it turns out. >> >> Thank you! 
>> >> >> >> Emanuele Osimo wrote: >>> >>> Hello Ryan, >>> unfortunately, if you upgraded to 10.6 without formatting, I have to tell >>> you that you'll be in big trouble with perl and with everything you >>> installed from the commandline... Because in the upgrade process >>> everything >>> in the system folders, perl and bioperl being some of these things, is >>> erased without being uninstalled, so you'll find a lot of folders with the >>> same name but no contents. >>> I suggest you, as I did, to format your pc and reinstall 10.6 from >>> scratch. >>> Then youl'll be able to install mysql (I had to install >>> mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with >>> perl >>> 5.10 that is already installed, you'll install bioperl with no effort. >>> Bye >>> Emanuele >>> >>> On Mon, Nov 16, 2009 at 04:30, rbogard >>> wrote: >>> >>>> >>>> In advance, any advice would be grealy appreciated! I have installed >>>> bioperl-588pm via fink but I am having difficulties calling the modules >>>> in >>>> script. The following is added to .profile (bash): >>>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>>> >>>> If I change this to /sw/lib/perl5 then I get an @INC error, as use >>>> Bio::PERL >>>> cannot be located. 
>>>> >>>> The environment variables are as follows: >>>> >>>> >>>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>>> >>>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>>> >>>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>>> >>>> >>>> This is the perl script I'm attempting to run: >>>> #!/sw/bin/perl5.8.8 >>>> use strict; >>>> use Bio::Perl; >>>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>>> >>>> Here is the error output: >>>> >>>> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >>>> Referenced from: >>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>> Expected in: dynamic lookup >>>> >>>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>>> Referenced from: >>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>> Expected in: dynamic lookup >>>> >>>> Trace/BPT trap >>>> >>>> I have looked through many forum postings and attempted the solutions >>>> offered in those instances, but none seem to work in my case. I'm not >>>> sure >>>> if it's because I have perl 5.10.0 installed while attempting to call >>>> bioperl 5.8.8; however, others seem to have it working just fine. >>>> >>>> Thank you, Ryan >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> -- >> View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Mon Nov 16 10:49:13 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Nov 2009 08:49:13 -0700 Subject: [Bioperl-l] Bio::Graphics::Panel question In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40663EDB9@EX02.asurite.ad.asu.edu> To really be able to tell if this was a bug, I (and probably the real devs) would need to see that part of your code and the Blast file that is having this issue as it could be your callback for color choice vs the blast object (e.g. your color picker is missing an option that the data comes in with and so returns with a blank value). -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Xiaoyu Liang Sent: Friday, November 13, 2009 1:36 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Graphics::Panel question Hi, I'm using Bio::Graphics to parse the blast result and generate images. 
But, sometimes, in the middle of the output image, the hit's color is white, eventhough I set it to other colors. I attached the picture here for an example. This doesn't occur all the time, usually, it works well. I'm wondering if I did something wrong? or depends on the blast result? Thank you, Xiaoyu From ryan_bogard at hms.harvard.edu Mon Nov 16 11:57:16 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 08:57:16 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> <8D822081B13F49C2A37677D3A47F38B4@NewLife> <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> Message-ID: <26375418.post@talk.nabble.com> I read that posting by Koen and used the unstable tree after the first attempt; however, the errors still persisted. I just finished a fresh install and I will just follow Mr. Fields advice and use CPAN. Thank you all for the help! Chris Fields-5 wrote: > > Actually, why not just install via CPAN? Any particular reason? > > chris > > On Nov 16, 2009, at 7:48 AM, Mark A. Jensen wrote: > >> Ryan, >> I'm not a mac person, but Koen has said (see >> http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) >> to use the unstable tree to get BioPerl 1.6.1, which is likely to be what >> you want. >> cheers >> Mark >> ----- Original Message ----- From: "rbogard" >> >> To: >> Sent: Monday, November 16, 2009 8:43 AM >> Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl >> 5.10.0) >> >> >>> >>> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if >>> I >>> will have the same issues, but it's worth a shot as I have little on my >>> computer and reinstalling to start over wouldn't be too difficult. What >>> method did you use to install bioperl? 
I used fink and I am not sure the >>> available stable version is the one I need. I will install from the >>> command >>> line this time around, and let you know how it turns out. >>> >>> Thank you! >>> >>> >>> >>> Emanuele Osimo wrote: >>>> >>>> Hello Ryan, >>>> unfortunately, if you upgraded to 10.6 without formatting, I have to >>>> tell >>>> you that you'll be in big trouble with perl and with everything you >>>> installed from the commandline... Because in the upgrade process >>>> everything >>>> in the system folders, perl and bioperl being some of these things, is >>>> erased without being uninstalled, so you'll find a lot of folders with >>>> the >>>> same name but no contents. >>>> I suggest you, as I did, to format your pc and reinstall 10.6 from >>>> scratch. >>>> Then youl'll be able to install mysql (I had to install >>>> mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with >>>> perl >>>> 5.10 that is already installed, you'll install bioperl with no effort. >>>> Bye >>>> Emanuele >>>> >>>> On Mon, Nov 16, 2009 at 04:30, rbogard >>>> wrote: >>>> >>>>> >>>>> In advance, any advice would be grealy appreciated! I have installed >>>>> bioperl-588pm via fink but I am having difficulties calling the >>>>> modules >>>>> in >>>>> script. The following is added to .profile (bash): >>>>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>>>> >>>>> If I change this to /sw/lib/perl5 then I get an @INC error, as use >>>>> Bio::PERL >>>>> cannot be located. 
>>>>> >>>>> The environment variables are as follows: >>>>> >>>>> >>>>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>>>> >>>>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>>>> >>>>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>>>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>>>> >>>>> >>>>> This is the perl script I'm attempting to run: >>>>> #!/sw/bin/perl5.8.8 >>>>> use strict; >>>>> use Bio::Perl; >>>>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>>>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>>>> >>>>> Here is the error output: >>>>> >>>>> dyld: lazy symbol binding failed: Symbol not found: >>>>> _Perl_Tstack_sp_ptr >>>>> Referenced from: >>>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>>> Expected in: dynamic lookup >>>>> >>>>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>>>> Referenced from: >>>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>>> Expected in: dynamic lookup >>>>> >>>>> Trace/BPT trap >>>>> >>>>> I have looked through many forum postings and attempted the solutions >>>>> offered in those instances, but none seem to work in my case. I'm not >>>>> sure >>>>> if it's because I have perl 5.10.0 installed while attempting to call >>>>> bioperl 5.8.8; however, others seem to have it working just fine. >>>>> >>>>> Thank you, Ryan >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From krishna.aneesh at gmail.com Mon Nov 16 02:00:15 2009 From: krishna.aneesh at gmail.com (Aneesh K) Date: Mon, 16 Nov 2009 12:30:15 +0530 Subject: [Bioperl-l] Regarding Bio::TreeIO Object Message-ID: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Hi, I just started to use Bioperl modules. It's really useful and interesting. Now I have in stuck with "Tree objects and phylogenetic trees". I couldn't get any documentation/examples about reading/parsing phylip tree files. Please tell me from where I can get some sample codes for this. Waiting for your reply. Thanks Aneesh.K Mob.
09646181517 From David.Messina at sbc.su.se Mon Nov 16 12:33:36 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 16 Nov 2009 18:33:36 +0100 Subject: [Bioperl-l] highest PAML version supported? In-Reply-To: References: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Message-ID: Hi everyone, I just committed support for parsing codeml 4.3a (August 2009) to bioperl-live. I added new tests and all PAML-related tests pass, but please report any problems you have to the list. Note that I haven't tested the other PAML 4.3a executables to see if there are format changes with those. If you get the chance to try any and it doesn't work, let me know and I'll try to add support for them. (Note that these changes are only to the PAML parsing code; Bio::Tools::Run already appears to handle 4.3a just fine.) Dave From jason at bioperl.org Mon Nov 16 12:34:57 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 16 Nov 2009 09:34:57 -0800 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Message-ID: Is this at all helpful for your question? http://www.bioperl.org/wiki/HOWTO:Trees The trees are in 'newick' (New Hampshire) format, though I don't think there is a phylip format for trees. -jason On Nov 15, 2009, at 11:00 PM, Aneesh K wrote: > Hi, > > I just started to use Bioperl modules. It's really useful and > interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing > phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob.
09646181517 -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From roy.chaudhuri at gmail.com Mon Nov 16 12:31:49 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 16 Nov 2009 17:31:49 +0000 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Message-ID: <4B018C85.6020801@gmail.com> Hi Aneesh, See the Bioperl trees howto: http://www.bioperl.org/wiki/HOWTO:Trees Roy. Aneesh K wrote: > Hi, > > I just started to use Bioperl modules. It's really useful and interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob. 09646181517 -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. From Kevin.M.Brown at asu.edu Mon Nov 16 13:22:07 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Nov 2009 11:22:07 -0700 Subject: [Bioperl-l] FW: Bio::Graphics::Panel question Message-ID: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> Please keep your responses on the list for more timely help. Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University ________________________________ From: Xiaoyu Liang [mailto:veronica.xiaoyu at gmail.com] Sent: Monday, November 16, 2009 9:34 AM To: Kevin Brown Subject: Re: [Bioperl-l] Bio::Graphics::Panel question Hi Kevin, Thank you for your quick response. I have attached the BLAST .out file here. The following is the relevant part of my code.
I have an array holding the color for each hit; I printed the array out and no values are missing. my $track = $panel->add_track( -glyph => 'graded_segments', -label => 1, -connector => 'dashed', -font2color => 'red', -sort_order => 'high_score', -description => sub { $feature = shift; #print "--".$feature."\n"; return unless $feature->has_tag('description'); my ($description) = $feature->each_tag_value('description'); my ($id) = $feature->display_name; my @records= split(/\|/,$description); my $score = $feature->score; #print $id.":".$score."\n"; if($score >=200){ push (@color_array,1); }elsif($score >=80){ push (@color_array,2); }elsif($score >=50){ push (@color_array,3); }elsif($score >= 40){ push (@color_array,4); }else{ push (@color_array,5); } if($type == 1){ "Species:Arabidopsis TF Family:$records[1] Score=$score"; }elsif($type == 2){ if(scalar(@records)==5){ "Species:$records[1] TF Family:$records[2] Accepted Name:$records[3] Score=$score"; }else{ "Species:$records[1] TF Family:$records[2] Score=$score"; } }else{ ""; } }, -bgcolor => sub{ return unless $feature->has_tag('description'); if($color_array[$index] == 1 ){ $color = 'red'; } if($color_array[$index]== 2){ $color = 'orange'; } if($color_array[$index]== 3){ $color = 'green'; } if($color_array[$index]== 4){ $color = 'blue'; } if($color_array[$index]== 5){ $color = 'black'; } #if ($index == 20){ # $color = 'black'; #} #print $index."--".$color_array[$index]."\n"; $index++; #print $feature."\n"; #print $feature->display_name."\n"; return $color; }, ); Best regards, Xiaoyu On Mon, Nov 16, 2009 at 10:49 AM, Kevin Brown wrote: To really be able to tell if this was a bug, I (and probably the real devs) would need to see that part of your code and the Blast file that is having this issue as it could be your callback for color choice vs the blast object (e.g. your color picker is missing an option that the data comes in with and so returns with a blank value).
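Following the suggestion above that the colour callback may be returning a blank value, one way to remove the shared @color_array and $index globals is to compute the colour directly from the score of the feature being drawn. The sketch below is a minimal illustration under stated assumptions: score_to_color is a hypothetical helper name, the thresholds are copied from the -description callback above, and the Bio::Graphics usage line is shown only as a comment and is not run here. The helper always returns a defined colour name, so no score can fall through to an empty value.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Graphics): map a hit score to a
# colour using the same thresholds as the -description callback above.
# It always returns a defined colour name, so no hit can end up blank.
sub score_to_color {
    my ($score) = @_;
    return 'black' unless defined $score;    # missing score: lowest bin
    return 'red'    if $score >= 200;
    return 'orange' if $score >= 80;
    return 'green'  if $score >= 50;
    return 'blue'   if $score >= 40;
    return 'black';
}

# Assumed usage inside add_track (not run here): the callback receives the
# feature being drawn as its first argument, so no globals are needed:
#   -bgcolor => sub { my $feature = shift; return score_to_color($feature->score) },

print join(' ', map { score_to_color($_) } 250, 100, 60, 45, 10), "\n";
# prints: red orange green blue black
```

Because the colour then depends only on the feature passed in, it no longer matters how many times, or in what order, the -description and -bgcolor callbacks are invoked. If some hits still look pale, it may be worth checking separately whether the graded_segments glyph, which shades segments by score, is lightening low-scoring hits.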
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Xiaoyu Liang Sent: Friday, November 13, 2009 1:36 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Graphics::Panel question Hi, I'm using Bio::Graphics to parse the blast result and generate images. But, sometimes, in the middle of the output image, the hit's color is white, eventhough I set it to other colors. I attached the picture here for an example. This doesn't occur all the time, usually, it works well. I'm wondering if I did something wrong? or depends on the blast result? Thank you, Xiaoyu -------------- next part -------------- A non-text attachment was scrubbed... Name: 1258388779.out Type: application/octet-stream Size: 32599 bytes Desc: 1258388779.out URL: From paolo.pavan at gmail.com Mon Nov 16 14:06:06 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Mon, 16 Nov 2009 20:06:06 +0100 Subject: [Bioperl-l] bioperl-ext installation issue Message-ID: <56be91b60911161106w69e20fd9k133a465e8d4f8a3f@mail.gmail.com> Hi everybody, I have problems installing the bioperl-ext package; any help is much appreciated. 1) - I start trying with cpan i /bioperl-ext/ the only resource available is /B/BI/BIRNEY/bioperl-ext-1.4 (is it ok?) - I install Inline::MakeMaker and Inline::C then - i/BIRNEY/bioperl-ext-1.4/ fails because I don't have the Staden package 2) I try to install io_lib-1.8.10.tar as suggested by the README ( ftp://ftp.mrc-lmb.cam.ac.uk/pub/staden/io_lib/), installation fails after: ...
gcc -g -O2 -o makeSCF makeSCF.o ../read/.libs/libread.a -lz -lm ../read/.libs/libread.a(compress.o): In function `fopen_compressed': /root/Download/staden/io_lib-1.8.10/utils/compress.c:321: warning: the use of `tempnam' is dangerous, better use `mkstemp' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../read -I../alf -I../abi -I../ctf -I../ztr -I../plain -I../scf -I../exp_file -I../utils -I/usr/local/include -g -O2 -c -o extract_seq.o `test -f extract_seq.c || echo './'`extract_seq.c /bin/sh ../libtool --mode=link gcc -g -O2 -o extract_seq extract_seq.o ../read/libread.la gcc -g -O2 -o extract_seq extract_seq.o ../read/.libs/libread.a -lz -lm ../read/.libs/libread.a(compress.o): In function `fopen_compressed': /root/Download/staden/io_lib-1.8.10/utils/compress.c:321: warning: the use of `tempnam' is dangerous, better use `mkstemp' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../read -I../alf -I../abi -I../ctf -I../ztr -I../plain -I../scf -I../exp_file -I../utils -I/usr/local/include -g -O2 -c -o index_tar.o `test -f index_tar.c || echo './'`index_tar.c index_tar.c: In function 'main': index_tar.c:12: error: two or more data types in declaration specifiers make[2]: *** [index_tar.o] Error 1 make[2]: Leaving directory `/home/root/Download/staden/io_lib-1.8.10/progs' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/root/Download/staden/io_lib-1.8.10' make: *** [all-recursive-am] Error 2 3) I give up staden, because I actually need pSW, and try to install from Makefile.PL in Bio/Ext/Align but installation fails after: ... Align.xs:18: warning: 'not_here'
defined but not used Running Mkbootstrap for Bio::Ext::Align () chmod 644 Align.bs rm -f ../blib/arch/auto/Bio/Ext/Align/Align.so gcc -shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic Align.o -o ../blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a \ -lm \ /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make[1]: *** [../blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 make[1]: Leaving directory `/home/root/.cpan/sources/authors/id/B/BI/BIRNEY/bioperl-ext-1.4/Bio/Ext/Align' make: *** [subdirs] Error 2 I have also made some other tries such force install Bio::Ext:Align without success but I'm sure I miss something trivial that I can't catch. Can someone help me? Thank you, Paolo From lincoln.stein at gmail.com Mon Nov 16 15:08:20 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 16 Nov 2009 15:08:20 -0500 Subject: [Bioperl-l] FW: Bio::Graphics::Panel question In-Reply-To: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> Message-ID: <6dce9a0b0911161208q2f826d83s319184f0cacca097@mail.gmail.com> Hi, I think you should modify your color selection code as follows: if($color_array[$index] == 1 ){ $color = 'red'; } elsif($color_array[$index]== 2){ $color = 'orange'; } elsif($color_array[$index]== 3){ $color = 'green'; } elsif($color_array[$index]== 4){ $color = 'blue'; } elsif($color_array[$index]== 5){ $color = 'black'; } else { die "unexpected color array value $color_array[$index]" } Lincoln On Mon, Nov 16, 2009 at 1:22 PM, Kevin Brown wrote: > Please keep your responses on the list for more timely help. 
> > > Kevin Brown > Center for Innovations in Medicine > Biodesign Institute > Arizona State University > > > > ________________________________ > > From: Xiaoyu Liang [mailto:veronica.xiaoyu at gmail.com] > Sent: Monday, November 16, 2009 9:34 AM > To: Kevin Brown > Subject: Re: [Bioperl-l] Bio::Graphics::Panel question > > > Hi Kevin, > > Thank you for ur quick response. I attached the BLAST .out file here. > And the follow is my code part. I have an array keeping the color for > each hit, and I printed it out the array, there is no missing. > > my $track = $panel->add_track( > -glyph => 'graded_segments', > -label => 1, > -connector => 'dashed', > -font2color => 'red', > -sort_order => 'high_score', > -description => sub { > $feature = shift; > #print "--".$feature."\n"; > return unless > $feature->has_tag('description'); > my ($description) = > $feature->each_tag_value('description'); > my ($id) = $feature->display_name; > my @records= split(/\|/,$description); > my $score = $feature->score; > #print $id.":".$score."\n"; > if($score >=200){ > push (@color_array,1); > }elsif($score >=80){ > push (@color_array,2); > }elsif($score >=50){ > push (@color_array,3); > }elsif($score >= 40){ > push (@color_array,4); > }else{ > push (@color_array,5); > } > > if($type == 1){ > "Species:Arabidopsis TF > Family:$records[1] Score=$score"; > }elsif($type == 2){ > if(scalar(@records)==5){ > "Species:$records[1] TF > Family:$records[2] Accepted Name:$records[3] Score=$score"; > }else{ > "Species:$records[1] TF > Family:$records[2] Score=$score"; > } > }else{ > ""; > } > }, > -bgcolor => sub{ > return unless > $feature->has_tag('description'); > if($color_array[$index] == 1 ){ > $color = 'red'; > } > if($color_array[$index]== 2){ > $color = 'orange'; > } > if($color_array[$index]== 3){ > $color = 'green'; > } > if($color_array[$index]== 4){ > $color = 'blue'; > } > if($color_array[$index]== 5){ > $color = 'black'; > } > #if ($index == 20){ > # $color = 'black'; > #} > #print > 
$index."--".$color_array[$index]."\n"; > $index++; > > #print $feature."\n"; > #print > $feature->display_name."\n"; > return $color; > }, > ); > > > Best regrads, > Xiaoyu > > > On Mon, Nov 16, 2009 at 10:49 AM, Kevin Brown > wrote: > > > To really be able to tell if this was a bug, I (and probably the > real > devs) would need to see that part of your code and the Blast > file that > is having this issue as it could be your callback for color > choice vs > the blast object (e.g. your color picker is missing an option > that the > data comes in with and so returns with a blank value). > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Xiaoyu Liang > Sent: Friday, November 13, 2009 1:36 PM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Graphics::Panel question > > Hi, > > I'm using Bio::Graphics to parse the blast result and generate > images. > But, sometimes, in the middle of the output image, the hit's > color is > white, eventhough I set it to other colors. I attached the > picture here > for an example. This doesn't occur all the time, usually, it > works well. > I'm wondering if I did something wrong? or depends on the blast > result? > > Thank you, > Xiaoyu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From ryan_bogard at hms.harvard.edu Mon Nov 16 16:44:25 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 13:44:25 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26366421.post@talk.nabble.com> References: <26366421.post@talk.nabble.com> Message-ID: <26379710.post@talk.nabble.com> Thank you all for your help! I was able to get bioperl working via manual download and install. It was a combination of permissions issues and X86_64 vs. X86_32 compatibility issues. Using fink to download and install seems to have given me a combination of 32 and 64 associated files (I probably did something wrong in config). rbogard wrote: > > In advance, any advice would be grealy appreciated! I have installed > bioperl-588pm via fink but I am having difficulties calling the modules in > script. The following is added to .profile (bash): > PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB > > If I change this to /sw/lib/perl5 then I get an @INC error, as use > Bio::PERL cannot be located. 
> > The environment variables are as follows: > > MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man > PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 > PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin > INFOPATH=/sw/share/info:/sw/info:/usr/share/info > > > This is the perl script I'm attempting to run: > #!/sw/bin/perl5.8.8 > use strict; > use Bio::Perl; > $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > write_sequence(">roa1.fasta",'fasta',$seq_object); > > Here is the error output: > > dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > dyld: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > Trace/BPT trap > > I have looked through many forum postings and attempted the solutions > offered in those instances, but none seem to work in my case. I'm not sure > if it's because I have perl 5.10.0 installed while attempting to call > bioperl 5.8.8; however, others seem to have it working just fine. > > Thank you, Ryan > -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26379710.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jay at jays.net Mon Nov 16 17:02:10 2009 From: jay at jays.net (Jay Hannah) Date: Mon, 16 Nov 2009 16:02:10 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? 
In-Reply-To: <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> Message-ID: <60ADD3A9-D38B-4A39-A5CE-C8118DEC1242@jays.net> On Nov 10, 2009, at 12:50 PM, Jason Stajich wrote: > You might also look at what mygenbank does: > http://homepage.mac.com/iankorf/mygenbank.html It appears, perhaps, that BioSQL can provide *foo* searching like so: http://www.biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME SELECT DISTINCT include.ncbi_taxon_id FROM taxon INNER JOIN taxon AS include ON (include.left_value BETWEEN taxon.left_value AND taxon.right_value) WHERE taxon.taxon_id IN (SELECT taxon_id FROM taxon_name WHERE name LIKE '%fungi%') So I think we're going to chase that for a while. I didn't see a *foo* search in MyGenBank? Thanks, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From roy.chaudhuri at gmail.com Tue Nov 17 06:24:07 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 17 Nov 2009 11:24:07 +0000 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> <4B018C85.6020801@gmail.com> <9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> Message-ID: <4B0287D7.5050702@gmail.com> Hi Aneesh, Please keep your replies on the mailing list, that way someone else can respond, which would be particularly useful in this case since I know nothing about MapIO. Roy. Aneesh K wrote: > Thanks for your reply. > > I would like to know about "Genetic Maps" also. I would like to > use MapIO object. > But I'm not aware about genetic maps and the mapmaker format. > > Please tell me from where I can get some examples for mapmaker format > and some example scripts to use MapIO object. > > Hoping your reply. > > Aneesh.K > Mob. 
09646181517 > > > > On Mon, Nov 16, 2009 at 11:01 PM, Roy Chaudhuri > wrote: > > Hi Aneesh, > > See the Bioperl trees howto: > http://www.bioperl.org/wiki/HOWTO:Trees > > Roy. > > > Aneesh K wrote: > > Hi, > > I just started to use Bioperl modules. It's really useful and > interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing > phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob. 09646181517 > > > > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > > From maj at fortinbras.us Tue Nov 17 07:50:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 17 Nov 2009 07:50:06 -0500 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <4B0287D7.5050702@gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com><4B018C85.6020801@gmail.com><9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> <4B0287D7.5050702@gmail.com> Message-ID: <394F62D51F15405BBCF8BB50DA0FF336@NewLife> Aneesh, Have a look in the t/Map directory of the BioPerl distribution. These are test scripts that are also examples of usage. The t/data directory will contain the datafiles that the tests use; these will provide example data. cheers Mark ----- Original Message ----- From: "Roy Chaudhuri" To: "Aneesh K" ; Sent: Tuesday, November 17, 2009 6:24 AM Subject: Re: [Bioperl-l] Regarding Bio::TreeIO Object > Hi Aneesh, > > Please keep your replies on the mailing list, that way someone else can > respond, which would be particularly useful in this case since I know > nothing about MapIO. > > Roy. > > Aneesh K wrote: >> Thanks for your reply. >> >> I would like to know about "Genetic Maps" also. I would like to >> use MapIO object. >> But I'm not aware about genetic maps and the mapmaker format. 
>> >> Please tell me from where I can get some examples for mapmaker format >> and some example scripts to use MapIO object. >> >> Hoping your reply. >> >> Aneesh.K >> Mob. 09646181517 >> >> >> >> On Mon, Nov 16, 2009 at 11:01 PM, Roy Chaudhuri > > wrote: >> >> Hi Aneesh, >> >> See the Bioperl trees howto: >> http://www.bioperl.org/wiki/HOWTO:Trees >> >> Roy. >> >> >> Aneesh K wrote: >> >> Hi, >> >> I just started to use Bioperl modules. It's really useful and >> interesting. >> Now I have in stuck with "Tree objects and phylogenetic trees". >> I couldn't get any documentation/examples about reading/parsing >> phylip tree >> files. >> >> Please tell me from where I can get some sample codes for this. >> >> Waiting for your reply. >> >> Thanks >> Aneesh.K >> Mob. 09646181517 >> >> >> >> -- >> Dr. Roy Chaudhuri >> Department of Veterinary Medicine >> University of Cambridge, U.K. >> >> > > From veronica.xiaoyu at gmail.com Wed Nov 18 12:18:33 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Wed, 18 Nov 2009 12:18:33 -0500 Subject: [Bioperl-l] how to visualize multiple sequences alignments Message-ID: Hi, I'm wondering whether there are any modules that can be used for visualizing multiple sequence alignments, such as the output from ClustalW. Thank you very much, Xiaoyu From jason at bioperl.org Wed Nov 18 13:23:05 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 18 Nov 2009 10:23:05 -0800 Subject: [Bioperl-l] how to visualize multiple sequences alignments In-Reply-To: References: Message-ID: try jalview http://www.jalview.org/ On Nov 18, 2009, at 9:18 AM, Xiaoyu Liang wrote: > Hi, > > I'm wondering Is there any modules that can be used for visualizing > multiple > sequences alignments? like the result from ClustalW?
> > Thank you very much, > Xiaoyu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From andrew.j.grimm at gmail.com Wed Nov 18 21:52:31 2009 From: andrew.j.grimm at gmail.com (Andrew Grimm) Date: Thu, 19 Nov 2009 13:52:31 +1100 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? Message-ID: Caution: read the whole email before visiting the bioperl wiki I was doing some bioinformatics-related searching using google, and one of the hits was to the bio dot perl dot org wiki (the FAQ in particular). When I did that, I was redirected to a ferdax dot com web site (a typo-squatting of fedex?). Some people reckon that ferdax hacks web sites and redirects google hits from the victim web site to their own web site. For example, this thread at google's webmaster central http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all (it's talking about zencart, but presumably they've since found other victims) Just going to the website without using google may not trigger the redirect. Apologies if this is a false alarm, but I don't think it is. I won't be in contact between Friday and Monday Australian time (I'll be at railscamp 6 in Melbourne), so I won't be able to answer any replies. Thanks, Andrew Grimm From maj at fortinbras.us Wed Nov 18 22:14:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 18 Nov 2009 22:14:44 -0500 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? In-Reply-To: References: Message-ID: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> Andrew-- thanks!! We're on it. MAJ ----- Original Message ----- From: "Andrew Grimm" To: Sent: Wednesday, November 18, 2009 9:52 PM Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? 
> Caution: read the whole email before visiting the bioperl wiki > > I was doing some bioinformatics-related searching using google, and > one of the hits was to the bio dot perl dot org wiki (the FAQ in > particular). > > When I did that, I was redirected to a ferdax dot com web site (a > typo-squatting of fedex?). > > Some people reckon that ferdax hacks web sites and redirects google > hits from the victim web site to their own web site. For example, this > thread at google's webmaster central > http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all > (it's talking about zencart, but presumably they've since found other > victims) > > Just going to the website without using google may not trigger the redirect. > > Apologies if this is a false alarm, but I don't think it is. > > I won't be in contact between Friday and Monday Australian time (I'll > be at railscamp 6 in Melbourne), so I won't be able to answer any > replies. > > Thanks, > > Andrew Grimm > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From sandipan.chowdhury at physiology.wisc.edu Thu Nov 19 01:49:45 2009 From: sandipan.chowdhury at physiology.wisc.edu (Sandipan Chowdhury) Date: Thu, 19 Nov 2009 00:49:45 -0600 Subject: [Bioperl-l] accessing EMBL database Message-ID: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Hi, I have 3 questions all related to the retreival of sequences from online databases. (1) I have been trying to download a protein sequence from the EMBL database and trying to write the sequence into a text file, as a string. I am using the following code: use Bio::DB::EMBL; open b,">","s.txt"; $em_obj = Bio::DB::EMBL->new; $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); $s_str = $seq_obj->seq; print b "$s_str\n"; close b; The script is not working and gives the messege: "MSG: EMBL stream with no ID. 
Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl" I am not sure what this means. A similar version of the script works for the Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way around this so that I can download the embl sequence? (2) Also, is there anyway I can download sequences from DDBJ (database of Japan)? (3) Can GI numbers be used to retreive the sequences? If so then how? Answers to these questions would be greatly appreciated. I am very new to Perl/Bioperl and am not really familiar with the advanced programming features, so I would need to your help to find my way out of this situation. Many Thanks Sandipan From maj at fortinbras.us Thu Nov 19 08:10:07 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Nov 2009 08:10:07 -0500 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Sandipan-- That id (CAB95729) returns "No entries" from EMBL. I would agree that the error message is not really informative. The module documentation warns: # remember that EMBL_ID does not equal GenBank_ID! so I would check that. MAJ ----- Original Message ----- From: "Sandipan Chowdhury" To: Sent: Thursday, November 19, 2009 1:49 AM Subject: [Bioperl-l] accessing EMBL database > Hi, > > I have 3 questions all related to the retreival of sequences from online > databases. > > (1) I have been trying to download a protein sequence from the EMBL database > and trying to write the sequence into a text file, as a string. 
I am using the > following code: > > use Bio::DB::EMBL; > open b,">","s.txt"; > $em_obj = Bio::DB::EMBL->new; > $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); > $s_str = $seq_obj->seq; > print b "$s_str\n"; > close b; > > The script is not working and gives the messege: > "MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl" > > I am not sure what this means. A similar version of the script works for the > Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way > around this so that I can download the embl sequence? > > (2) Also, is there anyway I can download sequences from DDBJ (database of > Japan)? > > (3) Can GI numbers be used to retreive the sequences? If so then how? > > Answers to these questions would be greatly appreciated. I am very new to > Perl/Bioperl and am not really familiar with the advanced programming > features, so I would need to your help to find my way out of this situation. > > Many Thanks > Sandipan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hrh at fmi.ch Thu Nov 19 08:23:29 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Thu, 19 Nov 2009 14:23:29 +0100 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Sandipan > I have 3 questions all related to the retreival of sequences from online > databases. > > (1) I have been trying to download a protein sequence from the EMBL database > and trying to write the sequence into a text file, as a string. 
I am using the > following code: > > use Bio::DB::EMBL; > open b,">","s.txt"; > $em_obj = Bio::DB::EMBL->new; > $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); > $s_str = $seq_obj->seq; > print b "$s_str\n"; > close b; > > The script is not working and gives the messege: > "MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl" > > I am not sure what this means. A similar version of the script works for the > Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way > around this so that I can download the embl sequence? "CAB95729" is a protein sequence, ie a translation of the CDS of 'AJ277028.1'. As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, ie the nucleotides sequence > (2) Also, is there anyway I can download sequences from DDBJ (database of > Japan)? Unless, for network/speed reason, why do you want to download data from DDBJ? It contains the same data as GenBank and EMBL. Those three databases exchange their data on a daily basis. > (3) Can GI numbers be used to retreive the sequences? If so then how? Have you looked at Bio::DB::Eutilities ? See the 'HOWTOs' page in the Bioperl Wiki Regards, Hans > Answers to these questions would be greatly appreciated. I am very new to > Perl/Bioperl and am not really familiar with the advanced programming > features, so I would need to your help to find my way out of this situation. 
> > Many Thanks > Sandipan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Nov 19 08:47:16 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Nov 2009 07:47:16 -0600 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: Message-ID: <95D416ED-7630-40A1-ABA5-A3C3525D25B1@illinois.edu> On Nov 19, 2009, at 7:23 AM, Hotz, Hans-Rudolf wrote: > > Sandipan > > >> I have 3 questions all related to the retreival of sequences from online >> databases. >> >> (1) I have been trying to download a protein sequence from the EMBL database >> and trying to write the sequence into a text file, as a string. I am using the >> following code: >> >> use Bio::DB::EMBL; >> open b,">","s.txt"; >> $em_obj = Bio::DB::EMBL->new; >> $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); >> $s_str = $seq_obj->seq; >> print b "$s_str\n"; >> close b; >> >> The script is not working and gives the messege: >> "MSG: EMBL stream with no ID. Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 >> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 >> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc >> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 >> STACK: trial2.pl" >> >> I am not sure what this means. A similar version of the script works for the >> Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way >> around this so that I can download the embl sequence? > > "CAB95729" is a protein sequence, ie a translation of the CDS of > 'AJ277028.1'. > > As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, ie the > nucleotides sequence > > > >> (2) Also, is there anyway I can download sequences from DDBJ (database of >> Japan)? > > Unless, for network/speed reason, why do you want to download data from > DDBJ? 
It contains the same data as GenBank and EMBL. Those three databases > exchange their data on a daily basis. > >> (3) Can GI numbers be used to retreive the sequences? If so then how? > > Have you looked at Bio::DB::Eutilities ? See the 'HOWTOs' page in the > Bioperl Wiki > > > > Regards, Hans > > > >> Answers to these questions would be greatly appreciated. I am very new to >> Perl/Bioperl and am not really familiar with the advanced programming >> features, so I would need to your help to find my way out of this situation. >> >> Many Thanks >> Sandipan To add to that, if you want the protein sequences as a Bio::Seq you can use Bio::DB::GenPept (Bio::DB::EUtilities will retrieve raw data only). chris From David.Messina at sbc.su.se Thu Nov 19 09:04:55 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Nov 2009 15:04:55 +0100 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: > I would agree that the error message is not really informative. Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. Perhaps the stack dump should be turned off by default? Wouldn't this: ERROR: EMBL stream with no ID. Not embl in my book Be a lot clearer than this?: MSG: EMBL stream with no ID. Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl Just a thought. This has probably been discussed before. Dave From maj at fortinbras.us Thu Nov 19 09:17:05 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 19 Nov 2009 09:17:05 -0500 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: I'm inclined to agree. Lots of responses to questions here that begin "Well, as the error message said, you need to check...", which means people tend towards "I broke it! Write the list!". I do find it hairy when my errors are way down in the object tree. ----- Original Message ----- From: "Dave Messina" To: "Mark A. Jensen" Cc: Sent: Thursday, November 19, 2009 9:04 AM Subject: Re: [Bioperl-l] accessing EMBL database > I would agree that the error message is not really informative. Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. Perhaps the stack dump should be turned off by default? Wouldn't this: ERROR: EMBL stream with no ID. Not embl in my book Be a lot clearer than this?: MSG: EMBL stream with no ID. Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl Just a thought. This has probably been discussed before. Dave From rtbio.2009 at gmail.com Thu Nov 19 09:55:27 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Thu, 19 Nov 2009 15:55:27 +0100 Subject: [Bioperl-l] Remote blast Message-ID: Hello everybody, I have a problem. I would like to use remote blast to find sequences matching for an input sequence. Ex:-I would like to search sequences which match Trypanosoma Brucei sequence. 
I want the output to be only Trypanosoma Brucei sequences matching with my query.When i tried to use remoteblast to nr database,I got sequences from different organisms like E.coli,Pseudomonas etc., Could you please tell me how can this be solved...? My code is as follows. use Bio::Tools::Run::RemoteBlast; use strict; my $prog = 'blastn'; my $db = 'nr'; my $e_val= '1e-10'; my $organism= 'Trypanosoma Brucei'; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); my $factory = Bio::Tools::Run::RemoteBlast-> new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]' #remove a parameter #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' ); while (my $input = $str->next_seq()){ #Blast a sequence against a database: my $r = $factory->submit_blast($input); #my $r = $factory->submit_blast('amino.fa'); print STDERR "waiting..." if( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output my $filename = $result->query_name()."\.out"; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { next unless ( $v > 0); print "\thit name is ", $hit->name, "\n"; while( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } } } } } My input sequence is >ref|NC_009512.1|:385-1902 GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG 
GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA Please mail me regarding any queries. Regards, Roopa. From cjfields at illinois.edu Thu Nov 19 10:30:34 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Nov 2009 09:30:34 -0600 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Mark, Dave, This could be based on verbose().

Level          w         t    d    st
verbose < 0    -         +    -    -/+
verbose 0      +         +    -    -/+
verbose 1      +         +    +    +/+
verbose > 1    +*  ->    +    +    +/+
* converts to throw()
w = warn, t = throw, d = debug, st = stack trace (warn/throw)

warn() is set up that way now: you don't get a stack trace unless verbose() is > 0. throw() could be the same; would be a simple fix, really. My only problem with the current state of things is that (I think we've delved down this path before) the verbosity level is tied to exception strictness as seen above, and they're really two separate concepts, at least to me. Verbosity of 1 or more doesn't necessarily mean I want an elevated level of strictness along with it. For instance, one might want very strict exceptions w/o the noise, or (conversely) lots of debugging output but no warnings. (aside: another small nit, but I haven't exactly liked that the global level of strictness is designated by an env. variable with DEBUG in the name, but that's just me). I've been thinking it would be nice to have simple separate verbose/strict switches (this is the way it's implemented in Biome). This would allow some finer-grained control over output:

Level          d    st
verbose 0      -    -
verbose 1      +    +
Default = BIOPERLDEBUG || 0   # current situation

Level          w         t
strict -1      -         +
strict 0       +         +
strict 1       +*  ->    +
* converts to throw()
Default = BIOPERLSTRICT || 0

We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. chris On Nov 19, 2009, at 8:17 AM, Mark A. Jensen wrote: > I'm inclined to agree.
Lots of responses to questions here that begin > "Well, as the error message said, you need to check...", which means > people tend towards "I broke it! Write the list!". I do find it hairy when > my errors are way down in the object tree. > ----- Original Message ----- From: "Dave Messina" > To: "Mark A. Jensen" > Cc: > Sent: Thursday, November 19, 2009 9:04 AM > Subject: Re: [Bioperl-l] accessing EMBL database > > >> I would agree that the error message is not really informative. > > Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. > > I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. > > Perhaps the stack dump should be turned off by default? > > Wouldn't this: > > ERROR: EMBL stream with no ID. Not embl in my book > > > > Be a lot clearer than this?: > > MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl > > > > Just a thought. This has probably been discussed before. 
> Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Thu Nov 19 11:10:28 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Thu, 19 Nov 2009 16:10:28 +0000 Subject: [Bioperl-l] Remote blast In-Reply-To: References: Message-ID: <4B056DF4.2030502@gmail.com> Hi Roopa, I think that the -Organism parameter that you specify for Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it in the documentation: http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm You have the correct approach in your code - limiting the search to the Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If you uncomment the line (and add a semicolon afterwards), the program runs correctly, but no hits are reported below your threshold e-value. If you change the value of $e_val to 10 then some T.brucei hits are reported. Roy. Roopa Raghuveer wrote: > Hello everybody, > > I have a problem. I would like to use remote blast to find sequences > matching for an input sequence. > > Ex:-I would like to search sequences which match Trypanosoma Brucei > sequence. > > I want the output to be only Trypanosoma Brucei sequences matching with my > query.When i tried to use remoteblast to nr database,I got sequences from > different organisms like E.coli,Pseudomonas etc., > > Could you please tell me how can this be solved...? > > My code is as follows. 
> > use Bio::Tools::Run::RemoteBlast; > use strict; > my $prog = 'blastn'; > my $db = 'nr'; > my $e_val= '1e-10'; > my $organism= 'Trypanosoma Brucei'; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > my $factory = Bio::Tools::Run::RemoteBlast-> > new(@params); > > #change a paramter > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > brucei[ORGN]' > > #remove a parameter > #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , > '-organism' => 'Trypanosoma Brucei' ); > > while (my $input = $str->next_seq()){ > #Blast a sequence against a database: > my $r = $factory->submit_blast($input); > #my $r = $factory->submit_blast('amino.fa'); > > print STDERR "waiting..." if( $v > 0 ); > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > My input sequence is > >> ref|NC_009512.1|:385-1902 > GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA > CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT > TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT > GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG > TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA > ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG > GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC > TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT > CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC > GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG > CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT > CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC > AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC > TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG > CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG > GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC > TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT > TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC > GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC > CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT > 
CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG > GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA > > Please mail me regarding any queries. > > Regards, > Roopa. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From clements at nescent.org Thu Nov 19 12:46:32 2009 From: clements at nescent.org (Dave Clements) Date: Thu, 19 Nov 2009 18:46:32 +0100 Subject: [Bioperl-l] how to visualize multiple sequences alignments In-Reply-To: References: Message-ID: Hi Xiaoyu, I would also take a look at GBrowse_syn, a perl based solution built with the GBrowse genome browser framework. See http://gmod.org/wiki/GBrowse_syn. Cheers, Dave C. On Wed, Nov 18, 2009 at 7:23 PM, Jason Stajich wrote: > try jalview http://www.jalview.org/ > > > On Nov 18, 2009, at 9:18 AM, Xiaoyu Liang wrote: > > Hi, >> >> I'm wondering Is there any modules that can be used for visualizing >> multiple >> sequences alignments? like the result from ClustalW? >> >> Thank you very much, >> Xiaoyu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/January_2010_GMOD_Meeting From maj at fortinbras.us Thu Nov 19 18:37:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Nov 2009 18:37:05 -0500 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: I like this verbose/strict separability a lot. Should we go for it? 
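The verbose/strict split Chris proposes can be sketched in plain Perl. This is a minimal, hypothetical stand-alone package (`My::Reporter`), not the Bio::Root::Root API: the `-strict` argument and the `BIOPERLSTRICT` environment variable are assumptions taken from the proposal, and the behaviour follows his second pair of tables, with the stack trace left off unless verbose is on, as Dave suggested.

```perl
use strict;
use warnings;

package My::Reporter;    # hypothetical name, not part of BioPerl
use Carp ();

sub new {
    my ($class, %args) = @_;
    my $self = {
        # verbose: 0 = quiet; 1 = debug() output and stack traces
        verbose => defined $args{-verbose} ? $args{-verbose} : ($ENV{BIOPERLDEBUG} || 0),
        # strict: -1 = drop warnings; 0 = print warnings; 1 = warnings become exceptions
        strict  => defined $args{-strict} ? $args{-strict} : ($ENV{BIOPERLSTRICT} || 0),
    };
    return bless $self, $class;
}

# Format a message; append a stack trace only when verbose is on.
sub _format {
    my ($self, $level, $msg) = @_;
    my $text = "$level: $msg\n";
    $text .= Carp::longmess('STACK') if $self->{verbose} > 0;
    return $text;
}

sub debug {
    my ($self, $msg) = @_;
    print STDERR "DEBUG: $msg\n" if $self->{verbose} > 0;
    return;
}

sub throw {
    my ($self, $msg) = @_;
    die $self->_format('ERROR', $msg);    # always fatal, whatever strict says
}

sub warn {
    my ($self, $msg) = @_;
    return if $self->{strict} < 0;                # strict -1: silenced
    $self->throw($msg) if $self->{strict} > 0;    # strict 1: promoted to throw()
    print STDERR $self->_format('WARNING', $msg);
    return;
}

package main;

# Usage: with verbose 0 a fatal error is a single readable line.
my $reporter = My::Reporter->new(-verbose => 0, -strict => 0);
eval { $reporter->throw('EMBL stream with no ID. Not embl in my book') };
print $@;    # prints "ERROR: EMBL stream with no ID. Not embl in my book"
```

Because the two switches are independent, `-strict => 1, -verbose => 0` gives strict exceptions without the noise, and `-verbose => 1, -strict => -1` gives debugging output with warnings silenced; both are combinations Chris mentions.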
----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: Sent: Thursday, November 19, 2009 10:30 AM Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database > Mark, Dave, > > This could be based on verbose(). > > Level w t d st > verbose < 0 - + - -/+ > verbose 0 + + - -/+ > verbose 1 + + + +/+ > verbose > 1 +* -> + + +/+ > * converts to throw() > w = warn > t = throw > d = debug > st = stack trace > > warn() is set up that way now, you don't get a stack trace unless verbose() is > > 0. throw() could be the same; would be a simple fix, really. > > My only problem with the current state of things is (I think we've delved down > this path before) verbosity level is tied to exception strictness as seen > above, and they're really two separate concepts, at least to me. Verbosity of > 1 or more doesn't necessarily mean I want an elevated level of strictness > along with it. For instance, one might want very strict exceptions w/o the > noise, or (conversely) lots of debugging output but no warnings. > > (aside: another small nit, but I haven't exactly liked that the global level > of strictness is designated by a env. variable with DEBUG in the name, but > that's just me). > > I've been thinking it would be nice to have simple separate verbose/strict > switches (this is the way it's implemented in Biome). This would allow some > finer grained control over output: > > Level d st > verbose 0 - - > verbose 1 + + > Default = BIOPERLDEBUG || 0 # current situation > > Level w t > strict -1 - + > strict 0 + + > strict 1 +* -> + > * converts to throw() > Default = BIOPERLSTRICT || 0 > > We could even allow finer-grained control of verbosity (states which cover all > combinations) w/o affecting strictness. > > chris > > On Nov 19, 2009, at 8:17 AM, Mark A. Jensen wrote: > >> I'm inclined to agree. 
Lots of responses to questions here that begin >> "Well, as the error message said, you need to check...", which means >> people tend towards "I broke it! Write the list!". I do find it hairy when >> my errors are way down in the object tree. >> ----- Original Message ----- From: "Dave Messina" >> To: "Mark A. Jensen" >> Cc: >> Sent: Thursday, November 19, 2009 9:04 AM >> Subject: Re: [Bioperl-l] accessing EMBL database >> >> >>> I would agree that the error message is not really informative. >> >> Agreed that it could be better, but I wonder whether part of the problem with >> BioPerl error messages is the stack dump. >> >> I think a lot of eyes just glaze right over when they see a big wad of >> complicated stuff, with colons and slashes and line numbers, spewing out at >> them. >> >> Perhaps the stack dump should be turned off by default? >> >> Wouldn't this: >> >> ERROR: EMBL stream with no ID. Not embl in my book >> >> >> >> Be a lot clearer than this?: >> >> MSG: EMBL stream with no ID. Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 >> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 >> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc >> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 >> STACK: trial2.pl >> >> >> >> Just a thought. This has probably been discussed before. 
>> Dave >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From michael.watson at bbsrc.ac.uk Fri Nov 20 05:07:10 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri, 20 Nov 2009 10:07:10 +0000 Subject: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output In-Reply-To: <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC501487319AE@iahcexch1.iah.bbsrc.ac.uk> <9994F70B-AE92-4425-9AAC-E9A2DC26964E@bioperl.org> <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> Hello I was just wondering if anyone had had time to look into this? I posted a bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2937 Thanks Mick -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) Sent: 27 October 2009 09:01 To: 'Jason Stajich' Cc: bioperl-l Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output Hi Jason They both print 0 also. A bug report it is Mick -----Original Message----- From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich Sent: 26 October 2009 18:46 To: michael watson (IAH-C) Cc: bioperl-l Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output Is this -m9 -d 0 output or standard default? I think the strand is parsed in the HSP parsing. Can you double check what $hsp->query->strand and $hsp->hit->strand prints? A full example report as a bug request will be next step if that doesn't resolve. 
-jason On Oct 26, 2009, at 10:04 AM, michael watson (IAH-C) wrote: > Dear all > > Where does this go? Perhaps I am doing something wrong. > > Fasta35 output puts the strand in the hit list at the top: > > cluster_99033:3 ( 23) [r] 115 37.9 > 0.0011 > cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 > 0.963 27 > > The [r] stands for reverse and the [f] stands for forward. > > There is also the text "rev-comp" after the hit line further down. > > However, when I parse fasta35 output using SearchIO and output the > strand of the HSP: > > print $hsp->strand('hit'), ","; > print $hsp->strand('query'), "\n"; > > This simply prints out 0, 0 (I assume 0 is the default in BioPerl > for "I don't know which strand it's on"). > > So the information is there, but it's not getting parsed. > Alternatively, I've missed something and will feel a bit foolish. > > Currently using BioPerl 1.6.0 > > Thanks > Mick > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri Nov 20 05:15:11 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Nov 2009 11:15:11 +0100 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> Chris, I took a look at how you implemented this in Biome -- very nice! > I like this verbose/strict separability a lot. Should we go for it? Me too. So yes, I think so. > We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. 
Perhaps this is a job for Log::Log4perl or Log::Dispatch? http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl.pm http://search.cpan.org/~drolsky/Log-Dispatch-2.26/lib/Log/Dispatch.pm That might be overkill, though. Dave From roychu at gmail.com Fri Nov 20 05:21:54 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 20 Nov 2009 02:21:54 -0800 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN Message-ID: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Hi, Does anyone use dreamhost as a web hosting service? I'm just curious if anyone has had any luck installing the module as their daemon seems to kill my process whenever I try to install it. Dreamhost tech support attributes it to either exceeding the allocated memory cache or exceeding the processing time. I tried to nice the process, but that didn't help for me. Any luck or experience in resolving this would be much appreciated. I suppose my next attempt would be to try installing it directly and hope I don't need root... Thanks, Roy From s.denaxas at gmail.com Fri Nov 20 05:27:42 2009 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Fri, 20 Nov 2009 11:27:42 +0100 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Message-ID: Hello, normally you don't need to be root - http://sial.org/howto/perl/life-with-cpan/non-root/ Kind of disturbing that their tech support cannot give you a straight answer on why they are killing the process. Good luck Spiros On Fri, Nov 20, 2009 at 11:21 AM, Chu, Roy wrote: > I suppose my next attempt would be to try > installing it directly and hope I don't need root...
> > Thanks, > Roy > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From charles-listes+bioperl at plessy.org Fri Nov 20 05:44:45 2009 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Fri, 20 Nov 2009 19:44:45 +0900 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Message-ID: <20091120104445.GG31318@kunpuu.plessy.org> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: > > Does anyone use dreamhost as a web hosting service? I'm just curious > if anyone has had any luck installing the module as their daemon seems > to kill my process whenever I try to install it. Dreamhost tech > support attributes it to either exceeding the allocated memory cache > or exceeding the processing time. I tried to nice the process, but > that didn't help for me. Any luck or experience in resolving this > would be much appreciated. I suppose my next attempt would be to try > installing it directly and hope I don't need root... Dear Roy, DreamHost uses Debian, so you can suggest them to install the Debian package. If you are in contact with the tech service, do not hesitate to tell them to contact me if they are interested by a backport of the 1.6.0 package. For version 1.6.1, it may be more difficult as it depends on perl 5.10.1.
PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I will vote for it :) Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From cjfields at illinois.edu Fri Nov 20 07:51:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 06:51:39 -0600 Subject: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC501487319AE@iahcexch1.iah.bbsrc.ac.uk> <9994F70B-AE92-4425-9AAC-E9A2DC26964E@bioperl.org> <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> Message-ID: Mick, Short answer, no. It was in the queue to be fixed at some point in 1.6.x, but that queue is quite long. I'm pushing it into the queue specifically for 1.6.2, so it should be addressed soon. chris On Nov 20, 2009, at 4:07 AM, michael watson (IAH-C) wrote: > Hello > > I was just wondering if anyone had had time to look into this? > > I posted a bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2937 > > Thanks > Mick > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) > Sent: 27 October 2009 09:01 > To: 'Jason Stajich' > Cc: bioperl-l > Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output > > Hi Jason > > They both print 0 also. > > A bug report it is > > Mick > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich > Sent: 26 October 2009 18:46 > To: michael watson (IAH-C) > Cc: bioperl-l > Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output > > > Is this -m9 -d 0 output or standard default? I think the strand is > parsed in the HSP parsing.
> > Can you double check what $hsp->query->strand and $hsp->hit->strand > prints? > > A full example report as a bug request will be next step if that > doesn't resolve. > > -jason > On Oct 26, 2009, at 10:04 AM, michael watson (IAH-C) wrote: > >> Dear all >> >> Where does this go? Perhaps I am doing something wrong. >> >> Fasta35 output puts the strand in the hit list at the top: >> >> cluster_99033:3 ( 23) [r] 115 37.9 >> 0.0011 >> cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 >> 0.963 27 >> >> The [r] stands for reverse and the [f] stands for forward. >> >> There is also the text "rev-comp" after the hit line further down. >> >> However, when I parse fasta35 output using SearchIO and output the >> strand of the HSP: >> >> print $hsp->strand('hit'), ","; >> print $hsp->strand('query'), "\n"; >> >> This simply prints out 0, 0 (I assume 0 is the default in BioPerl >> for "I don't know which strand it's on"). >> >> So the information is there, but it's not getting parsed. >> Alternatively, I've missed something and will feel a bit foolish. 
>> >> Currently using BioPerl 1.6.0 >> >> Thanks >> Mick >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Nov 20 08:00:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 07:00:45 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <20091120104445.GG31318@kunpuu.plessy.org> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >> >> Does anyone use dreamhost as a web hosting service? I'm just curious >> if anyone has had any luck installing the module as their daemon seems >> to kill my process whenever I try to install it. Dreamhost tech >> support attributes it to either exceeding the allocated memory cache >> or exceeding the processing time. I tried to nice the process, but >> that didn't help for me. Any luck or experience in resolving this >> would be much appreciated. I suppose my next attempt would be to try >> installing it directly and hope I don't need root... > > Dear Roy, > > DreamHost uses Debian, so you can suggest them to install the Debian package. > If you are in contact with the tech service, do not hesitate to tell them to > contact me if they are interested by a backport of the 1.6.0 package.
For > version 1.6.1, it may be more difficult as it depends on perl 5.10.1. Any reason why this is so? We specify compatibility back to 5.6.1. Alex mentioned the reliance on the specific Extutils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post. > PS: if you propse to install BioPerl as a feature in the Dreamhost panel, I > will vote for it :) > > Have a nice day, > > -- > Charles Plessy > Debian Med packaging team, > http://www.debian.org/devel/debian-med > Tsurumi, Kanagawa, Japan chris From rtbio.2009 at gmail.com Fri Nov 20 10:52:09 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 20 Nov 2009 16:52:09 +0100 Subject: [Bioperl-l] Remote blast In-Reply-To: References: <4B056DF4.2030502@gmail.com> Message-ID: Hello everybody, I have tried to use Remote blast on Trypanasoma brucei sequences and could get certain hits.But I am unable to retrieve the complete sequence from where I got hits. i.e., I am unable to parse the blast output file for getting the complete sequences of the hits. Here is my code. #!/usr/bin/perl -w use Bio::SearchIO; my $blast_report = new Bio::SearchIO ('-format' => 'blast', '-file' => $ARGV[0]); my $result = $blast_report->next_result; my $level = $ARGV[1]; while( my $hit = $result->next_hit) { print $hit->name; push(@arr1,$hit->name); while( my $hsp = $hit->next_hsp()) { if ($hsp->frac_identical() >= $level) { #print $hsp->hit_string, "\n"; push(@arr,$hsp->hit_string); } } } $k=@arr1; for($i=0;$i<$k;$i++){ push(@arr2,split(/|/,$arr1[$i])); #print "$arr[$i]\n"; } #$t=@arr2; Here,I am trying to use the blast output file and get the complete sequence where I found a hit but I could not get the complete sequence. 
i/p:- admin at BosLinux:~/Documents> vim Tb09.211.2410.out |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 661 TTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCC 720 Query 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG 780 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG 780 Query 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT 840 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT 840 Query 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG 900 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG 900 Query 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT 960 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT 960 Query 961 CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005 ||||||||||||||||||||||||||||||||||||||||||||| Sbjct 961
CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005 >ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A catalytic subunit isoform 2 (Tb09.211.2360) partial mRNA Length=1011 Score = 1622 bits (1798), Expect = 0.0 Identities = 944/974 (96%), Gaps = 0/974 (0%) Strand=Plus/Plus Query 32 TGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC 91 |||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 38 TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC 97 Query 92 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT 151 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 98 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT 157 Query 152 ATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGA 211 |||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||| Sbjct 158 ATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGA 217 Query 212 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT 271 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 218 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT 277 uery 272 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT 331 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 278 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT 337 Query 332 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG 391 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 338 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG 397 Query 392 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC 451 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 398 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC 457 Query 452 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA 511 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 458 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA 517 Query 512 
AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG 571 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 518 AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG 577 Query 572 TAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGT 631 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| It follows like this. The output I got is ATGACGACAACTCCCACTGGTGATGGCCAACTGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCCAATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCTCCTCCACTAACCCCTTCGCAACAGG TTGCATTCCGTGGTTTTTAG 
TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGTTCAAATTCCCCAATTGGTTTGACTCCCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATCACGCTCCCATTCCTGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGGGATAAGCGGTTGCCCCCGTTAGCACCATCACAACAATTGGAGTTCCGTGGGTTTTAG GGATGATGACCGATTGTACCTCCTCCTCGAGTATGTGGTGGGTGGCGAGCTGT TCTCCCACCTCCGGAAGGCGGGAAAATTCCCTAATGATGTAGCCAAGTTCTACTCCGCAGAAGTGGTTTTGGCGTTTGAATATATTCATGAGTGCGGCATCGTATACCGTGACTTGAAGCCAGAAAATGTGCTTTTGGACAAGCAGGGAAACATTAAGATTACGGACTTTGGGTTCGCGAAACGCGTTAGGGACAGAACGTACACGCTATGTGGGACTCCAGAGTATCTTGCGCCGGAGATAATCCAAAGTAAAGGTCACGATCGGGCTGTGGATTGGTGGACACTCGGAATTCTTCTCTATGAGATGCTTGTCGGTTATCCTCCTTTTTTCGACGAGAGTCCTTTTAGAACATACGAAAAAATTTTAGAGGGGAAACTTCAGTTTCCAAAGTGGGTGGAGATGCGGGCGAAGGACCTCATAAAGAGTTTTTTAACAATTGAACCAACGAAACG i.e.,It is only giving the region where it could find the best alignment i.e., the best hit ones. I want the complete sequence i.e., sequences corresponding to the accession numbers XM_822292.1 XM_822286.1 XM_822694.1 Database used in Remote blast was RefSeq i.e.,(refseq_rna),organism used :Trypanasoma brucei. Can any one please help me in solving this problem Regards, Roopa. 
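Since a BLAST report carries only the aligned part of each hit, one approach (the one suggested later in this thread) is to pull the accession out of each hit name and fetch the full record from GenBank. The sketch below assumes hit names look like ref|XM_822286.1|. Note that the pipe has to be escaped in the regex, because split(/|/, ...) splits a string into single characters. Bio::DB::GenBank and Bio::SeqIO are loaded only when accessions are passed on the command line, so the parsing helper works without BioPerl or a network connection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the accession out of a BLAST hit name such as "ref|XM_822286.1|".
# The pipe must be escaped: in a regex a bare | means "or", so /|/ matches
# the empty string and split(/|/, ...) breaks the name into single characters.
sub accession_from_hit_name {
    my ($name) = @_;
    my @fields = grep { length } split /\|/, $name;
    return $fields[-1];
}

# Fetch complete records by accession and print them as FASTA.
# BioPerl is loaded only here, at run time.
sub fetch_full_sequences {
    my @accessions = @_;
    require Bio::DB::GenBank;
    require Bio::SeqIO;
    my $gb  = Bio::DB::GenBank->new;
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
    for my $acc (@accessions) {
        my $seq = eval { $gb->get_Seq_by_acc($acc) };
        unless ($seq) {
            warn "Could not fetch $acc\n";
            next;
        }
        $out->write_seq($seq);
    }
}

# Usage (script name hypothetical):
#   perl fetch_hits.pl 'ref|XM_822292.1|' 'ref|XM_822286.1|' 'ref|XM_822694.1|'
fetch_full_sequences(map { accession_from_hit_name($_) } @ARGV) if @ARGV;
```

In a SearchIO loop, the same helper could be applied to $hit->name instead of command-line arguments. Each lookup is a separate request to NCBI. If a versioned accession is not accepted, trying it without the version suffix is a reasonable fallback.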
On Fri, Nov 20, 2009 at 12:30 PM, Roopa Raghuveer wrote: > > Hello Roy, > > Thanks a lot for your reply.My code is working for my sequence now. > > Thanks alot. > > Regards, > Roopa. > > On Thu, Nov 19, 2009 at 5:10 PM, Roy Chaudhuri wrote: > >> Hi Roopa, >> >> I think that the -Organism parameter that you specify for >> Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it >> in the documentation: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm >> >> You have the correct approach in your code - limiting the search to the >> Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If >> you uncomment the line (and add a semicolon afterwards), the program runs >> correctly, but no hits are reported below your threshold e-value. If you >> change the value of $e_val to 10 then some T.brucei hits are reported. >> >> Roy. >> >> Roopa Raghuveer wrote: >> >>> Hello everybody, >>> >>> I have a problem. I would like to use remote blast to find sequences >>> matching for an input sequence. >>> >>> Ex:-I would like to search sequences which match Trypanosoma Brucei >>> sequence. >>> >>> I want the output to be only Trypanosoma Brucei sequences matching with >>> my >>> query.When i tried to use remoteblast to nr database,I got sequences from >>> different organisms like E.coli,Pseudomonas etc., >>> >>> Could you please tell me how can this be solved...? >>> >>> My code is as follows. 
>>> >>> use Bio::Tools::Run::RemoteBlast; >>> use strict; >>> my $prog = 'blastn'; >>> my $db = 'nr'; >>> my $e_val= '1e-10'; >>> my $organism= 'Trypanosoma Brucei'; >>> >>> my @params = ( '-prog' => $prog, >>> '-data' => $db, >>> '-expect' => $e_val, >>> '-readmethod' => 'SearchIO', >>> '-Organism' => $organism ); >>> >>> my $factory = Bio::Tools::Run::RemoteBlast-> >>> new(@params); >>> >>> #change a paramter >>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>> brucei[ORGN]' >>> >>> #remove a parameter >>> #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; >>> >>> my $v = 1; >>> #$v is just to turn on and off the messages >>> >>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , >>> '-organism' => 'Trypanosoma Brucei' ); >>> >>> while (my $input = $str->next_seq()){ >>> #Blast a sequence against a database: >>> my $r = $factory->submit_blast($input); >>> #my $r = $factory->submit_blast('amino.fa'); >>> >>> print STDERR "waiting..." if( $v > 0 ); >>> while ( my @rids = $factory->each_rid ) { >>> foreach my $rid ( @rids ) { >>> my $rc = $factory->retrieve_blast($rid); >>> if( !ref($rc) ) { >>> if( $rc < 0 ) { >>> $factory->remove_rid($rid); >>> } >>> print STDERR "." 
if ( $v > 0 ); >>> sleep 5; >>> } >>> else { >>> my $result = $rc->next_result(); >>> #save the output >>> my $filename = $result->query_name()."\.out"; >>> $factory->save_output($filename); >>> $factory->remove_rid($rid); >>> print "\nQuery Name: ", $result->query_name(), "\n"; >>> while ( my $hit = $result->next_hit ) { >>> next unless ( $v > 0); >>> print "\thit name is ", $hit->name, "\n"; >>> while( my $hsp = $hit->next_hsp ) { >>> print "\t\tscore is ", $hsp->score, "\n"; >>> } >>> } >>> } >>> } >>> } >>> } >>> >>> My input sequence is >>> >>> ref|NC_009512.1|:385-1902 >>>> >>> GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA >>> CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT >>> TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT >>> GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG >>> TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA >>> ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG >>> GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC >>> TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT >>> CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC >>> GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG >>> CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT >>> CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC >>> AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC >>> TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG >>> CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG >>> GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC >>> TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT >>> TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC >>> 
GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC >>> CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT >>> CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG >>> GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA >>> >>> Please mail me regarding any queries. >>> >>> Regards, >>> Roopa. >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From mauricio at open-bio.org Fri Nov 20 11:15:22 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Fri, 20 Nov 2009 10:15:22 -0600 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? In-Reply-To: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> References: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> Message-ID: <4B06C09A.8060708@open-bio.org> All OBF wikis and blogs have been upgraded and cleaned from the hack. Thanks for the heads up! Mauricio. Mark A. Jensen wrote: > Andrew-- thanks!! We're on it. > MAJ > ----- Original Message ----- From: "Andrew Grimm" > > To: > Sent: Wednesday, November 18, 2009 9:52 PM > Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? > > >> Caution: read the whole email before visiting the bioperl wiki >> >> I was doing some bioinformatics-related searching using google, and >> one of the hits was to the bio dot perl dot org wiki (the FAQ in >> particular). >> >> When I did that, I was redirected to a ferdax dot com web site (a >> typo-squatting of fedex?). >> >> Some people reckon that ferdax hacks web sites and redirects google >> hits from the victim web site to their own web site. For example, this >> thread at google's webmaster central >> http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all >> >> (it's talking about zencart, but presumably they've since found other >> victims) >> >> Just going to the website without using google may not trigger the >> redirect. 
>> >> Apologies if this is a false alarm, but I don't think it is. >> >> I won't be in contact between Friday and Monday Australian time (I'll >> be at railscamp 6 in Melbourne), so I won't be able to answer any >> replies. >> >> Thanks, >> >> Andrew Grimm >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Nov 20 11:39:53 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Nov 2009 17:39:53 +0100 Subject: [Bioperl-l] Remote blast In-Reply-To: References: <4B056DF4.2030502@gmail.com> Message-ID: <7ECF627D-3DBF-4575-89CF-FA6348C88E8E@sbc.su.se> Hi Roopa, As far as I know, a BLAST report never contains the complete sequences of the hits. If it includes any part of the hit's sequence, it will be the part that matches the query. You'll have to use the hit's ID or accession to get its complete sequence from somewhere else. You can use Bio::DB::Genbank to do that, for example. See http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Dave From alessandra.bilardi at gmail.com Fri Nov 20 12:44:18 2009 From: alessandra.bilardi at gmail.com (Alessandra) Date: Fri, 20 Nov 2009 18:44:18 +0100 Subject: [Bioperl-l] Bio::DB::EUtilities question Message-ID: Hi all, I'm testing Bio::DB::EUtilities - webagent which interacts with and retrieves data from NCBI's eUtils. My perl script works but it works only if I request less than ~450 times get_Response function.. 
otherwise I get this error message: ------------- EXCEPTION ------------- MSG: Response Error Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) STACK Bio::DB::GenericWebAgent::get_Response /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 STACK toplevel ./wget4gbk.pl:77 ------------------------------------- wget4gbk.pl lines 76-77 are: my $req = Bio::DB::EUtilities->new(-db => 'genome', -eutil => 'esummary', -retmode => $mode, -rettype => $type, -id => $id); my $entry = $req->get_Response; I ran the script more than ten times, and the error appears at a random point in the range of 300-600 requests. If I use another system to request data, I can make ~10000 requests without errors. Do I have to set particular parameters on the EUtilities object? Can you help me with this random exception? Best, -- Alessandra Bilardi, Ph. D. ---- CRIBI, University of Padova, Italy http://www.linkedin.com/in/bilardi ---- From maj at fortinbras.us Fri Nov 20 13:42:38 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 13:42:38 -0500 Subject: [Bioperl-l] gravatars on the wiki Message-ID: <94431678F3764E8C9A49EA4D2FCD0DBD@NewLife> Hi all, You can now reveal your Gravatar (http://www.gravatar.com) on the wiki, by including the following markup on the page: {{#gravatar|youremail -at- yourplace -dot- tld}} You can do the antispam measure above, or use a regular email. Invalid emails throw an error. http://bioperl.org/wiki/Gravatars Happy coding, MAJ From roychu at gmail.com Fri Nov 20 15:23:21 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 20 Nov 2009 12:23:21 -0800 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> "sounds very much like you process was killed for prolonged execution time, or memory usage.
We have a daemon in place that monitors for processes that take up too much of a shared web server's resources, and this may have kicked in (and often does when trying to install packages on a shared server)." This was the explanation they had. Regarding asking their admins to install, it seems to be a "they'll try to get to it but don't hold your breath" situation. Hmm, I tried some other approaches; installing 1.4.0 posed no problems. I'm not a perl guru, so I tried to increase the build cache size from the default, 10 MB, hoping that that may be the problem--can't imagine how though, since I can't imagine the package size differing that much between versions (though honestly, I haven't checked). Whenever I try to install 1.6.1, it runs into a problem I guess after the 'make' step and lists the modules--BioPerl-1.6.0/t/Variation/SeqDiff.t BioPerl-1.6.0/t/Variation/SNP.t BioPerl-1.6.0/t/Variation/Variation_IO.t --and typically gets killed here '> Killed' Next, I tried 1.6.0 and got this: "(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok (v2.12) Going to read '/home/$username/.cpan/Metadata' Killed" (everything prior works and it seems to get further along than when I try to install 1.6.1) Any insight into why this may be happening would be appreciated. Something EQUALLY appreciated would be a recommendation of a decent enough hosting service where someone has had success installing Bio-Perl. I'd try to set up my Mac web sharing feature and then try to set up the stuff locally, but I haven't yet been able to successfully get the port forwarding feature working properly on the Apple AirPort Extreme--perplexing. Next, I might just try to install via the Build.PL script. Hmm, checking the wiki, it seems I'll still be able to run remote blast and use the basic seq modules, although some discrepancies and idiosyncrasies may be expected? Any heads-up about any false assumptions by me would be greatly appreciated.
Thanks in advance, Roy On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: > > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > >> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>> >>> Does anyone use dreamhost as a web hosting service? I'm just curious >>> if anyone has had any luck installing the module as their daemon seems >>> to kill my process whenever I try to install it. Dreamhost tech >>> support attributes it to either exceeding the allocated memory cache >>> or exceeding the processing time. I tried to nice the process, but >>> that didn't help for me. Any luck or experience in resolving this >>> would be much appreciated. I suppose my next attempt would be to try >>> installing it directly and hope I don't need root... >> >> Dear Roy, >> >> DreamHost uses Debian, so you can suggest them to install the Debian package. >> If you are in contact with the tech service, do not hesitate to tell them to >> contact me if they are interested by a backport of the 1.6.0 package. For >> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. > > Alex mentioned the reliance on the specific ExtUtils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. > > A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post.
> >> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >> will vote for it :) >> >> Have a nice day, >> >> -- >> Charles Plessy >> Debian Med packaging team, >> http://www.debian.org/devel/debian-med >> Tsurumi, Kanagawa, Japan > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Nov 20 15:40:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 14:40:24 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> Message-ID: <1D1B0987-3309-4281-BCE0-2737E4F0D0B1@illinois.edu> BioPerl is pure perl. If you believe all dependencies are installed, just unpack the dist to a specific directory and point PERL5LIB at it (for bash):
export PERL5LIB=/home/USER/bioperl/bioperl-live
Note that if you plan on doing the same for other bioperl-related modules (ex: bioperl-db) you'll need to add 'lib' to it, as they use a generic Module::Build now:
export PERL5LIB=/home/USER/bioperl/bioperl-db/lib
You can also add a 'use lib' directive in your scripts. More at the following link: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#USING_MODULES_NOT_INSTALLED_IN_THE_STANDARD_LOCATION chris On Nov 20, 2009, at 2:23 PM, Chu, Roy wrote: > "sounds very much like you process was killed for prolonged execution > time, or memory usage. We have a daemon in place that monitors for > processes that take up too much of a shared web server's resources, and > this may have kicked in (and often does when trying to install packages > on a shared server)." > > This was the explanation they had.
Regarding asking their admins to > install, it seems to be a "they'll try to get to it but don't hold your > breath" situation. > > Hmmm, I tried some other attempts, installing 1.4.0 posed no problems. > I'm not a perl guru, so I tried to increase the build cache size from > the default, 10 MB, hoping that that may be the problem--can't imagine > how though, since I can't imagine how big the whole package version > can differ by (though honestly, I haven't checked). > Whenever I try to install 1.6.1, it runs into a problem I guess after > the 'make' step and lists the > modules--BioPerl-1.6.0/t/Variation/SeqDiff.t > BioPerl-1.6.0/t/Variation/SNP.t > BioPerl-1.6.0/t/Variation/Variation_IO.t > --and typically gets killed here '> Killed' > > Next, I tried 1.6.0, then I get this: > "(I think you ran Build.PL directly, so will use CPAN to install > prerequisites on demand) > CPAN: Storable loaded ok (v2.12) > Going to read '/home/$username/.cpan/Metadata' > Killed" (everything prior works and it seems to get further along than > when I try to install 1.6.1) > > Any insight into why this may be happening would be appreciated. > Something EQUALLY appreciated would be a recommendation of a decent > enough hosting service where someone has had success installing > Bio-Perl. I'd try to set up my Mac web sharing feature and then try > to set up the stuff locally, but I haven't yet been able to > successfully get the port forwarding feature working properly on the > Apple AirPort Extreme--perplexing. Next, I might just try to install > via the Build.PL script. > > Hmm, checking the wiki, it seems I'll still be able to run remote > blast and use the basic seq modules, although some discrepancies and > idiosyncrasies may be expected? Any heads-up about any false > assumptions by me would be greatly appreciated.
> > Thanks in advance, > Roy > > On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: >> >> On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: >> >>> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>>> >>>> Does anyone use dreamhost as a web hosting service? I'm just curious >>>> if anyone has had any luck installing the module as their daemon seems >>>> to kill my process whenever I try to install it. Dreamhost tech >>>> support attributes it to either exceeding the allocated memory cache >>>> or exceeding the processing time. I tried to nice the process, but >>>> that didn't help for me. Any luck or experience in resolving this >>>> would be much appreciated. I suppose my next attempt would be to try >>>> installing it directly and hope I don't need root... >>> >>> Dear Roy, >>> >>> DreamHost uses Debian, so you can suggest them to install the Debian package. >>> If you are in contact with the tech service, do not hesitate to tell them to >>> contact me if they are interested by a backport of the 1.6.0 package. For >>> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. >> >> Any reason why this is so? We specify compatibility back to 5.6.1. >> >> Alex mentioned the reliance on the specific ExtUtils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. >> >> A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post.
>> >>> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >>> will vote for it :) >>> >>> Have a nice day, >>> >>> -- >>> Charles Plessy >>> Debian Med packaging team, >>> http://www.debian.org/devel/debian-med >>> Tsurumi, Kanagawa, Japan >> >> chris From charles-listes+bioperl at plessy.org Fri Nov 20 20:07:23 2009 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Sat, 21 Nov 2009 10:07:23 +0900 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: <20091121010723.GA7786@kunpuu.plessy.org> On Fri, Nov 20, 2009 at 07:00:45AM -0600, Chris Fields wrote: > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > > > > DreamHost uses Debian, so you can suggest them to install the Debian > > package. If you are in contact with the tech service, do not hesitate to > > tell them to contact me if they are interested by a backport of the 1.6.0 > > package. For version 1.6.1, it may be more difficult as it depends on perl > > 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. Dear Chris, you make a good point: although for building we need to either depend on perl 5.10.1 or package ExtUtils::Manifest separately, the resulting bioperl package does not depend on such a high version. Therefore, there is no need for a backport, and the latest Debian package can be installed on a Debian stable (5.0/Lenny) system.
I just checked the Dreamhost machine to which I happen to have access, "waratahs", and it seems to be older, but nevertheless it may be worth asking the admins anyway (with the big drawback that they would have to be asked for each update). Have a nice week-end, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From robert.bradbury at gmail.com Fri Nov 20 20:40:14 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 20 Nov 2009 20:40:14 -0500 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites Message-ID: I run a Linux system which is in a gradual process of evolution from the default Linux browsers (Galeon, Epiphany, etc.) through Firefox (better) to Google's Chromium (IMO, perhaps the best so far). Chromium allows one to create a process per tab/URL so one can effectively track what it is doing. It also allows one to track the machine usage of these processes (through the Developer > Task manager [shift-escape keyboard] option) which though expensive in terms of overhead allows one to track offending windows (in terms of memory or CPU use). My processor recently jumped from a typical 700 MHz to 1.4 GHz speed (using the Linux Ondemand scheduler - which saves ~20 W at the wall outlet -- I've measured it) to the full tilt 2.8 GHz the CPU is capable of. Looking at the chrome task manager I was not surprised to find the NY Times high on the list (they are pushing content, esp. using Javascript) but much to my dismay the Jalview and Howto:Trees:Bioperl appeared to be high on the list. Now I am forced to ask myself *why* sites which are simply distributing static information are eating up CPU on my machine! This is a fundamental flaw in the architecture of the sites -- wherein there should be conscious efforts to minimize user-CPU use (or avoid Javascript entirely).
This would not be a problem if I were using Firefox as I can easily use NoScript to block Javascript from non-approved sites. But it raises the question of when one should allow Javascript to run (one would "normally" approve academic sites by default) when even the academic sites are abusing my CPU. There needs to be much greater awareness both on the part of software distributors and software consumers that it is *MY* CPU and *MY* Electricity and *MY* contribution to global warming. And the developers/distributors should not be sucking down those resources without first saying "May I?" and I have the option of saying "No you may not." There is enough we can do productively (running low homology blast searches) without engaging in endless wheel spinning of Javascripts or looped GIFs. Robert From maj at fortinbras.us Fri Nov 20 23:17:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 23:17:12 -0500 Subject: [Bioperl-l] ohlohers Message-ID: You can now add your Ohloh widgets and increase your carbon footprint with the less crufty: {{#ohloh|acct_id|TYPE}} where TYPE is [Detailed|Rank|Tiny]. Taint checks aplenty. MAJ From maj at fortinbras.us Fri Nov 20 23:33:02 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 23:33:02 -0500 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com><20091120104445.GG31318@kunpuu.plessy.org> <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> Message-ID: <9ECC66C2F23F47469AF0F07E3F9307FC@NewLife> Maybe 'nightmarehost' is more appropriate. I've had no problems on AWS, but this may not be exactly what you need.
MAJ ----- Original Message ----- From: "Chu, Roy" To: Sent: Friday, November 20, 2009 3:23 PM Subject: Re: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN "sounds very much like your process was killed for prolonged execution time, or memory usage. We have a daemon in place that monitors for processes that take up too much of a shared web server's resources, and this may have kicked in (and often does when trying to install packages on a shared server)." This was the explanation they had. Regarding asking their admins to install, it seems to be a "they'll try to get to it but don't hold your breath" situation. Hmmm, I tried some other attempts, installing 1.4.0 posed no problems. I'm not a perl guru, so I tried to increase the build cache size from the default, 10 MB, hoping that that may be the problem--can't imagine how though, since I can't imagine how big the whole package version can differ by (though honestly, I haven't checked). Whenever I try to install 1.6.1, it runs into a problem I guess after the 'make' step and lists the modules--BioPerl-1.6.0/t/Variation/SeqDiff.t BioPerl-1.6.0/t/Variation/SNP.t BioPerl-1.6.0/t/Variation/Variation_IO.t --and typically gets killed here '> Killed' Next, I tried 1.6.0, then I get this: "(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok (v2.12) Going to read '/home/$username/.cpan/Metadata' Killed" (everything prior works and it seems to get further along than when I try to install 1.6.1) Any insight into why this may be happening would be appreciated. Something EQUALLY appreciated would be a recommendation of a decent enough hosting service where someone has had success installing Bio-Perl. I'd try to set up my Mac web sharing feature and then try to set up the stuff locally, but I haven't yet been able to successfully get the port forwarding feature working properly on the Apple AirPort Extreme--perplexing.
Next, I might just try to install via the Build.PL script. Hmm, checking the wiki, it seems I'll still be able to run remote blast and use the basic seq modules, although some discrepancies and idiosyncrasies may be expected? Any heads-up about any false assumptions by me would be greatly appreciated. Thanks in advance, Roy On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: > > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > >> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>> >>> Does anyone use dreamhost as a web hosting service? I'm just curious >>> if anyone has had any luck installing the module as their daemon seems >>> to kill my process whenever I try to install it. Dreamhost tech >>> support attributes it to either exceeding the allocated memory cache >>> or exceeding the processing time. I tried to nice the process, but >>> that didn't help for me. Any luck or experience in resolving this >>> would be much appreciated. I suppose my next attempt would be to try >>> installing it directly and hope I don't need root... >> >> Dear Roy, >> >> DreamHost uses Debian, so you can suggest them to install the Debian package. >> If you are in contact with the tech service, do not hesitate to tell them to >> contact me if they are interested by a backport of the 1.6.0 package. For >> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. > > Alex mentioned the reliance on the specific ExtUtils::Manifest version. The > version requested has an important bug fix, is present on CPAN, and is > backwards-compatible to 5.6.1. It should be fairly easy to request that as a > separate package. > > A strict requirement for perl 5.10.1 doesn't make sense in that light, unless > said perl maintainer can enlighten us as to why this is an issue? This one may > require a ranty blog post.
> >> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >> will vote for it :) >> >> Have a nice day, >> >> -- >> Charles Plessy >> Debian Med packaging team, >> http://www.debian.org/devel/debian-med >> Tsurumi, Kanagawa, Japan > > chris From cjfields at illinois.edu Fri Nov 20 23:38:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 22:38:23 -0600 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: References: Message-ID: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> Robert, Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in general) do not use JS, unless there is a specific addition I'm unaware of. Now, the site wiki was recently 'parasited' for redirects, which may be the culprit, but this is now fixed. Can you at least retest to see if this persists? Anyone else know about this? chris On Nov 20, 2009, at 7:40 PM, Robert Bradbury wrote: > I run a Linux system which is in a gradual process of evolution from the > default Linux browsers (Galeon, Epiphany, etc.) through Firefox (better) to > Google's Chromium (IMO, perhaps the best so far). Chromium allows one to > create a process per tab/URL so one can effectively track what it is doing. > It also allows one to track the machine usage of these processes (through > the Developer > Task manager [shift-escape keyboard] option) which though > expensive in terms of overhead allows one to track offending windows (in > terms of memory or CPU use).
My processor recently jumped from a typical > 700 MHz to 1.4 GHz speed (using the Linux Ondemand scheduler - which saves > ~20 W at the wall outlet -- I've measured it) to the full tilt 2.8 GHz the > CPU is capable of. Looking at the chrome task manager I was not surprised > to find the NY Times high on the list (they are pushing content, esp. using > Javascript) but much to my dismay the Jalview and Howto:Trees:Bioperl > appeared to be high on the list. Now I am forced to ask myself *why* sites > which are simply distributing static information are eating up CPU on my > machine! This is a fundamental flaw in the architecture of the sites -- > wherein there should be conscious efforts to minimize user-CPU use (or avoid > Javascript entirely). This would not be a problem if I were using Firefox > as I can easily use NoScript to block Javascript from non-approved sites. > But it raises the question of when one should allow Javascript to run (one > would "normally" approve academic sites by default) when even the academic > sites are abusing my CPU. There needs to be much greater awareness both on > the part of software distributors and software consumers that it is *MY* CPU > and *MY* Electricity and *MY* contribution to global warming. And the > developers/distributors should not be sucking down those resources without > first saying "May I?" and I have the option of saying "No you may not." > There is enough we can do productively (running low homology blast > searches) without engaging in endless wheel spinning of Javascripts or > looped GIFs.
> > Robert From sdavis2 at mail.nih.gov Sat Nov 21 00:11:34 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 20 Nov 2009 21:11:34 -0800 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> References: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> Message-ID: <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> On Fri, Nov 20, 2009 at 8:38 PM, Chris Fields wrote: > Robert, > > Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in > general) do not use JS, unless there is a specific addition I'm unaware of. > Now, the site wiki was recently 'parasited' for redirects, which may be the > culprit, but this is now fixed. Can you at least retest to see if this > persists? > > Anyone else know about this? > > The page in question does include javascript, it appears from the source. This is a function of using mediawiki, though, I believe and not something specific to that page. Sean From cjfields at illinois.edu Sat Nov 21 00:20:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 23:20:37 -0600 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> References: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> Message-ID: On Nov 20, 2009, at 11:11 PM, Sean Davis wrote: > On Fri, Nov 20, 2009 at 8:38 PM, Chris Fields wrote: > >> Robert, >> >> Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in >> general) do not use JS, unless there is a specific addition I'm unaware of. >> Now, the site wiki was recently 'parasited' for redirects, which may be the >> culprit, but this is now fixed. Can you at least retest to see if this
Can you at least retest to see if this >> persists? >> >> Anyone else know about this? >> >> > The page in question does include javascript, it appears from the source. > This is a function of using mediawiki, though, I believe and not something > specific to that page. > > Sean Sean, thanks for pointing that out. chris From robert.bradbury at gmail.com Sat Nov 21 13:26:05 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 21 Nov 2009 13:26:05 -0500 Subject: [Bioperl-l] Bio::DB::EUtilities question In-Reply-To: References: Message-ID: It sounds like NCBI may be counting frequency of requests, how much data they send or something similar. Are you delaying the time between fetches? The code I've seen typically sleeps for a few seconds each time around a loop. You might try longer delays between fetches and see if that gets you any more data. Alternatively perhaps the libraries aren't reusing the TCP/IP connection properly. Is there a difference between the amount of memory on the machines? Have you watched the size of the process to see if it grows over time? I think the bug which prevented me from fetching a not-so-large genome from a few months ago (eating up 3GB of memory in the process) has not been resolved. If so that could be your problem. Robert On Fri, Nov 20, 2009 at 12:44 PM, Alessandra wrote: > > > I'm testing Bio::DB::EUtilities - webagent which interacts with and > retrieves data from NCBI's eUtils. My perl script works but it works > only if I request less than ~450 times get_Response function.. 
else I > have got this error message: > > ------------- EXCEPTION ------------- > MSG: Response Error > Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) > STACK Bio::DB::GenericWebAgent::get_Response > /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 > STACK toplevel ./wget4gbk.pl:77 > From cjfields at illinois.edu Sat Nov 21 14:19:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Nov 2009 13:19:24 -0600 Subject: [Bioperl-l] Bio::DB::EUtilities question In-Reply-To: References: Message-ID: <837CE7E7-E625-4285-AD54-06FD168C0DF3@illinois.edu> NCBI has specific rules about the repeated queries to its servers: http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements Acc. to that, if you are making over 100 requests at peak times you will run into problems (they'll probably temp-block your IP), even if the timeout is much shorter now (it's 3 requests/second, whereas a year or two ago it was once every 3 sec). In general it's best to run something like this during off-hours. The actual limit on number of server requests is one specific part of Bio::DB::EUtilities that hasn't been added yet, but is tentatively planned. chris On Nov 21, 2009, at 12:26 PM, Robert Bradbury wrote: > It sounds like NCBI may be counting frequency of requests, how much data > they send or something similar. Are you delaying the time between fetches? > The code I've seen typically sleeps for a few seconds each time around a > loop. You might try longer delays between fetches and see if that gets you > any more data. > > Alternatively perhaps the libraries aren't reusing the TCP/IP connection > properly. Is there a difference between the amount of memory on the > machines? Have you watched the size of the process to see if it grows over > time? I think the bug which prevented me from fetching a not-so-large > genome from a few months ago (eating up 3GB of memory in the process) has > not been resolved. If so that could be your problem. 
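Robert's suggestion to pause between fetches and the 3-requests-per-second figure Chris quotes both come down to spacing out calls to eutils. What follows is a minimal sketch in plain Perl, using only the core Time::HiRes module. `throttled` and `$MIN_INTERVAL` are hypothetical names, not part of Bio::DB::EUtilities (Chris notes that a built-in limit is only planned). The demo wraps a stand-in request rather than a real network call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);    # sub-second time() and sleep()

# Minimum gap between requests, in seconds. 1/3 s corresponds to the
# "3 requests/second" figure quoted in this thread; raise it to be safer.
my $MIN_INTERVAL = 1 / 3;
my $last_call    = 0;              # when the previous request was issued

# Run the code ref $request, first sleeping if the previous request was
# issued less than $MIN_INTERVAL seconds ago. Returns what $request returns.
sub throttled {
    my ($request) = @_;
    my $wait = $last_call + $MIN_INTERVAL - time();
    sleep($wait) if $wait > 0;
    $last_call = time();
    return $request->();
}

# Demo with a stand-in request that only records when it ran.
my @stamps;
throttled( sub { push @stamps, time() } ) for 1 .. 4;
printf "gap between requests: %.3f s\n", $stamps[$_] - $stamps[ $_ - 1 ]
    for 1 .. $#stamps;
```

In a real script each fetch would be wrapped the same way, for example `throttled(sub { $factory->get_Response })`, where `$factory` is your existing Bio::DB::EUtilities object. This only controls the request rate; it does nothing about the memory growth Robert mentions, and large jobs are still best run off-hours as Chris advises.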
> > Robert > > On Fri, Nov 20, 2009 at 12:44 PM, Alessandra > wrote: >> >> >> I'm testing Bio::DB::EUtilities - webagent which interacts with and >> retrieves data from NCBI's eUtils. My perl script works but it works >> only if I request less than ~450 times get_Response function.. else I >> have got this error message: >> >> ------------- EXCEPTION ------------- >> MSG: Response Error >> Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) >> STACK Bio::DB::GenericWebAgent::get_Response >> /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 >> STACK toplevel ./wget4gbk.pl:77 >> From cjfields at illinois.edu Sat Nov 21 21:58:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Nov 2009 20:58:37 -0600 Subject: [Bioperl-l] BioPerl on FLOSS Weekly Message-ID: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> Jason and I were recently interviewed (Wednesday!) about BioPerl for FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and Kirsten Sanford. The interview is now available online, so get your favorite flavor (MP3, podcast) here: http://twit.tv/floss96 Enjoy! chris and jason From adsj at novozymes.com Sun Nov 22 07:37:40 2009 From: adsj at novozymes.com (Adam Sjøgren) Date: Sun, 22 Nov 2009 13:37:40 +0100 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> (Chris Fields's message of "Sat, 21 Nov 2009 20:58:37 -0600") References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> Message-ID: <87aaye91m3.fsf@topper.koldfront.dk> On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > Jason and I were recently interviewed (Wednesday!) about BioPerl for > FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and > Kirsten Sanford. Great!
How about linking to it on bioperl.org? :-), Adam -- Adam Sjøgren adsj at novozymes.com From cjfields at illinois.edu Sun Nov 22 15:30:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Nov 2009 14:30:01 -0600 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <87aaye91m3.fsf@topper.koldfront.dk> References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> <87aaye91m3.fsf@topper.koldfront.dk> Message-ID: <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> On Nov 22, 2009, at 6:37 AM, Adam Sjøgren wrote: > On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > >> Jason and I were recently interviewed (Wednesday!) about BioPerl for >> FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and >> Kirsten Sanford. > > Great! > > How about linking to it on bioperl.org? > > > :-), > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Now posted via O|B|F News; I'll try to make that feed more prominent on the main page. Since this is the second such interview (Jason did one a few years back for PerlCast), I'm thinking we need a media page of some sort. chris From maj at fortinbras.us Sun Nov 22 15:48:39 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 22 Nov 2009 15:48:39 -0500 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu><87aaye91m3.fsf@topper.koldfront.dk> <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> Message-ID: <247658CC6D9A4529B281F4482BD3E4BD@NewLife> We do have http://www.bioperl.org/wiki/Category:BioPerl_Media -- ----- Original Message ----- From: "Chris Fields" To: "Adam Sjøgren" Cc: Sent: Sunday, November 22, 2009 3:30 PM Subject: Re: [Bioperl-l] BioPerl on FLOSS Weekly On Nov 22, 2009, at 6:37 AM, Adam Sjøgren wrote: > On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > >> Jason and I were recently interviewed (Wednesday!)
about BioPerl for >> FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and >> Kirsten Sanford. > > Great! > > How about linking to it on bioperl.org? > > > :-), > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Now posted via O|B|F News; I'll try to make that feed more prominent on the main page. Since this is the second such interview (Jason did one a few years back for PerlCast), I'm thinking we need a media page of some sort. chris From jardim.rodrigo at gmail.com Sun Nov 22 11:06:40 2009 From: jardim.rodrigo at gmail.com (Rodrigo Jardim) Date: Sun, 22 Nov 2009 14:06:40 -0200 Subject: [Bioperl-l] Problems with Genbank Proteins File Message-ID: I have been having problems parsing a GenBank protein file. I think it is because this file has a different order of fields. For example:

In most GenBank files:
========================
LOCUS       AA399704 183 bp mRNA linear EST 03-MAR-2000
ACCESSION   AA399704
VERSION     AA399704.1 GI:2053305
DEFINITION  TEUF0001 T.cruzi epimastigote non-normalized cDNA Library Trypanosoma cruzi cDNA clone 1 5' similar to T. cruzi gene for histone H2b (X60982), mRNA sequence.
KEYWORDS    EST.
SOURCE      Trypanosoma cruzi

In GenBank protein files:
===================
LOCUS       XP_628849 510 aa linear INV 31-OCT-2008
DEFINITION  hypothetical protein [Dictyostelium discoideum AX4].
ACCESSION   XP_628849
VERSION     XP_628849.1 GI:66799847
DBSOURCE    REFSEQ: accession XM_628847.1
KEYWORDS    .
SOURCE      Dictyostelium discoideum AX4.

When I try to parse it, BioPerl aborts with an error message. Any ideas?
Thanks all, -- Regards, Rodrigo Jardim jardim.rodrigo at gmail.com From biopython at maubp.freeserve.co.uk Mon Nov 23 12:36:36 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Nov 2009 17:36:36 +0000 Subject: [Bioperl-l] Problems with Genbank Proteins File In-Reply-To: References: Message-ID: <320fb6e00911230936ofb9d897rbd45abb73a361250@mail.gmail.com> On Sun, Nov 22, 2009 at 4:06 PM, Rodrigo Jardim wrote: > I have been having problems parsing a GenBank protein file. I think it is because > this file has a different order of fields. For example: > > ... > > When I try to parse it, BioPerl aborts with an error message. > > Any ideas? There are some important bits of information missing - what is the error message, and what version of BioPerl are you using? Peter From maj at fortinbras.us Mon Nov 23 12:58:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Nov 2009 12:58:46 -0500 Subject: [Bioperl-l] building samtools/Bio::DB::Sam on cygwin Message-ID: Hi All-- I've had some hard-won success installing samtools and Lincoln's Bio::DB::Sam under cygwin; thought some on the list would be able to use my notes. (Yes, Jason, I'm working on Bio::Tools::Run::BWA...) (To get the current samtools, ping http://sourceforge.net/projects/samtools/files/samtools/0.1.7/samtools-0.1.7a.tar.bz2/download ) * Getting samtools to make from scratch in cygwin The following diff details the changes to the samtools Makefile I made by hand. The key points are -D_WIN32 and the additional variable LFLAGS and its interpolations. To get the linker to see libgcc and libstdc++ I needed to add symlinks from /lib to the correct files in /lib/gcc/i386-pc-cygwin/4.3.2/. Your gcc version may differ.
--- ../old/samtools-0.1.7a/Makefile	2009-11-16 10:13:43.000000000 -0500
+++ Makefile	2009-11-23 12:14:18.529000000 -0500
@@ -1,16 +1,18 @@
 CC= gcc
 CFLAGS= -g -Wall -O2 #-m64 #-arch ppc
-DFLAGS= -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -D_CURSES_LIB=1
+LFLAGS= -lws2_32 -lgcc -lcygwin -lbz2 -lz -lstdc++
+DFLAGS= -D_WIN32 -D_FILE_OFFSET_BITS=64 -D_CURSES_LIB=1
 LOBJS= bgzf.o kstring.o bam_aux.o bam.o bam_import.o sam.o bam_index.o \
 bam_pileup.o bam_lpileup.o bam_md.o glf.o razf.o faidx.o knetfile.o \
 bam_sort.o sam_header.o
 AOBJS= bam_tview.o bam_maqcns.o bam_plcmd.o sam_view.o \
 bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o \
 bamtk.o kaln.o
@@ -36,13 +38,13 @@
 	$(AR) -cru $@ $(LOBJS)
 samtools:lib $(AOBJS)
-	$(CC) $(CFLAGS) -o $@ $(AOBJS) -lm $(LIBPATH) $(LIBCURSES) -lz -L. -lbam
+	$(CC) $(CFLAGS) -o $@ $(AOBJS) -Xlinker --enable-auto-import -lm $(LIBPATH) $(LIBCURSES) -lz -L. -lbam $(LFLAGS)
 razip:razip.o razf.o knetfile.o
-	$(CC) $(CFLAGS) -o $@ razf.o razip.o knetfile.o -lz
+	$(CC) $(CFLAGS) -o $@ razf.o razip.o knetfile.o -lz -lm -lws2_32
 bgzip:bgzip.o bgzf.o
-	$(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz
+	$(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz -lm -lws2_32
 razip.o:razf.h
 bam.o:bam.h razf.h bam_endian.h kstring.h sam_header.h

* Getting Bio::DB::Sam to compile and install

Bio::DB::Sam requires not the samtools.exe, but the bam library created during the samtools build, as well as all the samtools header files. Create a symlink in /lib to libbam.a in the build directory (or copy libbam.a up to /lib), and create symlinks or copy *.h into /usr/include. Then in cygwin bash shell

$ cpan
cpan> install Bio::DB::Sam

should fly.

Hope someone finds this useful. These mods led me to a successful Bio::DB::Sam install--have not yet checked original code based on Bio::DB::Sam. If they don't work for you, reply to the list.
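Before running `install Bio::DB::Sam`, it can help to confirm that the files MAJ describes are actually where the build will look. The following is a hypothetical preflight check, not part of samtools or Bio::DB::Sam. `missing_sam_prereqs` is an invented helper, the default locations (/lib and /usr/include) follow MAJ's notes, and the header list is an assumption (MAJ's advice is to make all of the samtools *.h files available).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the paths the Bio::DB::Sam build is expected
# to need but that do not exist. -e follows symlinks, so a dangling link
# into a deleted samtools build directory is reported as missing too.
sub missing_sam_prereqs {
    my (%opt) = @_;
    my $libdir = $opt{libdir} || '/lib';            # where libbam.a is linked/copied
    my $incdir = $opt{incdir} || '/usr/include';    # where samtools headers go
    # Assumed subset of the samtools headers; extend as needed.
    my @headers = @{ $opt{headers} || [qw(bam.h sam.h bgzf.h faidx.h)] };

    my @wanted = (
        File::Spec->catfile( $libdir, 'libbam.a' ),
        map { File::Spec->catfile( $incdir, $_ ) } @headers,
    );
    return grep { !-e $_ } @wanted;
}

my @missing = missing_sam_prereqs();
if (@missing) {
    print "Missing before 'install Bio::DB::Sam':\n";
    print "  $_\n" for @missing;
}
else {
    print "libbam.a and headers found; next: cpan> install Bio::DB::Sam\n";
}
```

Checking with `-e` rather than trying to compile anything keeps the sketch cheap, which matters on shared hosts like the one discussed earlier in this thread, where long-running processes get killed.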
cheers, MAJ From jcline at ieee.org Mon Nov 23 14:13:26 2009 From: jcline at ieee.org (Jonathan Cline) Date: Mon, 23 Nov 2009 13:13:26 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: Message-ID: <4B0ADED6.8040901@ieee.org> Dreamhost has terrible reliability. I have stats going back years on a standard dreamhost hosting account (non-dedicated server), and on some days the web server doesn't respond. Dreamhost service is OK for a hobby blog however it is definitely *not* suitable for anything real. Add in latency, arbitrary account limits/restrictions, etc, and as a hosting service, it is a bad idea to host a project there. Although some users apparently get lucky with server allocation and end up on a "good server", the provider can change this at any time as well. I think more typically, the accounts users don't notice, since most are simple bloggers. Here's a data snip that illustrates the problem with a typical dreamhost account: ---------------------------------------------------------------------- date uptime dns connect request ttfb ttlb 2008-08-05 91.40 0.000 0.528 0.528 2.257 1.619 2008-08-04 89.13 0.002 0.301 0.301 1.302 0.971 2008-08-03 94.62 0.000 0.567 0.567 1.506 0.913 2008-08-02 100.00 0.000 0.335 0.335 1.475 1.079 2008-08-01 100.00 0.000 0.310 0.310 1.587 0.825 2008-07-31 93.55 0.023 0.386 0.386 1.280 0.759 2008-07-30 100.00 0.000 0.345 0.345 1.373 0.860 2008-07-29 100.00 0.000 0.358 0.358 1.335 0.757 2008-07-28 100.00 0.000 0.327 0.327 1.462 0.896 2008-07-27 100.00 0.000 0.292 0.292 1.410 0.966 2008-07-26 100.00 0.000 0.283 0.283 1.280 0.815 2008-07-25 100.00 0.000 0.297 0.297 1.231 0.853 2008-07-24 100.00 0.000 0.362 0.362 1.258 0.699 2008-07-23 100.00 0.000 0.339 0.339 1.270 0.785 ---------------------------------------------------------------------- minimum 89.13 0.000 0.283 0.283 1.231 0.699 maximum 100.00 0.023 0.567 0.567 2.257 1.619 average 97.76 0.002 0.359 0.359 1.430 0.914 
----------------------------------------------------------------------
Or this month:
----------------------------------------------------------------------
date        uptime    dns  connect  request   ttfb   ttlb
2009-11-11  100.00  0.011    0.097    0.097  1.260  1.638
2009-11-10  100.00  0.008    0.094    0.094  1.285  1.647
2009-11-09  100.00  0.008    0.094    0.094  1.494  1.872
2009-11-08  100.00  0.015    0.101    0.101  1.509  1.894
2009-11-07  100.00  0.006    0.092    0.092  1.453  1.831
2009-11-06  100.00  0.011    0.097    0.097  1.500  1.882
2009-11-05   97.80  0.012    0.097    0.097  1.445  1.806
2009-11-04  100.00  0.010    0.096    0.096  1.235  1.605
2009-11-03   95.65  0.007    0.093    0.093  1.266  1.612
2009-11-02  100.00  0.010    0.096    0.096  1.267  1.637
2009-11-01  100.00  0.007    0.093    0.093  1.311  1.692
2009-10-31  100.00  0.009    0.095    0.095  1.225  1.594
2009-10-30  100.00  0.009    0.095    0.095  1.364  1.739
2009-10-29  100.00  0.017    0.103    0.103  1.121  1.505
----------------------------------------------------------------------
minimum      95.65  0.006    0.092    0.092  1.121  1.505
maximum     100.00  0.017    0.103    0.103  1.509  1.894
average      99.53  0.010    0.096    0.096  1.338  1.711
----------------------------------------------------------------------
## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## From cjfields at illinois.edu Mon Nov 23 22:19:02 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 23 Nov 2009 21:19:02 -0600 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> Message-ID: <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> Okay, so I think it's feasible to add this into trunk. I like the idea of optionally having a log class, if someone comes up with a nice way of adding it in I would be for it.
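To make the proposed verbose/strict separation concrete, here is a minimal pure-Perl sketch. The class and method names (`Hypothetical::Reporter`, `problem`, `debug`) are invented for illustration and are not BioPerl or Biome API. Verbosity decides what gets printed, strictness decides what is fatal, and neither setting affects the other.

```perl
use strict;
use warnings;

# Hypothetical reporter illustrating independent "verbose" and "strict"
# settings; this is NOT the Bio::Root::Root API.
package Hypothetical::Reporter;

sub new {
    my ($class, %args) = @_;
    my $self = {
        verbose => defined $args{verbose} ? $args{verbose} : 1,  # 0 silent, 1 warnings, 2 debug
        strict  => defined $args{strict}  ? $args{strict}  : 0,  # 1 = problems are fatal
    };
    return bless $self, $class;
}

# A recoverable problem: fatal only under strict, printed only if verbose.
sub problem {
    my ($self, $msg) = @_;
    die "FATAL: $msg\n" if $self->{strict};
    warn "WARNING: $msg\n" if $self->{verbose} >= 1;
    return;
}

# Diagnostic chatter: controlled by verbosity alone, never fatal.
sub debug {
    my ($self, $msg) = @_;
    warn "DEBUG: $msg\n" if $self->{verbose} >= 2;
    return;
}

package main;

# Quiet but strict: dies on problems without printing anything first.
my $quiet_strict = Hypothetical::Reporter->new(verbose => 0, strict => 1);
# Chatty but lenient: prints warnings and debug output, and keeps going.
my $chatty = Hypothetical::Reporter->new(verbose => 2, strict => 0);
```

A logging module such as Log::Log4perl or Log::Dispatch, as suggested later in the thread, could stand in for the `warn` calls without changing that split.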
chris On Nov 20, 2009, at 4:15 AM, Dave Messina wrote: > Chris, I took a look at how you implemented this in Biome -- very nice! > > >> I like this verbose/strict separability a lot. Should we go for it? > > Me too. So yes, I think so. > > >> We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. > > > Perhaps this is a job for Log::Log4Perl or Log::Dispatch? > http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl.pm > http://search.cpan.org/~drolsky/Log-Dispatch-2.26/lib/Log/Dispatch.pm > > > That might be overkill, though. > > Dave > From David.Messina at sbc.su.se Tue Nov 24 11:18:22 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Nov 2009 17:18:22 +0100 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> Message-ID: <3FD2086D-062F-4706-9DC8-2A53224C4913@sbc.su.se> > I like the idea of optionally having a log class, if someone comes up with a nice way of adding it in I would be for it. My suggestion of the logging modules was actually to handle the various levels of verbose output -- I think both of the ones I mentioned "log" to STDERR by default. But of course a nice side effect of using such a logging module is that it would allow optional logging to a file, too. Dave From paolo.pavan at gmail.com Tue Nov 24 14:28:09 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Tue, 24 Nov 2009 20:28:09 +0100 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 usage question Message-ID: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> Dear, I'm confused about the proper usage of the module Bio::Tools::Run::Cap3. 
As documented in the POD, the run(@seqs) method returns the cap3 report file, while I would expect it to return a Bio::Assembly object, consistent with other Bio::Tools::Run classes. I worked around this by getting the location and names of the temporary output files from the factory object (although this means accessing a private property) and reading them via the Assembly::IO system. I was just wondering what the intended way to do this is. Thank you for enlightening me! Paolo From Russell.Smithies at agresearch.co.nz Tue Nov 24 17:04:31 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 25 Nov 2009 11:04:31 +1300 Subject: [Bioperl-l] Bio::DB::Fasta Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> Is there any way to pass a filename to Bio::DB::Fasta for the location of where to write the directory.index? It's writing in the same dir as the fasta but I'd rather have it write in /tmp as it's part of a web app. Thanx, Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
======================================================================= From Russell.Smithies at agresearch.co.nz Tue Nov 24 17:21:52 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 25 Nov 2009 11:21:52 +1300 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <4296CD1039FC44B89034A1FD3E6721F3@NewLife> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> <4296CD1039FC44B89034A1FD3E6721F3@NewLife> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> That's what I ended up doing. Also, there's no "obvious" way to index a single file, so I ended up putting the filename in the glob parameter:
my $db = Bio::DB::Fasta->new( "$tmp", -glob => "test.faa", -reindex => 1 );
--Russell > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > Sent: Wednesday, 25 November 2009 11:19 a.m. > To: Smithies, Russell; 'bioperl-l' > Subject: Re: [Bioperl-l] Bio::DB::Fasta > > The code (method index_dir() ) seems to expect all the fasta files to be > contained in that directory. Looks hairy; what about creating symlinks to > your > fasta files in a /tmp subdir and calling new() with that subdir? > ----- Original Message ----- > From: "Smithies, Russell" > To: "'bioperl-l'" > Sent: Tuesday, November 24, 2009 5:04 PM > Subject: [Bioperl-l] Bio::DB::Fasta > > > > Is there any way to pass a filename to Bio::DB::Fasta for the location > of > > where to write the directory.index? > > It's writing in the same dir as the fasta but I'd rather have it write > in /tmp > > as it's part of a web app. > > > > Thanx, > > > > Russell > > > > > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material.
Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From maj at fortinbras.us Tue Nov 24 17:18:51 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 24 Nov 2009 17:18:51 -0500 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> Message-ID: <4296CD1039FC44B89034A1FD3E6721F3@NewLife> The code (method index_dir() ) seems to expect all the fasta files to be contained in that directory. Looks hairy; what about creating symlinks to your fasta files in a /tmp subdir and calling new() with that subdir? ----- Original Message ----- From: "Smithies, Russell" To: "'bioperl-l'" Sent: Tuesday, November 24, 2009 5:04 PM Subject: [Bioperl-l] Bio::DB::Fasta > Is there any way to pass a filename to Bio::DB::Fasta for the location of > where to write the directory.index? > It's writing in the same dir as the fasta but I'd rather have it write in /tmp > as it's part of a web app. > > Thanx, > > Russell > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. 
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= From florent.angly at gmail.com Tue Nov 24 17:54:48 2009 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 24 Nov 2009 14:54:48 -0800 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 usage question In-Reply-To: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> References: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> Message-ID: <4B0C6438.8070405@gmail.com> Hi Paolo, It turns out that there is no standard for what is passed to, and returned by, the Bio::Tools::Run wrappers. I noticed the inconsistency between the assembly wrappers recently while implementing support for a new wrapper. A couple of weeks ago I implemented initial support for additional de novo assembly programs in BioPerl (454 Newbler and Minimo), and Mark Jensen added support for Maq, a program that assembles reads against a reference. In the process, all the assembly wrappers were changed to take the same type of input data (a FASTA sequence or an array reference of sequence objects) and return one of the following:
* a Bio::Assembly::Scaffold object (the default), or
* a Bio::Assembly::IO object, or
* the name of a file for the output of the assembler
Use the out_type method to choose which output you want, e.g.:
$factory->out_type('Bio::Assembly::IO');
or
$factory->out_type('cap3_results.ace');
You'll have to use the code from the bioperl-run Subversion repository if you want to use these new features.
Cheers, Florent Paolo Pavan wrote: > Dear, > I'm confused about the proper usage of the module Bio::Tools::Run::Cap3. > As documented in the pod, the run(@seqs) method returns the cap3 report file > while I expect to return a Bio::Assembly object, consistently with other > Bio::Tools::Run classes. > However, I went around this by getting from the factory object the location > and the names of the temp output files (actually accessing a private > property, although) and reading them via the Assembly::IO system. > I was just wandering what is the proper designed way to do this job. > > Thank you for enlighten the way! > Paolo From roychu at gmail.com Tue Nov 24 18:00:58 2009 From: roychu at gmail.com (Roy) Date: Tue, 24 Nov 2009 15:00:58 -0800 Subject: [Bioperl-l] Remote Blast - same script but different results Message-ID: <4d7f3e450911241500y7df305acq1d03819ea1ec7d3e@mail.gmail.com> Hi bioperl community, I've searched the old lists to see if this topic has been covered, and perhaps this question arises from my own lack of familiarity with BLAST, but with the Perl script listed below I get different results from remote BLAST each time I run it (that is, I either get hits or no hits at all). I'll run the script once and get no hits, then run it again with the same parameters and get the same several hits I had gotten on earlier runs. I use a subroutine to parse the BLAST report, and then a boolean to indicate whether results were returned. Any insight into what I may have missed would be appreciated. Short question: is this behavior typical? My understanding of how BLAST works is that it shouldn't...
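One thing worth checking in the script that follows is how `fetch_blast_report()` is called. The subroutine returns a two-element list, `($bool_hit, $hits)`, but the first call for each query assigns that list to a scalar (`$bool_hit = fetch_blast_report($factory_a);`). A parenthesized list returned in scalar context yields its last element, so `$bool_hit` receives the hit string rather than the flag. As a result, `$hits_a` and `$hits_b` are only filled in on the re-submission path. Separately, the script as posted refers to `$a_seqobj`, `$b_seqobj` and `$remote_blast_three`, which are never declared, because the sequences are created as `$five_seqobj` and `$three_seqobj`. It therefore will not compile under `use strict`. The self-contained sketch below uses a hypothetical stub in place of the real RemoteBlast call to show the context difference.

```perl
use strict;
use warnings;

# Hypothetical stand-in for fetch_blast_report(): same return shape,
# (flag, hit string), but no network access.
sub fetch_report_stub {
    my ($hits) = @_;
    my $bool_hit = length($hits) ? 1 : 0;
    return ($bool_hit, $hits);
}

my $report = "hit_1\t1200\t1400\t1\n";    # made-up hit line for illustration

# Scalar assignment: the returned list is evaluated in scalar context,
# so the LAST element (the hit string) is what ends up in the variable.
my $scalar_result = fetch_report_stub($report);

# List assignment captures both values, which is what the script intends.
my ($flag, $hits) = fetch_report_stub($report);

print "scalar: ", ($scalar_result eq $report ? "hit string" : "flag"), "\n";
print "list:   flag=$flag\n";
```

Writing every call as `($bool_hit, $hits_a) = fetch_blast_report($factory_a);` keeps the flag and the hits together. This does not by itself explain why identical submissions sometimes return no hits, but it removes one source of confusion when reading the output.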
Thanks in advance, Roy #!/usr/bin/perl -w use strict; use warnings; use Carp; use Bio::Perl; use CGI; use Bio::SeqIO; use Bio::SearchIO; use Bio::SeqFeature::Generic; use Bio::Restriction::Analysis; use Bio::Tools::Run::RemoteBlast; use Bio::SimpleAlign; use Bio::AlignIO; use Bio::LocatableSeq; my $five_seqobj = Bio::Seq->new( -seq => 'ATTCCCACCGGGACCTGCGGGGCTGAGTGCCCTTCTCGGTTGCTGCCGCTGAGGAGCCCGCCCAGCCAGCCAGGGCCGCGAGGCCGAGGCCAGGCCGCAGCCCAGGAGCCGCCCCACCGCAGCTGGCGATGGACCCGCCGAGGCCCGCGCTGCTGGCGCTGCTGGCGCTGCCTGCGCTGCTGCTGCTGCTGCTGGCGGGCGCCAGGGCCG', -display_id => 'genomic_a', -alphabet => 'dna', ); my $three_seqobj = Bio::Seq->new( -seq => 'GTGAGTGCGCGGCCGCTCTGCGGGCGCAGAGGGAGCGGGAGGGAGCCGGCGGCACGAGGTTGGCCGGGGCAGCCTGGGCCTAGGCCAGAGGGAGGGCAGCCACAGGGTCCAGGGCGAGTGGGGGGATTGGACCAGCTGGCGGCCCCTGCAGGCTCAGGATGGGGGGCGCGGGATGGAGGGGCTGAGGAGGGGGTCTCCGGAGCCTGCCTC', -display_id => 'genomic_b', -alphabet => 'dna', ); my @params = ( '-program' => 'blastn', '-database' => 'refseq_genomic', '-expect' => '10', '-readmethod' => 'blastxml' ); $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES'; $Bio::Tools::Run::RemoteBlast::HEADER{'PERC_IDENT'} = 75; $Bio::Tools::Run::RemoteBlast::HEADER{'FORMAT_TYPE'} = 'XML'; $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]'; $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} = 100; # Put: limit number of hits my $factory_a = Bio::Tools::Run::RemoteBlast->new(@params); $factory_a->retrieve_parameter('FORMAT_TYPE', 'XML'); my $hits_a; my $hits_b; my $r; my $bool_hit; print "Submitting BLAST query - 5' end (MEGABLAST = YES)\n"; $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES'; $r = $factory_a->submit_blast($a_seqobj); $bool_hit = fetch_blast_report($factory_a); unless ($bool_hit) { print "\nNo hits\n"; print "Re-submitting BLAST query - 5' end (MEGABLAST = NO)\n"; sleep 5; $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'NO'; $r = $factory_a->submit_blast($a_seqobj); ($bool_hit, $hits_a) = 
fetch_blast_report($factory_a); if ($bool_hit == 0) { print "No hits\n"; } sleep 5; } my $factory_b = Bio::Tools::Run::RemoteBlast->new(@params); print "\n--------------------------------------------------\n\n"; print "Submitting BLAST query - 3' end (MEGABLAST = YES)\n"; $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES'; $r = $remote_blast_three->submit_blast($b_seqobj); $bool_hit = fetch_blast_report($factory_b); unless ($bool_hit) { print " No hits\n"; print "Re-submitting BLAST query - 3' end (MEGABLAST = NO)\n"; sleep 5; $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'NO'; $r = $factory_b->submit_blast($b_seqobj); ($bool_hit, $hits_b) = fetch_blast_report($factory_b); if ($bool_hit == 0) { print " No hits\n"; } sleep 5; } print "\nbye\n\n"; print "$hits_a\n$hits_b\n"; exit; sub fetch_blast_report { my ($factory) = @_; my $v = 1; my $bool_hit = 0; my $hits = ''; print STDERR "waiting..."; while (my @rids = $factory->each_rid) { foreach my $rid (@rids) { print STDERR "."; my $rc = $factory->retrieve_blast($rid); # retrieves blast report from remote blast queue, # returns -1 on error, 0 on 'job not finished', Bio::SearchIO object # args, remote blast id (rid) if (!ref($rc)) { # if not empty string, ref EXPR returns a non-empty string if EXPR is a reference if ($rc < 0) { $factory->remove_rid($rid); } print STDERR "." if ($v > 0); ##################################################################################### is this printing out as multiple dots? when and why? 
sleep 5; } else { $bool_hit = 1; my $result = $rc->next_result(); unless ($result->num_hits > 0) { $bool_hit = 0; } # returns: Bio::Search::Result::ResultI object $factory->remove_rid($rid); print "\ndatabase:\t", $result->database_name,"\n"; print "query name:\t", $result->query_name,"\n"; print "query length\t", $result->query_length,"\n"; print "num hits\t", $result->num_hits,"\n"; if ($result->num_hits) { # $result->hits returns an array of hits # $results->no_hits_found, boolean vs $#{@hits} ie. filtering\ while (my $hit = $result->next_hit) { print "\nhit name:\t", $hit->name,"\n"; print "description:\t", $hit->description,"\n"; print "locus:\t", $hit->locus,"\n"; print "algorithm: ", $hit->algorithm,"\thit length: ", $hit->length,"\thit ranking: ", $hit->rank,"\n"; while (my $hsp = $hit->next_hsp) { print "evalue: ", $hsp->evalue,"\tscore: ", $hsp->score,"\tpercent_id: ", $hsp->percent_identity,"\n"; print "query_start: ", $hsp->query->start,"\tquery_end: ", $hsp->query->end; print "\tquery_length: ", $hsp->query->length,"\tquery_strand: ", $hsp->strand('query'), "\n"; print "subject_start: ", $hsp->subject->start,"\tsubject_end: ", $hsp->subject->end; print "\tsubject_length: ", $hsp->subject->length,"\tsubject_strand: ", $hsp->strand('subject'), "\n\n"; my $aln = $hsp->get_aln; if ($aln->is_flush) { foreach my $seq ($aln->each_seq) { print $seq->seq,"\n"; } print $aln->gap_line, "\n"; print $aln->consensus_string(95), "\n\n"; } $hits .= $hit->name."\t".$hsp->subject->start."\t".$hsp->subject->end."\t".$hsp->strand('subject')."\n"; } } } } } return ($bool_hit, $hits); } } From maj at fortinbras.us Tue Nov 24 23:12:13 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Tue, 24 Nov 2009 23:12:13 -0500 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> <4296CD1039FC44B89034A1FD3E6721F3@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> Message-ID: <3ECFA0236D1B467181EE63C8C6BE7E1F@NewLife> I seem to be able to do $db = Bio::DB::Fasta->new("$tmp/test.faa"); without a problem- something in the mixing of named and unnamed parameters? ----- Original Message ----- From: "Smithies, Russell" To: "'Mark A. Jensen'" ; "'bioperl-l'" Sent: Tuesday, November 24, 2009 5:21 PM Subject: RE: [Bioperl-l] Bio::DB::Fasta That's what I ended up doing. Also, there's no "obvious" way to index a single file so I ended putting the filename in the glob parameter. my $db = Bio::DB::Fasta->new( "$tmp", -glob => "test.faa", -reindex => 1 ); --Russell > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > Sent: Wednesday, 25 November 2009 11:19 a.m. > To: Smithies, Russell; 'bioperl-l' > Subject: Re: [Bioperl-l] Bio::DB::Fasta > > The code (method index_dir() ) seems to expect all the fasta files to be > contained in that directory. Looks hairy; what about creating symlinks to > your > fasta files in a /tmp subdir and calling new() with that subdir? > ----- Original Message ----- > From: "Smithies, Russell" > To: "'bioperl-l'" > Sent: Tuesday, November 24, 2009 5:04 PM > Subject: [Bioperl-l] Bio::DB::Fasta > > > > Is there any way to pass a filename to Bio::DB::Fasta for the location > of > > where to write the directory.index? > > It's writing in the same dir as the fasta but I'd rather have it write > in /tmp > > as it's part of a web app. 
> > > > Thanx, > > > > Russell > > > > > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From maj at fortinbras.us Wed Nov 25 12:25:30 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 25 Nov 2009 12:25:30 -0500 Subject: [Bioperl-l] question for all regarding a sam-based Bio::Assembly::IO Message-ID: <1E72D5B0A190448FA27545DB5B68638D@NewLife> Short-readers, I'm working on an Assembly::IO class for sam alignments. I'm currently making a decision about handling multiple reference sequences: would you prefer that next_assembly() return an assembly that covers all reference sequences, or that next_assembly iterates over each reference sequence? (Or both?) thanks for your input- MAJ From timbourine81 at gmail.com Wed Nov 25 12:40:52 2009 From: timbourine81 at gmail.com (Tim) Date: Wed, 25 Nov 2009 18:40:52 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in new file Message-ID: <4B0D6C24.2080308@gmail.com> Dear bioperl users, I am a real newbie and have - maybe a very trivial - question. I searched the mailing list archive and many howtos but I have not found a concrete answer to my problem. 
So hopefully you can help me :) Background: I use the latest Bioperl version (installed two weeks ago). When I use Bio::Tools::Run::StandAloneBlast to BLAST one FASTA file containing several sequences, I get BLAST output with many queries, each having several hits (subjects). My problem is how to write *all* hits of *one* query into a single new file, and to do this for every query in my BLAST output file. Or is it better the other way round: first make FASTA files containing single sequences and BLAST each file? But how can I automate that using Bioperl? I tried Bio::SearchIO, but I can only get all queries and their respective hits into one file... I think iteration is also necessary here, but I do not really know how to include that in Bio::SearchIO. Or do I have to use Bio::Index::Blast? I can index a file (see below), but I have no idea what comes next...
###How I index a file...
#!/usr/bin/perl -w
$ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";
use Bio::Index::Fasta;
$file_name = "8_to_BLAST_two_seq_index.fasta";
$id = "48882";
$inx = Bio::Index::Fasta->new(-filename => $file_name . ".idx", -write_flag => 1);
$inx->make_index($file_name);
Hopefully, you can at least give me hints on what to look for. A big THANKS in advance! Cheers, Tim From timbourine81 at gmail.com Wed Nov 25 12:53:34 2009 From: timbourine81 at gmail.com (Tim) Date: Wed, 25 Nov 2009 18:53:34 +0100 Subject: [Bioperl-l] How to parse different (fasta) files Message-ID: <4B0D6F1E.8@gmail.com> Hey everybody, another question from me... if you do not mind :) My situation is this: I have parsed standalone BLAST output using SearchIO, keeping only the hit names. Now I have a second FASTA file with the same sequences as in the BLAST database, but including an alignment (i.e. "." and "-" characters). (Unfortunately, there is no way to make a BLAST database from FASTA files that include the alignment...)
My intention now is to take the names of the hit sequences (from the BLAST output), get the corresponding aligned sequences (from the FASTA file with the alignment), and write them to a new file. Has anybody out there tried that before? Again, I am an absolute greenhorn at using (Bio)perl. Maybe it is very simple :D Looking forward to your answer. All the best, Tim -- Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 From maj at fortinbras.us Wed Nov 25 13:20:03 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 25 Nov 2009 13:20:03 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew file In-Reply-To: <4B0D6C24.2080308@gmail.com> References: <4B0D6C24.2080308@gmail.com> Message-ID: <53DE480F205E42CE8D2B9421592AAF0E@NewLife> hey Tim-- Sounds like you need to go about collecting your queries inside out:
my %hits_by_query;
while ( my $result = $in->next_result ) {  # $in: assumed Bio::SearchIO blast parser on your report
    for my $hit ( $result->hits ) {
        push @{ $hits_by_query{ $result->query_name } }, $hit;
    }
}
I believe each hash element, keyed by the query name, will now contain an arrayref to the set of hits associated with that query. From here, I believe
use Bio::Search::Result::BlastResult;
use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;
foreach my $qid ( keys %hits_by_query ) {
    my $result = Bio::Search::Result::BlastResult->new( -query_name => $qid );
    $result->add_hit($_) for @{ $hits_by_query{$qid} };
    my $blio = Bio::SearchIO->new( -file   => ">$qid.bls",
                                   -writer => Bio::SearchIO::Writer::TextResultWriter->new() );
    $blio->write_result($result);
}
will do what you want. hope this helps - Mark ----- Original Message ----- From: "Tim" To: Sent: Wednesday, November 25, 2009 12:40 PM Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew file > Dear bioperl users, > > I am a real newbie and have - maybe a very trivial - question. > > I searched the mailing list archive and many howtos but I have not found > a concrete answer to my problem.
So hopefully you can help me :) > > Background: I use the latest Bioperl version (installed it two weeks > before). > When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > including different sequences, I get a BLAST output with many queries > each having several hits / sbjcts. > > My problem is how to parse *all* hits of *one* query into a single new > file. And this for all the queries I have in my BLAST output file. > > Or is it better the other way round; first to make fasta files with only > single sequences inside and BLAST each file? But how can I automize that > using Bioperl? > > I tried Bio::SearchIO but can only parse all queries and their > respective hits in only one file... > I think iteration is also necessary here, but I do not really know how > to include that into Bio::SearchIO. > Or do I have to use Module:Bio::Index::Blast? > > I can index a file (see below), but I have no idea what comes next... > > ###How I index a file... > > #!/usr/bin/perl -w > > $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > > use Bio::Index::Fasta; > > > $file_name = "8_to_BLAST_two_seq_index.fasta"; > $id = "48882"; > $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > -write_flag => 1); > $inx->make_index($file_name); > > > Hopefully, you can give me at least hints what to look for. > > A big THANKS in advance! 
> > Cheers, > > Tim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Wed Nov 25 14:07:26 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 26 Nov 2009 08:07:26 +1300 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in new file In-Reply-To: <4B0D6C24.2080308@gmail.com> References: <4B0D6C24.2080308@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B63085701@exchsth.agresearch.co.nz> Hi Tim, Here's some code for a job I'm working on at the moment that contains all the bits you'll probably need. It's extracting 2 species-specific databases from nr (based on tax ids), doing a blast, then parsing the results and creating a substitution matrix. I was initially using Bio::DB::Eutilities to query and retrieve sequences but I kept getting errors and time-outs from NCBI when pulling back large numbers of sequences. It should give you a rough idea of how to run Bio::Tools::Run::StandAloneBlast, Bio::DB::Fasta and Bio::SearchIO. 
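If the BLAST run can produce tabular output instead of the default report (legacy `blastall -m 8`, or `-outfmt 6` in BLAST+), the first column of every line is the query ID. In that case, Tim's one-file-per-query split needs no parser object at all. The sketch below swaps in that tabular format in place of Bio::SearchIO. The subroutine name, the `.hits` suffix and the output directory are assumptions for illustration. With Bio::SearchIO, the equivalent is a loop over `next_result`, since each result object corresponds to one query.

```perl
use strict;
use warnings;

# Split tabular BLAST output (query ID in column 1) into one file per query,
# named <out_dir>/<query>.hits. Returns a hash: query ID => hit lines written.
sub split_hits_by_query {
    my ($tabular_file, $out_dir) = @_;
    my (%count, $current, $out);

    open my $in, '<', $tabular_file or die "Cannot read $tabular_file: $!";
    while (my $line = <$in>) {
        next if $line =~ /^\s*(?:#|$)/;          # skip comment and blank lines
        my ($query) = split /\t/, $line;

        if (!defined $current || $query ne $current) {
            close $out if $out;
            (my $safe = $query) =~ s/[^\w.-]/_/g;   # file-system-safe name
            # Append if this query was seen earlier (non-contiguous input),
            # otherwise start a fresh file.
            my $mode = $count{$query} ? '>>' : '>';
            open $out, $mode, "$out_dir/$safe.hits"
                or die "Cannot write $out_dir/$safe.hits: $!";
            $current = $query;
        }
        print {$out} $line;
        $count{$query}++;
    }
    close $in;
    close $out if $out;
    return %count;
}
```

Only one output file is open at a time, so a report with thousands of queries will not run out of file handles. A call would look like `my %n = split_hits_by_query('blast_report.tab', 'per_query');`, where both names are placeholders and the output directory must already exist.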
Email me direct if you want further explaination as it's not well commented ;-) Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz ======================================= #!/usr/local/bin/perl use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::DB::Fasta; use Storable; # Parameters: # Percentage can be specified as either 20p, 20P or 20% # So for 20% of rice sequences blasted against oil palm: # 4530 51953 20p (4530=rice,51953=oil_palm, 20p=20%) # Or for 20 searches: # 4530 51953 20 # my ( $q, $s, $c ) = @ARGV; my $nr = "/data/databases/flatfile/illuminati_blastdata/nr"; my $tax_file = "/data/anonftp/pub/mirror/taxonomy/gi_taxid_prot.dmp.gz"; my $tmp = "/tmp/tax"; my %stats = (); my $total_subs = 0; my $min_hsp_len = 0; my $min_hsp_identity = 0; my $num_searches = $c || 10; my $blast_e = '1e-6'; my $count = 0; # check if all the fasta and blast files exist # if not, extract new fasta and re-formatdb the database foreach my $t ( $q, $s ) { foreach ( map { "$tmp/$t.$_" } qw(faa list phr pin psq) ) { unless ( -e $_ ) { print "Creating database for $t\n"; &create_database($t); last; } } } my @params = ( -database => "$tmp/$q", -program => 'blastp', -e => $blast_e, -outfile => "$tmp/blast.out", -v => '1', -b => '1' ); my $factory = Bio::Tools::Run::StandAloneBlast->new(@params) or die $!; # load the query sequences into a db # makes it easier to randomly access them my $db = Bio::DB::Fasta->new( "$tmp", -glob => "$s.faa", -reindex => 1 ); my @ids = $db->ids; my $id_count = $#ids; exit "No sequences\n" unless $id_count; # if a percentage is requested, calculate # the required number of searches if ( $num_searches =~ m/(\d+)[pP%]/ ) { $num_searches = int( ( $1 / 100 ) * $id_count ); warn "Searching random $1 percent ($num_searches) of $id_count sequences from 
taxid $q\n";
}

my $summary_file = "$tmp/".$$."_summary.txt";
open( OUT, ">", $summary_file ) or die $!;
print OUT "#Summary of $num_searches random blast searches from taxid $q against taxid $s.\n";
print OUT "#Parameters used were:\n";
print OUT "#blast_e: $blast_e\n";
print OUT "#min_hsp_len: $min_hsp_len\n";
print OUT "#min_hsp_identity: $min_hsp_identity\n";
print OUT "\n";

# rand(@ids) can pick any index from 0 to $#ids; rand($#ids) never picks the last ID
while ( my $seq = $db->get_Seq_by_id( $ids[ rand(@ids) ] ) ) {
    next unless $seq;
    warn "Processing ", $seq->id, "\n";
    eval {
        my $blast_report = $factory->blastall($seq);
        sleep 5;
    };
    my $blast_in = Bio::SearchIO->new(
        -format => "blast",
        -file   => "$tmp/blast.out"
    );
    while ( my $result = $blast_in->next_result ) {
        if ( $result->num_hits <= 0 ) {
            warn "No hits for ", $result->query_accession, "\n";
            print OUT "No hits for ", $result->query_accession, "\n";
            next;
        }
        $count++;
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                warn sprintf( "%s had %s hsp%s\n", $result->query_accession,
                    $hit->num_hsps, $hit->num_hsps > 1 ? "s" : "" );
                print OUT sprintf( "%s had %s hsp%s\n", $result->query_accession,
                    $hit->num_hsps, $hit->num_hsps > 1 ? "s" : "" );

                # http://www.bioperl.org/wiki/HOWTO:SearchIO#Table_of_Methods
                if ( $hsp->length('total') > $min_hsp_len ) {
                    if ( $hsp->percent_identity >= $min_hsp_identity ) {
                        my @query_string = split '', $hsp->query_string;
                        my @homol_string = split '', $hsp->homology_string;
                        my @hit_string   = split '', $hsp->hit_string;

                        # walk every alignment column (including the last one)
                        # and tally the positive ("+") substitutions
                        for ( my $i = 0; $i < @query_string; $i++ ) {
                            next unless $homol_string[$i] =~ /\+/;
                            $stats{ $query_string[$i] }{ $hit_string[$i] }++;
                            $total_subs++;
                        }
                    }
                }
            }
        }
    }

    # double quotes, so that $tmp is actually interpolated
    unlink "$tmp/blast.out" if -e "$tmp/blast.out";
    last if $count >= $num_searches;
}

# create summary frequency list
my %summary = ();
for my $query ( keys %stats ) {
    for my $hit ( keys %{ $stats{$query} } ) {
        $summary{"$query->$hit"} =
          sprintf( "%6f", $stats{$query}{$hit} / $total_subs );
    }
}
print OUT "\n";

# sort by descending frequencies and print to summary file
foreach my $k ( sort { $summary{$b} <=> $summary{$a} } keys %summary ) {
    print OUT "$k\t", $summary{$k}, "\n" unless $k =~ /TOTAL/;
}
print OUT "\n\n";

# print substitution matrix
my $i     = 0;
my @prots = qw(A R N D C Q E G H I L K M F P S T W Y V);
my $sep   = "\t";
print OUT sprintf( "%7s %s", $_, $sep ) foreach ( " ", @prots );
print OUT "\n";
foreach my $x (@prots) {
    print OUT sprintf( "%7s|%s", $prots[ $i++ ], $sep );
    foreach my $y (@prots) {
        my $val =
          defined( $stats{$x}{$y} )
          ? sprintf( "%0.6f", $stats{$x}{$y} / $total_subs )
          : "--------";
        print OUT sprintf( "%s%s", $val, $sep );
    }
    print OUT "\n";
}
close OUT;

open(IN, $summary_file) or die $!;
print $_ while (<IN>);
close IN;

# extract sequences from nr database based on taxid.
sub create_database {
    my $txid      = shift;
    my %hash      = ();
    my $gi_stored = "/tmp/gi.dat";
    if ( -e $gi_stored ) {
        %hash = %{ retrieve($gi_stored) };
    }
    else {
        open( TXID, "zcat $tax_file | " ) or die $!;
        while (<TXID>) {
            chomp;
            my ( $gi, $tx ) = split( "\t", $_ );
            push( @{ $hash{$tx} }, $gi );
        }
        close TXID;
        store( \%hash, $gi_stored );
    }
    my $txlist = "$tmp/$txid.list";
    my $txseq  = "$tmp/$txid.faa";

    # defined(@array) is deprecated (and fatal since Perl 5.22), so test the hash key instead
    die "No sequences found for taxid $txid\n" unless exists $hash{$txid};
    my $num_seqs = scalar( @{ $hash{$txid} } );
    warn "Found $num_seqs sequences for taxid $txid in $tax_file\n";
    open OUT, ">", $txlist or die $!;
    print OUT "$_\n" foreach ( @{ $hash{$txid} } );
    close OUT;
    my $cmd = "fastacmd -d $nr -i $txlist -t T -o $txseq 2>/dev/null";
    system $cmd;
    my $count = `grep -c '>' $txseq`;
    $count =~ s/\n//;
    warn "Could only extract $count sequences from $nr\n";
    $cmd = "formatdb -p T -i $tmp/$txid.faa -n $tmp/$txid -l $tmp/formatdb.log";
    system $cmd;
    $cmd = "fastacmd -d $tmp/$txid -I";
    system $cmd;
    warn "Check the formatdb.log for any errors\n";
}
=======================================
> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Tim > Sent: Thursday, 26 November 2009 6:41 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in > new file > > Dear bioperl users, > > I am a real newbie and have - maybe a very trivial - question. > > I searched the mailing list archive and many howtos but I have not found > a concrete answer to my problem. So hopefully you can help me :) > > Background: I use the latest Bioperl version (installed it two weeks > before). > When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > including different sequences, I get a BLAST output with many queries > each having several hits / sbjcts. > > My problem is how to parse *all* hits of *one* query into a single new > file.
And this for all the queries I have in my BLAST output file. > > Or is it better the other way round; first to make fasta files with only > single sequences inside and BLAST each file? But how can I automize that > using Bioperl? > > I tried Bio::SearchIO but can only parse all queries and their > respective hits in only one file... > I think iteration is also necessary here, but I do not really know how > to include that into Bio::SearchIO. > Or do I have to use Module:Bio::Index::Blast? > > I can index a file (see below), but I have no idea what comes next... > > ###How I index a file... > > #!/usr/bin/perl -w > > $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > > use Bio::Index::Fasta; > > > $file_name = "8_to_BLAST_two_seq_index.fasta"; > $id = "48882"; > $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > -write_flag => 1); > $inx->make_index($file_name); > > > Hopefully, you can give me at least hints what to look for. > > A big THANKS in advance! > > Cheers, > > Tim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From maj at fortinbras.us Wed Nov 25 14:21:27 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 25 Nov 2009 14:21:27 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <53DE480F205E42CE8D2B9421592AAF0E@NewLife> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> Message-ID: <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> whoops: change the following line: my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); to my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); (I always forget that...) MAJ ----- Original Message ----- From: "Mark A. Jensen" To: "Tim" ; Sent: Wednesday, November 25, 2009 1:20 PM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file > hey Tim-- > > Sound like you need to go about collecting your queries inside out: > > my %hits_by_query; > for ($result->hits) { > push @{$hits_by_query{$hit->name}} $hit; > } > > I believe now each hash element, keyed by the query name, will contain > an arrayref to the set of hits assoc with that query. >>From here, I believe > > use Bio::Search::Result::BlastResult; > use Bio::SearchIO; > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > $blio->write_result($result); > } > > will do what you want. > > hope this helps - > Mark > > ----- Original Message ----- > From: "Tim" > To: > Sent: Wednesday, November 25, 2009 12:40 PM > Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew > file > > >> Dear bioperl users, >> >> I am a real newbie and have - maybe a very trivial - question. >> >> I searched the mailing list archive and many howtos but I have not found >> a concrete answer to my problem. So hopefully you can help me :) >> >> Background: I use the latest Bioperl version (installed it two weeks >> before). 
>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file >> including different sequences, I get a BLAST output with many queries >> each having several hits / sbjcts. >> >> My problem is how to parse *all* hits of *one* query into a single new >> file. And this for all the queries I have in my BLAST output file. >> >> Or is it better the other way round; first to make fasta files with only >> single sequences inside and BLAST each file? But how can I automize that >> using Bioperl? >> >> I tried Bio::SearchIO but can only parse all queries and their >> respective hits in only one file... >> I think iteration is also necessary here, but I do not really know how >> to include that into Bio::SearchIO. >> Or do I have to use Module:Bio::Index::Blast? >> >> I can index a file (see below), but I have no idea what comes next... >> >> ###How I index a file... >> >> #!/usr/bin/perl -w >> >> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >> >> use Bio::Index::Fasta; >> >> >> $file_name = "8_to_BLAST_two_seq_index.fasta"; >> $id = "48882"; >> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >> -write_flag => 1); >> $inx->make_index($file_name); >> >> >> Hopefully, you can give me at least hints what to look for. >> >> A big THANKS in advance! 
>> >> Cheers, >> >> Tim >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From alden.huang at gmail.com Thu Nov 26 05:54:30 2009 From: alden.huang at gmail.com (Alden Huang) Date: Thu, 26 Nov 2009 02:54:30 -0800 Subject: [Bioperl-l] Function that determines serious mutations In-Reply-To: References: Message-ID: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> Hey rob, Sorting Intolerant from Tolerant http://sift.jcvi.org/ ~alden ...a bit late, I know; I just read your post now while cleaning the inbox On Fri, Nov 6, 2009 at 9:35 AM, Robert Bradbury wrote: > Is there a function in the library (or has someone written one) that can > take a genbank entry and determine which mutations are harmful? > > It would be used to produce a table summary of: > GENE    # SNP    # BadSNP > > One kind of gets this from NCBI if you look up a gene name in the "GENE" db > and then go to the "GeneView" on the dbSNP page; it has the information I want > but largely in a graphical format while I simply want numbers I can dump > into a spreadsheet. > > I don't think it would be hard: fetch the gene, run through the features for > the SNP database, figure out whether they are good or bad SNPs, accumulate > the statistics and dump it. I think the functions available are flexible > enough to do it but I can't believe nobody has already done it. It could be > a bit more complex in that one could do an analysis to see if the mutations > are in a conserved domain or mutations that code for Cysteine or Methionine > (or other potentially "critical" amino acids) but since "critical" is in the > eye of the beholder there would have to be some kind of callback to a > scoring function.
> > Thanks, > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From robert.bradbury at gmail.com Thu Nov 26 06:27:50 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 26 Nov 2009 06:27:50 -0500 Subject: [Bioperl-l] Function that determines serious mutations In-Reply-To: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> References: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> Message-ID: On Thu, Nov 26, 2009 at 5:54 AM, Alden Huang wrote: > > Sorting Intolerant from Tolerant > http://sift.jcvi.org/ > > Ah yes, thank you very much. This looks very much like a tool that can be adapted for various uses. Robert From jason at bioperl.org Thu Nov 26 12:16:17 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 26 Nov 2009 09:16:17 -0800 Subject: [Bioperl-l] question about a Bio::Tree::Tree method In-Reply-To: <30960443.966281259248778372.JavaMail.defaultUser@defaultHost> References: <30960443.966281259248778372.JavaMail.defaultUser@defaultHost> Message-ID: <14F4B8C9-A1F4-436B-813F-50E139932D3D@bioperl.org> Emilio - please ask your questions on the list - many people there can help answer questions. get_nodes returns all the nodes in the tree, the options specify the order they are returned in. Depending on your question the order probably won't matter so you can just call it without any arguments like in the examples and the HOWTO. The documentation for the method says:

Title   : get_nodes
Usage   : my @nodes = $tree->get_nodes()
Function: Return list of Bio::Tree::NodeI objects
Returns : array of Bio::Tree::NodeI objects
Args    : (named values) hash with one value
          order => 'b|breadth' first order or 'd|depth' first order

So you can provide no arguments and get the default (breadth-first I believe) or you can specify -order => 'd' or -order => 'depth' to get the nodes in depth-first order.
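To make the difference between the two `order` settings concrete, here is a minimal core-Perl sketch. It deliberately does not use BioPerl: the tree is a hypothetical hash of child lists (`%children`), and `breadth_first` / `depth_first` are illustrative helpers, not BioPerl functions or its actual implementation. Breadth-first visits the tree level by level; depth-first (shown here in pre-order) finishes each subtree before moving on to the next sibling.

```perl
use strict;
use warnings;

# Hypothetical toy tree (NOT a Bio::Tree::Tree object): each key maps a
# node name to the ordered list of its children's names.
my %children = (
    root => [qw(X Y)],
    X    => [qw(A B)],
    Y    => [qw(C)],
    A    => [],
    B    => [],
    C    => [],
);

# Breadth-first: visit nodes level by level using a FIFO queue.
sub breadth_first {
    my ($start) = @_;
    my @queue = ($start);
    my @order;
    while (@queue) {
        my $node = shift @queue;
        push @order, $node;
        push @queue, @{ $children{$node} };
    }
    return @order;
}

# Depth-first (pre-order): visit a node, then each child's whole subtree.
sub depth_first {
    my ($node) = @_;
    return ( $node, map { depth_first($_) } @{ $children{$node} } );
}

print join( ' ', breadth_first('root') ), "\n";    # root X Y A B C
print join( ' ', depth_first('root') ), "\n";      # root X A B Y C
```

With a real Bio::Tree::Tree object you would not write these helpers yourself; as described above, you would call `$tree->get_nodes(-order => 'breadth')` or `$tree->get_nodes(-order => 'depth')`. BioPerl's own traversal may differ in details such as how children are ordered, so treat the sketch only as an illustration of what the two orders mean.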
-jason On Nov 26, 2009, at 7:19 AM, miglio83 at libero.it wrote: > Hi Jason, > I'm Emilio Siena, a PhD student of the University of Perugia. > I have > a question about the method "get_nodes" of the "Bio::Tree::Tree" > class. > In > particular I didn't understand which type of arguments it accepts > and in which > format an argument should be given. > > Thank you in advance! > > Emilio -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From maj at fortinbras.us Thu Nov 26 12:40:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 26 Nov 2009 12:40:45 -0500 Subject: [Bioperl-l] Bio::Assembly::IO::sam is alpha Message-ID: <599F8BABCD2848EFA98FB24A4419674E@NewLife> in bioperl-live/trunk with plenty pod; bravehearts can (please!) test on .bam files cheers, MAJ From mauricio at open-bio.org Thu Nov 26 16:45:43 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 26 Nov 2009 15:45:43 -0600 Subject: [Bioperl-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <4B0EF707.6080202@open-bio.org> Hi Jonathan, Any chance it can be webcasted? I'm sure it would attract a lot of remote attendees ;) Regards, Mauricio. Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here > at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010. If > you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last years (1st > day for beginners, 2nd for both beginners and advanced users, 3rd day > for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what > you would like to talk about. > > Thanks > > Jonathan. 
> > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > > > > > > > > From robert.bradbury at gmail.com Thu Nov 26 21:06:40 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 26 Nov 2009 21:06:40 -0500 Subject: [Bioperl-l] BioPerl "guts" question regarding forked processes Message-ID: I'm currently running near my process limit and running sequence fetches from swissprot (I've also had this happen with getting gi's from NCBI) and am running out of processes about halfway through the set I'm trying to fetch [1]. Now, is there someplace in the bioperl documentation that documents where one is supposed to wait() for defunct processes after each sequence fetch? I'm encountering the problem both when the sequence fetches succeed and when they fail. Thanks in advance. Robert 1. This is due to a bug in Chromium's use of Flash that involves it leaving many defunct processes that are uncollected and therefore counting towards one's "process limit". From kanzure at gmail.com Thu Nov 26 21:12:46 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Thu, 26 Nov 2009 20:12:46 -0600 Subject: [Bioperl-l] BioPerl "guts" question regarding forked processes In-Reply-To: References: Message-ID: <55ad6af70911261812q583277d5l71df0d66e756f617@mail.gmail.com> On Thu, Nov 26, 2009 at 8:06 PM, Robert Bradbury wrote: > I'm currently running near my process limit and running sequence fetches > from swissprot (I've also had this happen with getting gi's from NCBI) and > am running out of processes about halfway through the set I'm trying to > fetch [1]. Hey Robert, sorry for the off-topic question, but I was wondering if you're the same Robert Bradbury from the extropy-chat list. Hi?
- Bryan http://heybryan.org/ 1 512 203 0507 From paolo.pavan at gmail.com Fri Nov 27 06:35:03 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 27 Nov 2009 12:35:03 +0100 Subject: [Bioperl-l] More general Bio::Assembly::Contig question (was Bio::Tools::Run::Cap3 usage question) Message-ID: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> Dear Florent, Thank you for your kind answer and for the effort you have put into this module. Since you are working on these topics, I would like to seize the opportunity to ask you about some doubts I have, if you agree, of course :-) Some time ago I tried to work with BioPerl, loading the data from an ACE file produced by Newbler; I needed to extract part of the contig as an alignment of reads, and I thought of doing it with a slice() method, since I saw that Bio::Assembly::Contig implements the Bio::AlignI interface. Unfortunately, I realized that this interface is inherited but not implemented. I tried to hack around it by adding a slice method that acts on a Bio::Alignment created from the array of LocatableSeqs representing the reads. This is the question: if I'm not wrong (please correct me if I am), the Bio::Assembly::Contig class stores read information in:
Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{_align_clipping:READ_NAME}
Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{_aligned_coord:READ_NAME}
Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{_quality_clipping:READ_NAME}
Each of these 3 features (_align_clipping, _aligned_coord, _quality_clipping) contains a Bio::SeqFeature::Generic; which of them is best suited to the purpose described above, the slice method? And, if you'll excuse me for going on a bit, a follow-up to the previous question: I don't fully understand the purpose of these 3 features per read; can you explain it? Thank you very much for your time. Bye bye, Paolo 2009/11/24 Florent Angly > Hi Paolo, > > It turns out that there is no standard for what is to be passed to the > Bio::Tools::Run wrappers and returned by them.
I noticed the inconsistency > between the assembly wrappers recently while implementing support for a new > wrapper. I implemented initial support for additional de novo assembly > programs in BioPerl (454 Newbler and Minimo) a couple of weeks ago and Mark > Jensen added support for Maq, a program that assembles reads against a > reference. In the process, all the assembly wrappers were changed to take > the same type of input data (a FASTA sequence or an array reference of > sequence objects) and return one of the following: > * a Bio::Assembly::Scaffold object (the default), or > * a Bio::Assembly::IO object, or > * the name of a file for the output of the assembler > Use the out_type method to set up which output you want, e.g.: > $factory->out_type('Bio::Assembly::IO'); > or > $factory->out_type('cap3_results.ace'); > You'll have to use the code in the bioperl-run Subversion repository if you want to > use these new features. > > Cheers, > > Florent > > > > > Paolo Pavan wrote: > >> Dear, >> I'm confused about the proper usage of the module Bio::Tools::Run::Cap3. >> As documented in the pod, the run(@seqs) method returns the cap3 report >> file >> while I expected it to return a Bio::Assembly object, consistently with other >> Bio::Tools::Run classes. >> However, I went around this by getting from the factory object the >> location >> and the names of the temp output files (actually accessing a private >> property, although) and reading them via the Assembly::IO system. >> I was just wondering what the intended way to do this job is. >> >> Thank you for enlightening the way!
>> Paolo >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > From jw12 at sanger.ac.uk Thu Nov 26 09:57:35 2009 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Thu, 26 Nov 2009 14:57:35 +0000 Subject: [Bioperl-l] DAS workshop 7th-9th April 2010 Message-ID: We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK subject to decent demand. The workshop will be held from Wednesday 7th-Friday 9th April 2010. If you would be interested in attending either to present or just take part then please email me jw12 at sanger.ac.uk The format of the workshop is likely to be similar to last years (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: http://www.dasregistry.org/course.jsp If you would like to present then please send a short summary of what you would like to talk about. Thanks Jonathan. Jonathan Warren Senior Developer and DAS coordinator jw12 at sanger.ac.uk -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From timbourine81 at googlemail.com Thu Nov 26 11:02:30 2009 From: timbourine81 at googlemail.com (Tim Koehler) Date: Thu, 26 Nov 2009 17:02:30 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <4B0EA44D.2050507@gmail.com> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> Message-ID: ups, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially where to put in your code. 
Here ist the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this terror appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5. } use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position? 
errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name. Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > Hey Mark, > > thanks for the answer > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. 
Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sound like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query. > >>> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. 
> >>> > >>> Or is it better the other way round; first to make fasta files with only > >>> single sequences inside and BLAST each file? But how can I automize that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... > >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 From rtbio.2009 at gmail.com Sat Nov 28 02:53:43 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Sat, 28 Nov 2009 08:53:43 +0100 Subject: [Bioperl-l] Linking of two cgi scripts Message-ID: hello everyone, I have a small question.
I would like to link two cgi scripts i.e., I have an input sequence being entered in a text area ex:->gi|at442323|... ATGCCCCCTTGGAACCAAAAAAA.... So I would like to compare this with the query sequences. These query sequences would be from a BLAST script in the module blast.pm. So once I enter the input sequence and request BLAST using the submit button, my request should go to a program which performs the BLAST search. After this, the sequences obtained from BLAST have to be returned to a program Roopa.pm which compares the input sequence and the sequences obtained from BLAST. But I am unable to provide this link between the cgi scripts (i.e., one script to use BLAST, the other script to compare the sequences and send the results to the browser). Could anyone help me in this regard? Regards, Roopa. From s.denaxas at gmail.com Sat Nov 28 05:56:15 2009 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Sat, 28 Nov 2009 10:56:15 +0000 Subject: [Bioperl-l] Linking of two cgi scripts In-Reply-To: References: Message-ID: Hello, Why do they both have to be CGI scripts? Can't all the processing happen server side, i.e. both BLAST and comparison of returned results? If that is strictly a requirement, you could:
a) get input from user on script A, i.e. the input sequence
b) do an HTTP request from the CGI to the other script B using LWP::UserAgent
c) get results from script B, pass on to comparison module
d) return results to user
As I said, this will be clunky, so either do everything in one go or consider AJAX. hope this helps Spiros On Sat, Nov 28, 2009 at 7:53 AM, Roopa Raghuveer wrote: > hello everyone, > > I have a small question. > > I would like to link two cgi scripts i.e., > > I have an input sequence being entered in a text area > > ex:->gi|at442323|... > ATGCCCCCTTGGAACCAAAAAAA....
> > So I would like to compare this with the query sequences.These query > sequences would be from a BLAST script in the module blast.pm > So once I enter the input sequence and request for BLAST using submit > button,my request should go to a program which performs BLAST search.After > this, the sequences obtained from BLAST have to be returned to a program > Roopa.pm which compares the input sequence and the sequences obtained from > blast. > > But I am unable to provide this link between the cgi scripts.(i.e.,one > script to use BLAST,the other script to compare the sequences and send the > results to the browser) > > Could any one help me in this regard? > > Regards, > Roopa. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Sat Nov 28 11:23:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 28 Nov 2009 11:23:53 -0500 Subject: [Bioperl-l] Run wrappers for BWA and Samtools Message-ID: <7F56A6EEEB0E4EE291D5340F27DF7D3A@NewLife> Hi All, Run wrappers for the bwa assembler and the samtools suite are now available as beta in the bioperl-run/trunk. The bwa wrapper allows you to run a canned assembly pipeline, or to execute individual bwa components. The assembly pipeline can return a Bio::Assembly::Scaffold object via the new Bio::Assembly::IO::sam module in bioperl-live/trunk (this requires lstein's Bio::DB::Sam, from CPAN). Details at http://www.bioperl.org/wiki/HOWTO:Short-read_assemblies_with_BWA and, of course, in the pod. Cheers, MAJ From maj at fortinbras.us Sat Nov 28 21:55:42 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 28 Nov 2009 21:55:42 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file In-Reply-To: References: <4B0D6C24.2080308@gmail.com><53DE480F205E42CE8D2B9421592AAF0E@NewLife><815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife><4B0EA44D.2050507@gmail.com> Message-ID: <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> Hi Tim-- There's a bug in my code; should be for my $hit ($result->hits) { ... } and you're right about the comma. My bad. But I don't think you need this-- you're already looping over your query sequences and doing blastn on each one. So in the middle of your loop, you can simply write the blast result that you got: my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format=>"blast" ); $blio->write_result($result); and forget about the foreach my $qid loop entirely. The files should show up in the directory from which you're running the script. cheers, MAJ ----- Original Message ----- From: "Tim Koehler" To: Sent: Thursday, November 26, 2009 11:02 AM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file ups, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially where to put in your code. Here ist the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. 
push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this terror appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5. } use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position? errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name. 
Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > Hey Mark, > > thanks for the answer > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sound like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query. 
> >>> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. > >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automize > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... 
> >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sat Nov 28 22:32:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 28 Nov 2009 22:32:42 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki Message-ID: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> The HOWTOs appear to have a more restrictive copyright than FDL-- in particular, the blurb at the bottom of the HOWTO page asks users to use the documents for personal use only. I'm for this; I think we should therefore have some explicit license for these that specifies this kind of restriction, and then express that on each howto and in BioPerl:Copyright. Any thoughts on the right license and whether this is a good plan?
MAJ From florent.angly at gmail.com Sat Nov 28 22:47:45 2009 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 28 Nov 2009 19:47:45 -0800 Subject: [Bioperl-l] More general Bio::Assembly::Contig question (was Bio::Tools::Run::Cap3 usage question) In-Reply-To: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> References: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> Message-ID: <4B11EEE1.8070907@gmail.com> Hi Paolo, The aligned reads of a contig are stored in Bio::Assembly::Contigs->{_elem}{READ_NAME}{_seq}. To implement a slice() method, you could retrieve the reads using get_seq_ids(), get_seq_by_name() or get_seq_by_pos(). To retrieve the position of an aligned read in the contig, use get_seq_coord() which returns a Bio::SeqFeature::Generic object (from Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{_aligned_coord:READ_NAME}) on which you can call the start() and end() methods. I'm not entirely sure what Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{_align_clipping:READ_NAME} and {_quality_clipping:READ_NAME} are. I believe that they represent the clear range of the read/contig. Hope it helps, Florent Paolo Pavan wrote: > Dear Florent, > Thank you for your kind answer and for your efforts spent in this module. > Since you are working on these topics I would like to seize the day > and put you some questions about some doubts I have in mind, if you > agree, of course :-) > Some times ago I tried to work with bioperl, loading the data from an > ACE file originated by Newbler; my need was to extract part of the > contig like an alignment of reads and I tought to do it with a slice() > method, since I saw Bio::Assembly::Contig implements Bio::AlignI > interface. Unfortunately I realize that this interface is inherited > but not implemented. > I tried to hack it by adding a slice method which would act on a > Bio::Alignment created from the array of LocatableSeqs representing > the reads. 
> > This is the question: > If I'm not wrong (please correct me if yes), Bio::Assembly::Contig > class stores reads informations in: > Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{ > _align_clipping:READ_NAME} > _aligned_coord:READ_NAME} > _quality_clipping:READ_NAME} > > Anyone of these 3 features _align_clipping, _aligned_coord, > _quality_clipping, contains a Bio::SeqFeature::Generic, which of them > is more suitable to the purpose expressed before, the slice method? > And more, If you apologize me for being too long, is consequently to > the previous: I don't have perfectly clear the purpose of this 3 > feature per read, can you explain it? > > Really thanks you for the time you would spend. > Bye bye, > Paolo From bimber at wisc.edu Sun Nov 29 00:31:25 2009 From: bimber at wisc.edu (Ben Bimber) Date: Sat, 28 Nov 2009 23:31:25 -0600 Subject: [Bioperl-l] using bioperl to compare sequences Message-ID: <9f985cdc0911282131l350bc525gd9ad4717c101ac63@mail.gmail.com> Hello, I have a couple years programming experience, but am reasonably new to perl and extremely new to bioperl. I have been reading through the bioperl documentation and am trying to understand the best way to approach a particular problem. I'm hoping someone could offer some tips and point me in the right direction. If someone has solved this sort of problem before, i'd prefer not to reinvent things. Here's what I'm trying to do: Our lab generates mRNA sequence data, consisting of alleles of a given gene or genes I want to compare each of these sequences against a reference using BLAST or clustalw (will need the ability to choose at run time) Take the result of this alignment, then record positions of difference between the experimental sequence and reference sequence (SNPs) Translate the corresponding AA change(s) associated with each SNP. There can be overlapping ORFs. I see that bioperl has modules for BLAST and clustal. I've also been looking at the modules under variation. 
I havent fully wrapped my head around them, but they look to be what i'd use for SNP detection. has anyone has written code to perform similar things and if so, would you be willing to share specific examples? Anything concrete to see exactly how these modules operate would be extremely helpful. Thanks in advance for any tips or help. From jason at bioperl.org Sun Nov 29 10:54:53 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 29 Nov 2009 07:54:53 -0800 Subject: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file In-Reply-To: <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> References: <4B0D6C24.2080308@gmail.com><53DE480F205E42CE8D2B9421592AAF0E@NewLife><815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife><4B0EA44D.2050507@gmail.com> <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> Message-ID: <897A8DB4-AF29-4601-A1E5-9A04D9D8C151@bioperl.org> or while( my $hit = $result->next_hit ) { } On Nov 28, 2009, at 6:55 PM, Mark A. Jensen wrote: > Hi Tim-- > There's a bug in my code; should be > for my $hit ($result->hits) { > ... > } > and you're right about the comma. My bad. > > But I don't think you need this-- you're already looping over your > query sequences and doing blastn on each one. So in the middle of > your loop, you can simply write the blast result that you got: > > my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", - > format=>"blast" ); > $blio->write_result($result); > > and forget about the foreach my $qid loop entirely. > > The files should show up in the directory from which you're > running the script. > cheers, MAJ > > > > ----- Original Message ----- From: "Tim Koehler" > > To: > Sent: Thursday, November 26, 2009 11:02 AM > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of > eachqueryinnew file > > > ups, sent too early... > > Hey Mark, > > thanks for the answer. But I am still struggling, especially where > to put in > your code. 
> > Here ist the code I have, so far: > > #!/usr/bin/perl -w > > ### should I put your code here as push is a perl command? > my %hits_by_query; > for ($result->hits) { > ### I inserted a comma after name}}; if there is no comma, there was > the > error: Scalar found where operator expected at > 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" > ### (Missing operator before $hit?) > ###Useless use of push with no values at > 12_BLAST_two_sequence_each_query_one_file.PL line 7. > ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line > 7, near > "} $hit" > ###BEGIN not safe after errors--compilation aborted at > 12_BLAST_two_sequence_each_query_one_file.PL line 13. > push @{$hits_by_query{$hit->name}}, $hit; > ###here, every time this terror appears: Name "main::result" used > only once: > possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. > ###error: Can't call method "hits" on an undefined value at > 12_BLAST_two_sequence_each_query_one_file.PL line 5. > } > > > use strict; > use Bio::Tools::Run::StandAloneBlast; > use Bio::SeqIO; > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > > my $Seq_in = Bio::SeqIO->new ( > -file => > "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/ > 1_to_BLAST_two_seq.fasta", > -format => 'fasta' > ); > while (my $query = $Seq_in->next_seq()) { > my $factory = Bio::Tools::Run::StandAloneBlast->new( > 'program' => 'blastn', > 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/ > 3_BLAST_db', > _READMETHOD => "Blast" > ); > > my $blast_report = $factory->blastall($query); > > ### Should I need to use a module? are the commands here at the right > position? 
errors, e.g., Global symbol "$hit" requires explicit > package name > #my %hits_by_query; > #for ($result->hits) { > ### inserted comma after name}} > # push @{$hits_by_query{$hit->name}}, $hit; > #} > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - > format=>'blast' ); > $blio->write_result($result); > } > > ###where are the files stored? what is their name. Sorry, but I > cannot get > behind that :( > > while( my $result = $blast_report->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > if( $hsp->length('total') > 50 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Query= ", $result->query_name, > "Hit= ", $hit->name, > "Length= ", $hsp->length('total'), > "Percent_id= ", $hsp->percent_identity, > "Subject=", $hsp->hit_string,"\n"; > } > } > } > } > } > } > > Again, a big thanks in advance :) > > All the best, > > Tim > > > On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > >> Hey Mark, >> >> thanks for the answer >> >> On 25.11.2009 20:21, Mark A. Jensen wrote: >> > whoops: change the following line: >> > my $blio = Bio::SearchIO->new( -file => $qid.".bls", - >> format=>'blast' ); >> > >> > to >> > >> > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - >> format=>'blast' ); >> > >> > (I always forget that...) >> > MAJ >> > >> > ----- Original Message ----- From: "Mark A. 
Jensen" > > >> > To: "Tim" ; >> > Sent: Wednesday, November 25, 2009 1:20 PM >> > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of >> each >> > queryinnew file >> > >> > >> >> hey Tim-- >> >> >> >> Sound like you need to go about collecting your queries inside >> out: >> >> >> >> my %hits_by_query; >> >> for ($result->hits) { >> >> push @{$hits_by_query{$hit->name}} $hit; >> >> } >> >> >> >> I believe now each hash element, keyed by the query name, will >> contain >> >> an arrayref to the set of hits assoc with that query. >> >>> From here, I believe >> >> >> >> use Bio::Search::Result::BlastResult; >> >> use Bio::SearchIO; >> >> >> >> foreach my $qid ( keys %hits_by_query ) { >> >> my $result = Bio::Search::Result::BlastResult->new(); >> >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); >> >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", - >> format=>'blast' >> ); >> >> $blio->write_result($result); >> >> } >> >> >> >> will do what you want. >> >> >> >> hope this helps - >> >> Mark >> >> >> >> ----- Original Message ----- From: "Tim" >> >> To: >> >> Sent: Wednesday, November 25, 2009 12:40 PM >> >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each >> >> query innew file >> >> >> >> >> >>> Dear bioperl users, >> >>> >> >>> I am a real newbie and have - maybe a very trivial - question. >> >>> >> >>> I searched the mailing list archive and many howtos but I have >> not >> found >> >>> a concrete answer to my problem. So hopefully you can help me :) >> >>> >> >>> Background: I use the latest Bioperl version (installed it two >> weeks >> >>> before). >> >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta >> file >> >>> including different sequences, I get a BLAST output with many >> queries >> >>> each having several hits / sbjcts. >> >>> >> >>> My problem is how to parse *all* hits of *one* query into a >> single new >> >>> file. And this for all the queries I have in my BLAST output >> file. 
>> >>> >> >>> Or is it better the other way round; first to make fasta files >> with >> only >> >>> single sequences inside and BLAST each file? But how can I >> automize >> that >> >>> using Bioperl? >> >>> >> >>> I tried Bio::SearchIO but can only parse all queries and their >> >>> respective hits in only one file... >> >>> I think iteration is also necessary here, but I do not really >> know how >> >>> to include that into Bio::SearchIO. >> >>> Or do I have to use Module:Bio::Index::Blast? >> >>> >> >>> I can index a file (see below), but I have no idea what comes >> next... >> >>> >> >>> ###How I index a file... >> >>> >> >>> #!/usr/bin/perl -w >> >>> >> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >> >>> >> >>> use Bio::Index::Fasta; >> >>> >> >>> >> >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; >> >>> $id = "48882"; >> >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >> >>> -write_flag => 1); >> >>> $inx->make_index($file_name); >> >>> >> >>> >> >>> Hopefully, you can give me at least hints what to look for. >> >>> >> >>> A big THANKS in advance! 
>> >>> >> >>> Cheers, >> >>> >> >>> Tim >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> >> >>> >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> > >> >> Tim Köhler > MPI for Terrestrial Microbiology > Karl-von-Frisch-Straße > D-35043 Marburg / Germany > > Email: koehlerd at mpi-marburg.mpg.de > Phone: +49 6421 178-740 > Fax: +49 6421 178-999 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From suzi at berkeleybop.org Sun Nov 29 23:03:09 2009 From: suzi at berkeleybop.org (Suzanna Lewis) Date: Sun, 29 Nov 2009 20:03:09 -0800 Subject: [Bioperl-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <3AD3C819-4BAA-4D90-B141-9611F48C5CAD@berkeleybop.org> I/we (Gregg) would be interested in attending. We'd present an update on the collaborative, web-based version of Apollo. We will be working with Ian Holmes and Mitch Skinner using JBrowse for basic display. -S On Nov 26, 2009, at 6:57 AM, Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010.
If you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last years (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what you would like to talk about. > > Thanks > > Jonathan. > > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > From maj at fortinbras.us Mon Nov 30 09:31:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 30 Nov 2009 09:31:27 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> Message-ID: <513F1C824EF84974993A76F0CC719CDF@NewLife> Well, it has a history, Jason's point. So the question could be: "is this still a valid issue"? A while back, a user on the wiki, with natural and good intentions, removed the authorship and revision info from a couple of the HOWTOs; it is more wiki-like, after all. But Chris had some objections to that, which I seconded, mainly on the basis of the special status that seems implied by the copyright note on the HOWTO page.
I also think that the nature of the howto is somewhat different from other info on the site -- that developers themselves put a lot of time in to explaining how to use their modules, and that in this world where devs get paid by recognition, it is a reasonable thing to allow this extra horn-tooting. Now, that is a policy that could be completely separable from the issue of copyright. However, devs may also get paid by using their materials in teaching seminars. The dilemma would be that people who like to use the wiki are people who like to share, and so it feels unnatural to withhold from the community the materials they develop, but people who like to share also like to eat and wear shoes... so I'm interested in everyone's thoughts about it. ----- Original Message ----- From: "Brian Osborne" To: "Mark A. Jensen" Cc: "Chris Fields" ; "Jason Stajich" ; "bioperl List" Sent: Monday, November 30, 2009 9:16 AM Subject: Re: [Bioperl-l] HOWTO copyright policy vs FDL on wiki > Mark, > > Let me ask you a question, and don't take this question as an implicit > criticism of your suggestion, it is not. Why would you want this more > restrictive copyright? > > Brian O. > > On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: > >> The HOWTOs appear to have a more restrictive copyright >> than FDL-- in particular, the blurb at the bottom of the >> HOWTO page asks users to use the documents for personal >> use only. I'm for this; I think we should therefore have some >> explicit license for these that specifies this kind of restriction, >> and then express that on each howto and in BioPerl:Copyright. >> Any thoughts on the right license and whether this is a good plan? 
>> MAJ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From bosborne11 at verizon.net Mon Nov 30 10:15:32 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 30 Nov 2009 10:15:32 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <513F1C824EF84974993A76F0CC719CDF@NewLife> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> <513F1C824EF84974993A76F0CC719CDF@NewLife> Message-ID: <54671455-A02C-4139-8C39-AC17B50D5CE6@verizon.net> Mark, I have no objection to a more restrictive copyright, and I also have no objection to using FDL, or things like it. Brian O. On Nov 30, 2009, at 9:31 AM, Mark A. Jensen wrote: > Well, it has a history, Jason's point. So the question could > be: "is this still a valid issue"? A while back, a user on the wiki, > with natural and good intentions, removed the authorship and revision > info from a couple of the HOWTOs; it is more wiki-like, > after all. But Chris had some objections to that, which I > seconded, mainly on the basis of the special status that > seems implied by the copyright note on the HOWTO > page. I also think that the nature of the howto is somewhat > different from other info on the site -- that developers themselves > put a lot of time in to explaining how to use their modules, and > that in this world where devs get paid by recognition, it is a > reasonable > thing to allow this extra horn-tooting. Now, that is a policy > that could be completely separable from the issue of copyright. > However, devs may also get paid by using their materials in teaching > seminars. The dilemma would be that people who like to use the > wiki are people who like to share, and so it feels unnatural to > withhold from the community the materials they develop, but > people who like to share also like to eat and wear shoes... 
> so I'm interested in everyone's thoughts about it. > ----- Original Message ----- From: "Brian Osborne" > > To: "Mark A. Jensen" > Cc: "Chris Fields" ; "Jason Stajich" >; "bioperl List" > Sent: Monday, November 30, 2009 9:16 AM > Subject: Re: [Bioperl-l] HOWTO copyright policy vs FDL on wiki > > >> Mark, >> >> Let me ask you a question, and don't take this question as an >> implicit criticism of your suggestion, it is not. Why would you >> want this more restrictive copyright? >> >> Brian O. >> >> On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: >> >>> The HOWTOs appear to have a more restrictive copyright >>> than FDL-- in particular, the blurb at the bottom of the >>> HOWTO page asks users to use the documents for personal >>> use only. I'm for this; I think we should therefore have some >>> explicit license for these that specifies this kind of restriction, >>> and then express that on each howto and in BioPerl:Copyright. >>> Any thoughts on the right license and whether this is a good plan? >>> MAJ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From bosborne11 at verizon.net Mon Nov 30 09:16:07 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 30 Nov 2009 09:16:07 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> Message-ID: <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> Mark, Let me ask you a question, and don't take this question as an implicit criticism of your suggestion, it is not. Why would you want this more restrictive copyright? Brian O. On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: > The HOWTOs appear to have a more restrictive copyright > than FDL-- in particular, the blurb at the bottom of the > HOWTO page asks users to use the documents for personal > use only. 
I'm for this; I think we should therefore have some > explicit license for these that specifies this kind of restriction, > and then express that on each howto and in BioPerl:Copyright. > Any thoughts on the right license and whether this is a good plan? > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Nov 30 12:41:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 30 Nov 2009 12:41:44 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32B630E6C53@exchsth.agresearch.co.nz> <52D67F20A9CB4953B86FF794ADE0BE96@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> Message-ID: <8C288FEF9CEB4055B0CDD19267FBA26C@NewLife> thanks Tim! corrected (I hope) in r16432... MAJ ----- Original Message ----- From: Tim Koehler To: Smithies, Russell Cc: Mark A. Jensen ; bioperl-l at lists.open-bio.org Sent: Monday, November 30, 2009 12:23 PM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file Hello everybody, thanks a lot for the overwhelming answers! All these codes are different flavors and worked all. For me the added code works the best. But I think I found a bug in ...Bio/SearchIO/blast.pm. There the DEFAULT_BLAST_... variable is set to Bio::Search::Writer::HitTableWriter instead of Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to HTMLResultWriter and others. So again: THANKS for the support! 
Cheers, Tim #!/usr/bin/perl -w use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; ### add here the writer you want use Bio::SearchIO::Writer::HitTableWriter; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; # just write the result we got for this query into a #new blast-formatted file...named after the id of the query seq... my $result = $blast_report->next_result; my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!; $blio->write_result($result); # below, just looking at the current blast result ###this does not appear in the output files while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell wrote: Changed it to a generic result and added a writer and it seems tio work: foreach my $qid ( keys %hits_by_query ) { warn "qid = $qid\n"; my $res = Bio::Search::Result::GenericResult->new(-algorithm => "blastn") or die $!; # print Dumper $res; foreach my $h ( @{ $hits_by_query{$qid} } ){ warn "adding hit ", $h->name, "\n"; 
$res->add_hit($h) if defined($h); } my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new(); my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file => ">$qid\.bls\.html", -format => "blast" ) or die $!; $blio->write_result($res); } From: Mark A. Jensen [mailto:maj at fortinbras.us] Sent: Monday, 30 November 2009 10:19 a.m. To: Smithies, Russell; 'Tim Koehler' Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file My thought here was that since Tim's already going one at a time thru his queries, my scrap was not really necessary: use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/data/databases/flatfile/illuminati_blastdata/nt", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; # just write the result we got for this query into a #new blast-formatted file...named after the id of the query seq... 
my $result = $blast_report->next_result; my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!; $blio->write_result($result); # below, just looking at the current blast result while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } ----- Original Message ----- From: Smithies, Russell To: 'Tim Koehler' ; 'maj at fortinbras.us' Sent: Sunday, November 29, 2009 3:58 PM Subject: RE: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file Hi Tim With various people writing the "howtos" and other docs, the examples are bound to have differing names for the variables used but as long as you're consistent, it should all fit together. I think I've almost got your code working, just getting errors from Bio::Search::Result::BlastResult which I'm not entirely sure how to use. Perhaps Mark can get this bit going?
--Russell =============================== use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/data/databases/flatfile/illuminati_blastdata/nt", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; my %hits_by_query; while ( my $result = $blast_report->next_result ) { foreach my $hit ( $result->hits ) { warn "Pushed a hit for ",$hit->name, "\n"; push( @{ $hits_by_query{ $hit->name } }, $hit ); } } foreach my $qid ( keys %hits_by_query ) { warn "qid = $qid\n"; my $res = Bio::Search::Result::BlastResult->new() or die $!; print Dumper $res; foreach my $h ( @{ $hits_by_query{$qid} } ){ warn "adding hit ", $h->name, "\n"; $res->add_hit($h) if defined($h); } my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format => "blast" ) or die $!; $blio->write_result($res); } while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } =============================== From: Tim Koehler [mailto:timbourine81 at googlemail.com] Sent: Friday, 27 November 2009 10:24 p.m. 
To: Smithies, Russell; maj at fortinbras.us Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file Hey guys, please, do not get me wrong that I wanted to put the workload on you. So far I only found the HowTo's but in there in some way the language changed with time (e.g. $in to $Seq_in) or some things I simply could not find. Now I got a tip where else to search: the scrapbook and deobfuscator. I immediately will have a look at that. This is the first time for me touching linux / perl commands; that's why I thought after several days of trial and many errors ;) asking the mailinglist. I was very happy about your fast answers! Cheers and a nice weekend, Tim On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler wrote: oops, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially where to put in your code. Here is the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this error appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
} use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position? errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name. Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: Hey Mark, thanks for the answer On 25.11.2009 20:21, Mark A. 
Jensen wrote: > whoops: change the following line: > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > to > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > (I always forget that...) > MAJ > > ----- Original Message ----- From: "Mark A. Jensen" > To: "Tim" ; > Sent: Wednesday, November 25, 2009 1:20 PM > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > >> hey Tim-- >> >> Sounds like you need to go about collecting your queries inside out: >> >> my %hits_by_query; >> for ($result->hits) { >> push @{$hits_by_query{$hit->name}} $hit; >> } >> >> I believe now each hash element, keyed by the query name, will contain >> an arrayref to the set of hits assoc with that query. >> From here, I believe >> >> use Bio::Search::Result::BlastResult; >> use Bio::SearchIO; >> >> foreach my $qid ( keys %hits_by_query ) { >> my $result = Bio::Search::Result::BlastResult->new(); >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); >> $blio->write_result($result); >> } >> >> will do what you want. >> >> hope this helps - >> Mark >> >> ----- Original Message ----- From: "Tim" >> To: >> Sent: Wednesday, November 25, 2009 12:40 PM >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each >> query innew file >> >> >>> Dear bioperl users, >>> >>> I am a real newbie and have - maybe a very trivial - question. >>> >>> I searched the mailing list archive and many howtos but I have not found >>> a concrete answer to my problem. So hopefully you can help me :) >>> >>> Background: I use the latest Bioperl version (installed it two weeks >>> before). >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file >>> including different sequences, I get a BLAST output with many queries >>> each having several hits / sbjcts.
>>> >>> My problem is how to parse *all* hits of *one* query into a single new >>> file. And this for all the queries I have in my BLAST output file. >>> >>> Or is it better the other way round; first to make fasta files with only >>> single sequences inside and BLAST each file? But how can I automate that >>> using Bioperl? >>> >>> I tried Bio::SearchIO but can only parse all queries and their >>> respective hits in only one file... >>> I think iteration is also necessary here, but I do not really know how >>> to include that into Bio::SearchIO. >>> Or do I have to use Module:Bio::Index::Blast? >>> >>> I can index a file (see below), but I have no idea what comes next... >>> >>> ###How I index a file... >>> >>> #!/usr/bin/perl -w >>> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >>> >>> use Bio::Index::Fasta; >>> >>> >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; >>> $id = "48882"; >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >>> -write_flag => 1); >>> $inx->make_index($file_name); >>> >>> >>> Hopefully, you can give me at least hints what to look for. >>> >>> A big THANKS in advance! >>> >>> Cheers, >>> >>> Tim >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 -------------------------------------------------------------------------- Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. -------------------------------------------------------------------------- From timbourine81 at googlemail.com Mon Nov 30 12:23:58 2009 From: timbourine81 at googlemail.com (Tim Koehler) Date: Mon, 30 Nov 2009 18:23:58 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32B630E6C53@exchsth.agresearch.co.nz> <52D67F20A9CB4953B86FF794ADE0BE96@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> Message-ID: Hello everybody, thanks a lot for the overwhelming answers! All these codes are different flavors and worked all. For me the added code works the best. But I think I found a bug in ...Bio/SearchIO/blast.pm. There the DEFAULT_BLAST_... variable is set to Bio::Search::Writer::HitTableWriter instead of Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to HTMLResultWriter and others. So again: THANKS for the support! 
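The grouping idiom this thread converges on can be tried out without BioPerl or a BLAST run. The sketch below is illustrative only: plain hashes with hypothetical `query`, `hit` and `score` fields stand in for BioPerl hit objects, and the helpers `group_by_query` and `write_groups` are made up for this example. It shows the comma that `push` needs after the dereferenced array (the syntax error Tim ran into), keys the hash on the query name (keying on `$hit->name`, as in the snippets above, groups by subject sequence instead), and opens each output file with an explicit `>` write mode, the same detail behind Mark's `-file => ">$qid\.bls"` correction.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Group hit records under their query name (a hash of array references).
# Each record is a plain hash standing in for a BioPerl hit object; the
# 'query', 'hit' and 'score' fields are hypothetical.
sub group_by_query {
    my @hits = @_;
    my %hits_by_query;
    for my $hit (@hits) {
        # push needs a comma between the array and the value being pushed
        push @{ $hits_by_query{ $hit->{query} } }, $hit;
    }
    return \%hits_by_query;
}

# Write each query's records to "<query>.bls" inside $dir,
# one tab-separated "hit<TAB>score" line per record.
sub write_groups {
    my ( $groups, $dir ) = @_;
    my @written;
    for my $qid ( sort keys %{$groups} ) {
        my $path = "$dir/$qid.bls";
        # explicit '>' mode: open for writing, truncating any existing file
        open my $fh, '>', $path or die "Cannot write $path: $!";
        print {$fh} "$_->{hit}\t$_->{score}\n" for @{ $groups->{$qid} };
        close $fh or die "Cannot close $path: $!";
        push @written, $path;
    }
    return @written;
}

my @hits = (
    { query => 'seqA', hit => 'gi|1', score => 98.5 },
    { query => 'seqB', hit => 'gi|2', score => 80.0 },
    { query => 'seqA', hit => 'gi|3', score => 76.2 },
);

my $groups = group_by_query(@hits);
my $outdir = tempdir( CLEANUP => 1 );
print "wrote $_\n" for write_groups( $groups, $outdir );
```

With real BioPerl objects the key would come from the result (for example `$result->query_name`) rather than from each hit, since a Bio::Search::Hit object's `name` is the subject's name.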
Cheers, Tim #!/usr/bin/perl -w use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; ### add here the writer you want use Bio::SearchIO::Writer::HitTableWriter; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; # just write the result we got for this query into a #new blast-formatted file...named after the id of the query seq... my $result = $blast_report->next_result; my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!; $blio->write_result($result); # below, just looking at the current blast result ###this does not appear in the output files while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > Changed it to a generic result and added a writer and it seems to work: > > > > foreach my $qid ( keys %hits_by_query ) { > > warn "qid = $qid\n"; > > my $res = Bio::Search::Result::GenericResult->new(-algorithm => > "blastn") or die $!; > > # print Dumper $res; > > foreach my $h (
@{ $hits_by_query{$qid} } ){ > > warn "adding hit ", $h->name, "\n"; > > $res->add_hit($h) if defined($h); > > } > > my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file => > ">$qid\.bls\.html", -format => "blast" ) or die $!; > > $blio->write_result($res); > > } > > > > > > *From:* Mark A. Jensen [mailto:maj at fortinbras.us] > *Sent:* Monday, 30 November 2009 10:19 a.m. > *To:* Smithies, Russell; 'Tim Koehler' > > *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > My thought here was that since Tim's already going one at a time thru > > his queries, my scrap was not really necessary: > > > > use strict; > > use Bio::Tools::Run::StandAloneBlast; > > use Bio::SeqIO; > > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > > > use Data::Dumper; > > > > my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", > > -format => "fasta" ); > > > > while ( my $query = $Seq_in->next_seq() ) { > > warn "Processing ",$query->id, "\n"; > > my $factory = > > Bio::Tools::Run::StandAloneBlast->new( > > program => "blastn", > > database => > "/data/databases/flatfile/illuminati_blastdata/nt", > > _READMETHOD => "Blast" > > ); > > > > my $blast_report = $factory->blastall($query); > > sleep 5; > > > > # just write the result we got for this query into a > > #new blast-formatted file...named after the id of the query seq... 
> > my $result = $blast_report->next_result; > > my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => > "blast" ) or die $!; > > $blio->write_result($result); > > > > # below, just looking at the current blast result > > while ( my $result = $blast_report->next_result ) { > > ## $result is a Bio::Search::Result::ResultI compliant object > > while ( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > while ( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > > if ( $hsp->length('total') > 50 ) { > > if ( $hsp->percent_identity >= 75 ) { > > print "Query= ", $result->query_name, > > "Hit= ", $hit->name, > > "Length= ", $hsp->length('total'), > > "Percent_id= ", $hsp->percent_identity, > > "Subject=", $hsp->hit_string, "\n"; > > } > > } > > } > > } > > } > > } > > ----- Original Message ----- > > *From:* Smithies, Russell > > *To:* 'Tim Koehler' ; 'maj at fortinbras.us' > > *Sent:* Sunday, November 29, 2009 3:58 PM > > *Subject:* RE: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > Hi Tim > > With various people writing the "howtos" and other docs, the examples are > bound to have differing names for the variables used but as long as you're > consistent, it should all fit together. > > > > I think I've almost got your code working, just getting errors from > Bio::Search::Result::BlastResult which I'm not entirely sure how to use. > Perhaps Mark can get this bit going?
> > > > --Russell > > =============================== > > > > use strict; > > use Bio::Tools::Run::StandAloneBlast; > > use Bio::SeqIO; > > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > > > use Data::Dumper; > > > > my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", > > -format => "fasta" ); > > > > while ( my $query = $Seq_in->next_seq() ) { > > warn "Processing ",$query->id, "\n"; > > my $factory = > > Bio::Tools::Run::StandAloneBlast->new( > > program => "blastn", > > database => > "/data/databases/flatfile/illuminati_blastdata/nt", > > _READMETHOD => "Blast" > > ); > > > > my $blast_report = $factory->blastall($query); > > sleep 5; > > > > > > my %hits_by_query; > > > > while ( my $result = $blast_report->next_result ) { > > foreach my $hit ( $result->hits ) { > > warn "Pushed a hit for ",$hit->name, "\n"; > > push( @{ $hits_by_query{ $hit->name } }, $hit ); > > } > > } > > > > foreach my $qid ( keys %hits_by_query ) { > > warn "qid = $qid\n"; > > my $res = Bio::Search::Result::BlastResult->new() or die $!; > > print Dumper $res; > > foreach my $h ( @{ $hits_by_query{$qid} } ){ > > warn "adding hit ", $h->name, "\n"; > > $res->add_hit($h) if defined($h); > > } > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format => > "blast" ) or die $!; > > $blio->write_result($res); > > } > > > > while ( my $result = $blast_report->next_result ) { > > ## $result is a Bio::Search::Result::ResultI compliant object > > while ( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > while ( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > > if ( $hsp->length('total') > 50 ) { > > if ( $hsp->percent_identity >= 75 ) { > > print "Query= ", $result->query_name, > > "Hit= ", $hit->name, > > "Length= ", $hsp->length('total'), > > "Percent_id= ", $hsp->percent_identity, > > "Subject=", $hsp->hit_string, "\n"; > > } > > } > > } > > } > > } > > } > > 
=============================== > > > > *From:* Tim Koehler [mailto:timbourine81 at googlemail.com] > *Sent:* Friday, 27 November 2009 10:24 p.m. > *To:* Smithies, Russell; maj at fortinbras.us > *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > Hey guys, > > please, do not get me wrong that I wanted to put the workload on you. So > far I only found the HowTo's but in there in some way the language changed > with time (e.g. $in to $Seq_in) or some things I simply could not find. > Now I got a tip where else to search: the scrapbook and deobfuscator. > > I immediately will have a look at that. > > This is the first time for me touching linux / perl commands; that's why I > thought after several days of trial and many errors ;) asking the > mailinglist. > > I was very happy about your fast answers! > > Cheers and a nice weekend, > > Tim > > On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler > wrote: > > oops, sent too early... > > Hey Mark, > > thanks for the answer. But I am still struggling, especially where to put > in your code. > > Here is the code I have, so far: > > #!/usr/bin/perl -w > > ### should I put your code here as push is a perl command? > > > my %hits_by_query; > for ($result->hits) { > > ### I inserted a comma after name}}; if there is no comma, there was the > error: Scalar found where operator expected at > 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" > ### (Missing operator before $hit?) > ###Useless use of push with no values at > 12_BLAST_two_sequence_each_query_one_file.PL line 7. > ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, > near "} $hit" > ###BEGIN not safe after errors--compilation aborted at > 12_BLAST_two_sequence_each_query_one_file.PL line 13. > > > push @{$hits_by_query{$hit->name}}, $hit; > > ###here, every time this error appears: Name "main::result" used only > once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
> ###error: Can't call method "hits" on an undefined value at > 12_BLAST_two_sequence_each_query_one_file.PL line 5. > > > } > > > use strict; > use Bio::Tools::Run::StandAloneBlast; > use Bio::SeqIO; > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > my $Seq_in = Bio::SeqIO->new ( > -file => > "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", > -format => 'fasta' > ); > while (my $query = $Seq_in->next_seq()) { > > > my $factory = Bio::Tools::Run::StandAloneBlast->new( > > 'program' => 'blastn', > 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', > _READMETHOD => "Blast" > ); > > my $blast_report = $factory->blastall($query); > > ### Should I need to use a module? are the commands here at the right > position? errors, e.g., Global symbol "$hit" requires explicit package name > #my %hits_by_query; > #for ($result->hits) { > ### inserted comma after name}} > # push @{$hits_by_query{$hit->name}}, $hit; > #} > > > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > $blio->write_result($result); > } > > ###where are the files stored? what is their name. 
Sorry, but I cannot get > behind that :( > > while( my $result = $blast_report->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > > > while( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > > while( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > if( $hsp->length('total') > 50 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Query= ", $result->query_name, > "Hit= ", $hit->name, > "Length= ", $hsp->length('total'), > "Percent_id= ", $hsp->percent_identity, > "Subject=", $hsp->hit_string,"\n"; > } > } > } > } > } > } > > Again, a big thanks in advance :) > > All the best, > > Tim > > On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > > Hey Mark, > > thanks for the answer > > > > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sound like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query. 
> >> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. > >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automate > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file...
> >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler > MPI for Terrestrial Microbiology > Karl-von-Frisch-Straße > D-35043 Marburg / Germany > > Email: koehlerd at mpi-marburg.mpg.de > Phone: +49 6421 178-740 > Fax: +49 6421 178-999 > > > > > ------------------------------ > > *Attention: *The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities to > which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ------------------------------ > > > > From maj at fortinbras.us Sun Nov 1 23:47:15 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 1 Nov 2009 23:47:15 -0500 Subject: [Bioperl-l] annotations Message-ID: <5150801225E0484D95DC51B2D00AE519@NewLife> I'm cogitating on features and annotations.
For a RichSeq, one gets the set of annotations by $seq->annotation->get_Annotations while getting features by $seq->get_Features Is there a reason not to have a method in SeqI sub get_Annotations { shift->annotation->get_Annotations } to allow a user to do what seems natural from a user's perspective, viz. $seq->get_Annotations? I imagine this might save hundreds of hours of frustration, integrated over all newbies. MAJ From cjfields at illinois.edu Mon Nov 2 08:08:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 2 Nov 2009 07:08:54 -0600 Subject: [Bioperl-l] annotations In-Reply-To: <5150801225E0484D95DC51B2D00AE519@NewLife> References: <5150801225E0484D95DC51B2D00AE519@NewLife> Message-ID: <6920A9E1-D221-4CF8-9866-0ADBDB254C19@illinois.edu> On Nov 1, 2009, at 10:47 PM, Mark A. Jensen wrote: > I'm cogitating on features and annotations. For a RichSeq, one gets > the set of annotations by > > $seq->annotation->get_Annotations > > while getting features by > > $seq->get_Features > > Is there a reason not to have a method in SeqI > > sub get_Annotations { shift->annotation->get_Annotations } > > to allow a user to do what seems natural from a user's perspective, > viz. $seq->get_Annotations? I imagine this might save hundreds of > hours of frustration, integrated over all newbies. > MAJ One could add the methods to delegate to annotation() (that's essentially what I'm planning on doing for Biome). 
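Mark's proposal is ordinary method delegation: a convenience method on the sequence object that forwards to its annotation collection. The self-contained sketch below uses made-up stand-in packages (`Demo::AnnotationCollection` and `Demo::Seq`) instead of BioPerl's real classes, so it runs without BioPerl installed. Only the one-line `get_Annotations` body mirrors Mark's code; as a small assumption of this sketch it also passes any remaining arguments (such as tag names) through to the collection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation collection: tag name => list of values.
package Demo::AnnotationCollection;

sub new {
    my ( $class, %by_key ) = @_;
    return bless { by_key => {%by_key} }, $class;
}

# Return the annotations for the given tags, or for every tag if none are given.
sub get_Annotations {
    my ( $self, @keys ) = @_;
    @keys = sort keys %{ $self->{by_key} } unless @keys;
    my @found;
    push @found, @{ $self->{by_key}{$_} || [] } for @keys;
    return @found;
}

# Hypothetical stand-in for a sequence object that owns an annotation collection.
package Demo::Seq;

sub new {
    my ( $class, $collection ) = @_;
    return bless { annotation => $collection }, $class;
}

sub annotation { return $_[0]{annotation} }

# The proposed shortcut: forward the call (and any tag arguments) to annotation().
sub get_Annotations { shift->annotation->get_Annotations(@_) }

package main;

my $seq = Demo::Seq->new(
    Demo::AnnotationCollection->new(
        comment   => ['a comment'],
        reference => [ 'ref 1', 'ref 2' ],
    )
);

my @short = $seq->get_Annotations;                # the shortcut
my @long  = $seq->annotation->get_Annotations;    # the existing two-step call
print "short: @short\n";
print "long:  @long\n";
```

Because the shortcut only forwards, the two calls always return the same list; the collection stays the single place where annotations are stored.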
chris From kiekyon.huang at gmail.com Tue Nov 3 10:14:39 2009 From: kiekyon.huang at gmail.com (Kie Kyon Huang) Date: Tue, 3 Nov 2009 23:14:39 +0800 Subject: [Bioperl-l] render_blast problem Message-ID: Hi, I was trying to follow the HOWTO:Graphics at http://www.bioperl.org/wiki/HOWTO:Graphics When running the command line in cygwin $ perl render_blast1.pl data1.txt | display - I get the following error line, bash: display: command not found I also tried $ perl render_blast1.pl data1.txt > data1.png however, I was unable to open the data1.png file using Microsoft Office Picture Manager or windows Photo Gallery Thanks Huang From biopython at maubp.freeserve.co.uk Tue Nov 3 10:45:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Nov 2009 15:45:37 +0000 Subject: [Bioperl-l] render_blast problem In-Reply-To: References: Message-ID: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com> On Tue, Nov 3, 2009 at 3:14 PM, Kie Kyon Huang wrote: > Hi, > > I was trying to follow the HOWTO:Graphics at > http://www.bioperl.org/wiki/HOWTO:Graphics > > When running the command line in cygwin > > $ perl render_blast1.pl data1.txt | display - > > I get the following error line, > > bash: display: command not found That makes sense on Windows, since display is a Unix command line tool. > I also tried > > $ perl render_blast1.pl data1.txt > data1.png Based on the wiki, I think that ought to have worked. > however, I was unable to open the data1.png file using Microsoft > Office Picture Manager or windows Photo Gallery Did you do this step?: >> Important! If you are on a Windows platform, you need to put >> STDOUT into binary mode so that the PNG file does not go >> through Window's carriage return/linefeed transformations. >> Before the final print statement, put the statement >> binmode(STDOUT). This advice also applies to certain older >> versions of RedHat, which ship with a patched (and possibly >> broken) version of Perl. 
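The HOWTO advice quoted above, applied only on the affected platforms, could look like the sketch below. It is an illustration, not the change that was eventually made to `render_blast1.pl`: the helper `needs_binmode` is hypothetical, and the eight PNG signature bytes stand in for the image data the real script prints. `$^O` is Perl's built-in operating-system name (`MSWin32` for native Windows Perl, `cygwin` under Cygwin); Cygwin is included because that is where the problem in this thread was reported and where `binmode` fixed it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True on platforms where STDOUT may translate "\n" to "\r\n" and corrupt binary data.
# 'MSWin32' is native Windows Perl; 'cygwin' is included because that is where the
# problem in this thread was reported.
sub needs_binmode {
    my ($os) = @_;
    return $os eq 'MSWin32' || $os eq 'cygwin';
}

# Placeholder for the image bytes: the 8-byte PNG signature, which itself
# contains "\r\n" and "\n" and so is damaged by newline translation.
my $png_data = "\x89PNG\r\n\x1a\n";

# Switch STDOUT to binary mode before the final print, as the HOWTO advises.
binmode(STDOUT) if needs_binmode($^O);
print $png_data;
```

Because `binmode` does no harm on Unix-like systems, calling it unconditionally before the final print is an equally reasonable choice; the check only documents why the call is there.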

(BioPerl devs - couldn't that be added to the default render_blast1.pl
script with an if statement checking for Windows?)

Peter

From biopython at maubp.freeserve.co.uk Tue Nov 3 11:04:59 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Nov 2009 16:04:59 +0000
Subject: [Bioperl-l] render_blast problem
In-Reply-To:
References: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
Message-ID: <320fb6e00911030804r62e50da6w373bbb61e9823f28@mail.gmail.com>

Mailing list CC'd - solved :)

On Tue, Nov 3, 2009 at 3:55 PM, Kie Kyon Huang wrote:
>
> ok, that fix it
> i forget sometimes what platform am i on.
> thanks

Great.

Peter

From amackey at virginia.edu Tue Nov 3 12:09:00 2009
From: amackey at virginia.edu (Aaron Mackey)
Date: Tue, 3 Nov 2009 12:09:00 -0500
Subject: [Bioperl-l] svn errors?
Message-ID: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>

    [ajm6q at lc4 bioperl-live]$ svn update
    svn: Decompression of svndiff data failed

I'll admit to not having svn updated in awhile; A clean, anonymous svn co
failed with the same message:

    [...]
    A bioperl-live/Bio/Structure/StructureI.pm
    A bioperl-live/Bio/Structure/IO
    svn: Decompression of svndiff data failed

-Aaron

P.S. I used this command:

    svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

From cjfields at illinois.edu Tue Nov 3 12:17:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 11:17:10 -0600
Subject: [Bioperl-l] svn errors?
In-Reply-To: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
References: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
Message-ID: <8C5FC42D-F957-45AC-9AAC-876ACC9D77E0@illinois.edu>

Aaron,

Yep, this was reported to support (a couple of users on #bioperl reported
the same problem). Chris D. is looking into it. I'm wondering if it's
worth setting up a second mirror to github for this purpose.

chris

On Nov 3, 2009, at 11:09 AM, Aaron Mackey wrote:

> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous
> svn co
> failed with the same message:
>
> [...]
> A bioperl-live/Bio/Structure/StructureI.pm
> A bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command: svn co svn://
> code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue Nov 3 12:19:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 11:19:56 -0600
Subject: [Bioperl-l] render_blast problem
In-Reply-To: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
References: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
Message-ID: <8336341C-C7B4-4740-A7C3-E2DE5FDAF651@illinois.edu>

On Nov 3, 2009, at 9:45 AM, Peter wrote:

> ...
> Did you do this step?:
>>> Important! If you are on a Windows platform, you need to put
>>> STDOUT into binary mode so that the PNG file does not go
>>> through Window's carriage return/linefeed transformations.
>>> Before the final print statement, put the statement
>>> binmode(STDOUT). This advice also applies to certain older
>>> versions of RedHat, which ship with a patched (and possibly
>>> broken) version of Perl.
>
> (BioPerl devs - couldn't that be added to the default
> render_blast1.pl script with an if statement checking for
> Windows?)
>
> Peter

Yes, that should be added. I'll work on it.

chris

From mauricio at open-bio.org Tue Nov 3 12:20:52 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Tue, 03 Nov 2009 11:20:52 -0600
Subject: [Bioperl-l] svn errors?
In-Reply-To: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
References: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
Message-ID: <4AF06674.30506@open-bio.org>

Hi Aaron,

This was reported a few days ago. Chris Dagdigian is working today on a
fix for it.

Mauricio.

Aaron Mackey wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous svn co
> failed with the same message:
>
> [...]
> A bioperl-live/Bio/Structure/StructureI.pm
> A bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command: svn co svn://
> code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

From rachitasharma at gmail.com Tue Nov 3 17:12:11 2009
From: rachitasharma at gmail.com (Rachita Sharma)
Date: Tue, 3 Nov 2009 14:12:11 -0800
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
Message-ID: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>

I am having trouble parsing PSI-BLAST results. Please help.

The code is:

    my $in = new Bio::SearchIO( -format => 'blast',
                                -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");

    while( my $result = $in->next_result ) {
        while( my $hit = $result->next_hit ) {
            $sth->execute($result->query_name, $hit->name, $hit->significance);
            print "Query executed!\n";
        }
    }

The error is:

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: no data for midline ***** No hits found ******
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
    STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
    STACK: BSubVCpsiRblast.pl:92
    -----------------------------------------------------------

From cjfields at illinois.edu Tue Nov 3 22:42:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 21:42:55 -0600
Subject: Re: [Bioperl-l] Trouble parsing PSI-BLAST
In-Reply-To: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>
References: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>
Message-ID:

Rachita,

You'll have to give us more to go on than this. The best thing to do is
file a bug report and attach an example PSI-BLAST report and code that
causes the problem. The $sth->execute(...) is a bit odd, but that
shouldn't cause the error in question. Also, make sure to stipulate the
OS, version of BioPerl, and perl version.

chris

On Nov 3, 2009, at 4:12 PM, Rachita Sharma wrote:

> I am having trouble parsing PSI-BLAST results. Please help.
>
> The code is:
> my $in = new Bio::SearchIO( -format => 'blast',
> -file =>
> "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");
>
>
> while( my $result = $in->next_result ) {
> while( my $hit = $result->next_hit ) {
>
> $sth->execute($result->query_name, $hit->name, $hit->significance);
> print "Query executed!\n";
>
> }
> }
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
> STACK: BSubVCpsiRblast.pl:92
> -----------------------------------------------------------

From alexl at users.sourceforge.net Wed Nov 4 02:30:21 2009
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 04 Nov 2009 02:30:21 -0500
Subject: [Bioperl-l] version of ExtUtils::Manifest too strict?
Message-ID:

Does the version of ExtUtils::Manifest really need to be strictly greater
than or equal to 1.52?

Currently this blocks me updating the Fedora package of BioPerl to 1.6.1,
because the version of perl that Fedora ships is on 1.51 and hence the
build fails with:

    Checking prerequisites...
     - ERROR: ExtUtils::Manifest (1.51_01) is installed, but we need version >= 1.52

Full logs are here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1787483
http://koji.fedoraproject.org/koji/getfile?taskID=1787483&name=build.log

This is true even with the version of Perl in rawhide/F-12 etc.
(ExtUtils::Manifest is in the base perl package).
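The core `version` module can show which ExtUtils::Manifest a given perl ships and check it against the `>= 1.52` requirement in the error above. It also handles developer versions such as `1.51_01`. This is a minimal sketch. `meets_minimum` is a hypothetical helper, and the 1.52 minimum is taken from the build output.

```perl
use strict;
use warnings;
use version;
use ExtUtils::Manifest ();   # core module; loaded only to read its version

# Hypothetical helper: true if version string $have is at least $need.
# version->parse also accepts developer releases such as "1.51_01".
sub meets_minimum {
    my ($have, $need) = @_;
    return version->parse($have) >= version->parse($need) ? 1 : 0;
}

my $installed = ExtUtils::Manifest->VERSION;
printf "ExtUtils::Manifest %s; meets 1.52: %s\n",
    $installed, meets_minimum($installed, '1.52') ? 'yes' : 'no';
```

Comparing `version` objects avoids treating the strings as plain numbers or text, which would mishandle developer releases containing underscores.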

If it really is necessary, I would like to be armed with a good argument
why it needs to be updated, since the Perl package maintainer would have
to update the entire Perl package simply to get a more recent version of
one small subpackage.

Regards,
Alex

From jluis.lavin at unavarra.es Wed Nov 4 03:43:35 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 09:43:35 +0100 (CET)
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
Message-ID: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>

Hello all,

I'm a newbie who is having terrible troubles trying to retrieve a list of
multiple sequences from the NCBI and write them to a single file in Fasta
format.
The code I've written seems to read my list and retrieve the sequences, but
it kinda overwrites them so that I only get the last sequence on the list.
I've been told to ask the people on this mailing list for help, since you
may have come across this problem also or at least will know how to solve
it...

Here is my code, which basically consists of an STDIN for the list to be
read into an array and a loop to read each sequence (stopping when the
list ends) and retrieve a sequence each time the loop is launched,
writing that sequence to a fasta file. I only get a sequence back
although it seems to perform the retrieving process with each of the
sequences of the list...

    #!/usr/bin/perl -w
    use strict;
    use Bio::DB::GenPept;
    use Bio::DB::GenBank;
    use Bio::SeqIO;
    print "Enter your list name:";
    my $archivo=<STDIN>;
    chomp $archivo;
    die ("Can't open input\n") unless (open(INFILE, $archivo));
    my @lista = <INFILE>;
    foreach my $seq (@lista) {
        if ($seq eq '') {
            die ("empty list")
        }
        else {
            my $db = new Bio::DB::GenPept("-format" => "Fasta");
            my $seqobj = $db->get_Seq_by_acc($seq);
            my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
                                      -format => 'fasta');
            $out->write_seq($seqobj);
        }
    }
    exit;

An example list of sequences can be this one:

    YP_003107578.1
    YP_003106103.1
    YP_003106552.1
    YP_003106560.1
    YP_003107053.1
    YP_003107450.1
    YP_003108000.1
    YP_003105023.1
    YP_003105264.1

Thanks in advance for your help ;)

--
José Luis Lavín Trueba, PhD
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From e.osimo at gmail.com Wed Nov 4 04:54:52 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Wed, 4 Nov 2009 10:54:52 +0100
Subject: [Bioperl-l] Bio::Graphics and picture format
Message-ID: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com>

Hello everyone,
do you know if it is possible to generate an image with Bio::Graphics in a
vector format? Is there a list of available formats?
Thanks
Emanuele

From David.Messina at sbc.su.se Wed Nov 4 04:52:53 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 4 Nov 2009 10:52:53 +0100
Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID: <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>

>
> The code I've written seems to read my list and retrieve the sequences, but
> it kinda overwrites them so that I only get the last sequence on the list.
>

With this line

    my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format => 'fasta');

you are opening the filehandle for the output file inside your loop, so each
time it is writing over the previous file with an empty file. Then, you
write a single sequence to that file with this line

    $out->write_seq($seqobj);

So when you are done, you just have the last sequence in the output file.

If you move the opening of the output filehandle outside the loop (it needs
to be done only once), then it should work as you expect.

Also, I notice the newline characters are not being removed from your
sequence IDs (actually I'm a little surprised that the sequences are being
retrieved). Just to be safe, you may want to add the line

    chomp @lista;

after

    my @lista = <INFILE>;

Dave

From jluis.lavin at unavarra.es Wed Nov 4 05:14:40 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 11:14:40 +0100 (CET)
Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>
Message-ID: <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>

Thank you very, very much Dave. I've had a really frustrating time trying
to find out what I was doing wrong; it has been so frustrating that I was
about to quit Bioperl. Now I can try to focus on BLAST parsing for my
comparative genomic analysis.

You're great on this mailing list, because you give fast and neat advice
on all the questions asked here by newbies like me ;)

On Wed, 4 November 2009, 10:52, Dave Messina wrote:
>>
>> The code I've written seems to read my list and retrieve the sequences,
>> but
>> it kinda overwrites them so that I only get the last sequence on the
>> list.
>>
>
> With this line
>
> my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format =>
> 'fasta');
>
>
> you are opening the filehandle for the output file inside your loop, so
> each
> time it is writing over the previous file with an empty file. Then, you
> write a single sequence to that file with this line
>
> $out->write_seq($seqobj);
>
>
> So when you are done, you just have the last sequence in the output file.
>
> If you move the opening of the output filehandle outside the loop (it
> needs
> to be done only once), then it should work as you expect.
>
> Also, I notice the newline characters are not being removed from your
> sequence IDs (actually I'm a little surprised that the sequences are
> being
> retrieved). Just to be safe, you may want to add the line
>
> chomp @lista;
>
>
> after
>
> my @lista = <INFILE>;
>
>
>
>
> Dave
>

--
Dr. José Luis Lavín Trueba
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From hrh at fmi.ch Wed Nov 4 05:05:17 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Wed, 04 Nov 2009 11:05:17 +0100
Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID:

Hi

try

    my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
                                        ^

this way you no longer overwrite your existing file, but append the next
sequence.

Regards, Hans



On 11/4/09 9:43 AM, "jluis.lavin at unavarra.es" wrote:

>
> Hello all,
>
> I'm a newbie who is having terrible troubles trying to retrieve a list of
> multiple sequences from the NCBI and write them to a single file in Fasta
> format.
> The code I've written seems to read my list and retrieve the sequences, but
> it kinda overwrites them so that I only get the last sequence on the list.
> I've been told to ask the people on this mailing list for help, since you
> may have come across this problem also or at least will know how to solve
> it...
>
> Here is my code, which basically consists of an STDIN for the list to be
> read into an array and a loop to read each sequence (stopping when the
> list ends) and retrieve a sequence each time the loop is launched,
> writing that sequence to a fasta file. I only get a sequence back
> although it seems to perform the retrieving process with each of the
> sequences of the list...
>
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenPept;
> use Bio::DB::GenBank;
> use Bio::SeqIO;
> print "Enter your list name:";
> my $archivo=<STDIN>;
> chomp $archivo;
> die ("Can't open input\n") unless (open(INFILE, $archivo));
> my @lista = <INFILE>;
> foreach my $seq (@lista) {
> if ($seq eq '') {
> die ("empty list")
> }
> else {
> my $db = new Bio::DB::GenPept("-format" => "Fasta");
> my $seqobj = $db->get_Seq_by_acc($seq);
> my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
> -format => 'fasta');
> $out->write_seq($seqobj);
> }
> }
> exit;
>
>
> An example list of sequences can be this one:
>
> YP_003107578.1
> YP_003106103.1
> YP_003106552.1
> YP_003106560.1
> YP_003107053.1
> YP_003107450.1
> YP_003108000.1
> YP_003105023.1
> YP_003105264.1
>
> Thanks in advance for your help ;)

From jluis.lavin at unavarra.es Wed Nov 4 05:25:38 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 11:25:38 +0100 (CET)
Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To:
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID: <1834.130.206.164.153.1257330338.squirrel@webmail.unavarra.es>

Thank you very much for your answer, Hans!!! It works perfectly; it's also
a neat and fast solution, like Dave's.
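Dave's fix (open the output once, before the loop) and Hans's fix (open in append mode with `>>`) both stop the truncation. The sketch below shows Dave's structure in core Perl only, so it runs without BioPerl or network access. `fetch_record` is a hypothetical stand-in for the `Bio::DB::GenPept` lookup. A plain filehandle stands in for the `Bio::SeqIO` writer, which in the real script would be created once, before the `foreach`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for $db->get_Seq_by_acc($id) plus FASTA output;
# the residues are dummy data for illustration only.
sub fetch_record {
    my ($id) = @_;
    return ">$id\nMSTNPKPQRK\n";
}

# Write one FASTA record per accession in @ids to $path.
# The output file is opened ONCE, before the loop, so every record lands in
# the same file instead of each iteration truncating it with '>'.
sub write_records {
    my ($path, @ids) = @_;
    open my $out, '>', $path or die "Can't open $path: $!";
    for my $id (@ids) {
        chomp $id;            # remove the newline that <INFILE> leaves on each line
        next if $id eq '';    # skip blank lines (e.g. a trailing empty line)
        print {$out} fetch_record($id);
    }
    close $out or die "Can't close $path: $!";
    return;
}

# Demo with two accessions from the example list plus a blank line.
my @lista = ("YP_003107578.1\n", "YP_003106103.1\n", "\n");
my (undef, $file) = tempfile(UNLINK => 1);
write_records($file, @lista);

open my $in, '<', $file or die "Can't read $file: $!";
my $records = grep { /^>/ } <$in>;
close $in;
print "$records records written\n";    # 2 records written
```

With `>>` each open appends, which also works. However, a second run of the script then adds to the first run's output instead of starting a fresh file. Skipping blank lines rather than dying on them is a design choice in this sketch.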
Blessings to you all ;)

On Wed, 4 November 2009, 11:05, Hotz, Hans-Rudolf wrote:
> Hi
>
> try
>
> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
>                                     ^
>
> this way you no longer overwrite your existing file, but append the next
> sequence.
>
> Regards, Hans
>
>
>
> On 11/4/09 9:43 AM, "jluis.lavin at unavarra.es"
> wrote:
>
>>
>> Hello all,
>>
>> I'm a newbie who is having terrible troubles trying to retrieve a list of
>> multiple sequences from the NCBI and write them to a single file in
>> Fasta
>> format.
>> The code I've written seems to read my list and retrieve the sequences,
>> but
>> it kinda overwrites them so that I only get the last sequence on the
>> list.
>> I've been told to ask the people on this mailing list for help, since
>> you
>> may have come across this problem also or at least will know how to solve
>> it...
>>
>> Here is my code, which basically consists of an STDIN for the list to be
>> read into an array and a loop to read each sequence (stopping when the
>> list ends) and retrieve a sequence each time the loop is launched,
>> writing that sequence to a fasta file. I only get a sequence back
>> although it seems to perform the retrieving process with each of the
>> sequences of the list...
>>
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::GenPept;
>> use Bio::DB::GenBank;
>> use Bio::SeqIO;
>> print "Enter your list name:";
>> my $archivo=<STDIN>;
>> chomp $archivo;
>> die ("Can't open input\n") unless (open(INFILE, $archivo));
>> my @lista = <INFILE>;
>> foreach my $seq (@lista) {
>> if ($seq eq '') {
>> die ("empty list")
>> }
>> else {
>> my $db = new Bio::DB::GenPept("-format" => "Fasta");
>> my $seqobj = $db->get_Seq_by_acc($seq);
>> my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
>> -format => 'fasta');
>> $out->write_seq($seqobj);
>> }
>> }
>> exit;
>>
>>
>> An example list of sequences can be this one:
>>
>> YP_003107578.1
>> YP_003106103.1
>> YP_003106552.1
>> YP_003106560.1
>> YP_003107053.1
>> YP_003107450.1
>> YP_003108000.1
>> YP_003105023.1
>> YP_003105264.1
>>
>> Thanks in advance for your help ;)
>
>

--
Dr. José Luis Lavín Trueba
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From scott at scottcain.net Wed Nov 4 08:26:02 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 4 Nov 2009 08:26:02 -0500
Subject: Re: [Bioperl-l] Bio::Graphics and picture format
In-Reply-To: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com>
References: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com>
Message-ID: <0FB17FBC-16BE-4A9F-AC75-983D3B4ECE7D@scottcain.net>

Hi Emanuele,

It is possible to use GD::SVG instead of GD to generate SVG graphics. To
use it, you provide an argument of "-image_class GD::SVG" to the
constructor of Bio::Graphics::Panel. See the perldoc of
Bio::Graphics::Panel for more info.

Scott

On Nov 4, 2009, at 4:54 AM, Emanuele Osimo wrote:

> Hello everyone,
> do you know if it is possible to generate an image with
> Bio::Graphics in a
> vector format? Is there a list of available formats?
> Thanks
> Emanuele

-----------------------------------------------------------------------
Scott Cain, Ph. D.                         scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                    216-392-3087
Ontario Institute for Cancer Research

From b3sn7 at UNB.ca Tue Nov 3 12:30:24 2009
From: b3sn7 at UNB.ca (Sharma, Rachita)
Date: Tue, 3 Nov 2009 13:30:24 -0400
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
Message-ID: <1257269424.4af068b045434@webmail.unb.ca>


I am having trouble parsing PSI-BLAST results. Please help.

The code is:

    my $in = new Bio::SearchIO( -format => 'blast',
                                -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");

    while( my $result = $in->next_result ) {
        while( my $hit = $result->next_hit ) {
            $sth->execute($result->query_name, $hit->name, $hit->significance);
            print "Query executed!\n";
        }
    }

The error is:

    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: no data for midline ***** No hits found ******
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
    STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
    STACK: BSubVCpsiRblast.pl:92
    -----------------------------------------------------------

*******************************
Rachita Sharma
Research Assistant (PhD Student)
University of New Brunswick, NB, CANADA
email: Rachita.Sharma at unb.ca
Phone no: 503-895-3619
*******************************

From cjfields at illinois.edu Wed Nov 4 08:53:35 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 4 Nov 2009 07:53:35 -0600
Subject: Re: [Bioperl-l] version of ExtUtils::Manifest too strict?
In-Reply-To:
References:
Message-ID: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu>

Alex,

Not sure why ExtUtils::Manifest can't be bundled as a separate perl
package alone.
It is part of perl core but it's also available on CPAN separately from
perl itself:

http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm

This is the commit message for that BTW. This allows spaces in file names
for the MANIFEST. v1.52 is a bug fix and is required.

http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673

chris

On Nov 4, 2009, at 1:30 AM, Alex Lancaster wrote:

> Does the version of ExtUtils::Manifest really need to be strictly
> greater than or equal to 1.52?
>
> Currently this blocks me updating the Fedora package of BioPerl to
> 1.6.1, because the version of perl that Fedora ships is on 1.51 and
> hence the build fails with:
>
> Checking prerequisites...
> - ERROR: ExtUtils::Manifest (1.51_01) is installed, but we need
> version >= 1.52
>
> Full logs are here:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=1787483
> http://koji.fedoraproject.org/koji/getfile?taskID=1787483&name=build.log
>
> This is true even with the version of Perl in rawhide/F-12 etc.
> (ExtUtils::Manifest is in the base perl package).
>
> If it really is necessary, I would like to be armed with a good
> argument why this ca
> why it needs to be updated, since the Perl package maintainer would
> have
> to update the entire Perl package simply to get a more recent
> version of
> one small subpackage.
>
> Regards,
> Alex

From cjfields at illinois.edu Wed Nov 4 08:55:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 4 Nov 2009 07:55:34 -0600
Subject: Re: [Bioperl-l] Trouble parsing PSI-BLAST
In-Reply-To: <1257269424.4af068b045434@webmail.unb.ca>
References: <1257269424.4af068b045434@webmail.unb.ca>
Message-ID: <70E34111-4E70-463D-86EE-06926EA57073@illinois.edu>

Rachita,

Asked and answered yesterday. Please submit as a bug.
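Chris's earlier request was to state the OS, the BioPerl version and the perl version. A few lines of Perl can print all three for pasting into the bug report. This sketch assumes, as in the BioPerl 1.6 series, that the BioPerl version is stored in `$Bio::Root::Version::VERSION`. It reports "not installed" when that module cannot be loaded, so it runs either way. `bioperl_version` and `bug_report_header` are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: BioPerl's version string, or 'not installed'.
sub bioperl_version {
    my $loaded = eval { require Bio::Root::Version; 1 };
    return 'not installed' unless $loaded;
    no warnings 'once';    # the variable is only mentioned here
    my $v = $Bio::Root::Version::VERSION;
    return defined $v ? $v : 'unknown';
}

# Lines to paste into a bug report: OS, perl version, BioPerl version.
# Avoids the // operator so it also runs on perl 5.8 (as in the report above).
sub bug_report_header {
    return join "\n",
        "OS:      $^O",
        sprintf('perl:    %vd', $^V),
        'BioPerl: ' . bioperl_version(),
        '';
}

print bug_report_header();
```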

chris

On Nov 3, 2009, at 11:30 AM, Sharma, Rachita wrote:

>
> I am having trouble parsing PSI-BLAST results. Please help.
>
> The code is:
> my $in = new Bio::SearchIO( -format => 'blast',
> -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");
>
>
> while( my $result = $in->next_result ) {
> while( my $hit = $result->next_hit ) {
>
> $sth->execute($result->query_name, $hit->name, $hit->significance);
> print "Query executed!\n";
>
> }
> }
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/
> Root/Root.pm:359
> STACK: Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
> STACK: BSubVCpsiRblast.pl:92
> -----------------------------------------------------------
>
>
>
>
> *******************************
> Rachita Sharma
> Research Assistant (PhD Student)
> University of New Brunswick, NB, CANADA
> email: Rachita.Sharma at unb.ca
> Phone no: 503-895-3619
> *******************************
>

From David.Messina at sbc.su.se Wed Nov 4 09:11:43 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 4 Nov 2009 15:11:43 +0100
Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>
Message-ID: <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com>

Aw shucks, José, glad I could be of help.
There are plenty of people who answer questions around here, but my
timezone sometimes gives me an advantage for the European ones. :)

Dave

From daniel.gaston at gmail.com Wed Nov 4 09:45:04 2009
From: daniel.gaston at gmail.com (Daniel Gaston)
Date: Wed, 4 Nov 2009 10:45:04 -0400
Subject: [Bioperl-l] SwissProt and Subcellular localization information
Message-ID: <50c615ba0911040645j1b28e727p5d7bf47a04db160b@mail.gmail.com>

Hi Everyone,

I have recently been playing around with SwissProt format flatfiles and want
to extract sequences based on subcellular localization. I notice in going
through the code for swiss.pm and swissdriver.pm that in both (more so in
swissdriver.pm) there are several steps where organelle information based
on the OG line could be extracted and added to the data structure but isn't.
It seems that in both cases the OG line is being added in to the generic
lumping of data from the OC, OS, and OX lines in order to extract species
names and taxonomy information, but everything else is discarded. Is there
a particular reason for this or just a simple oversight? On the surface at
least it looks like a relatively simple modification to make, although I
admit that I am not terribly adept at manipulating these SeqIO data
structures.

Thanks for your time,

Dan

From daniel.gaston at gmail.com Wed Nov 4 12:12:10 2009
From: daniel.gaston at gmail.com (Daniel Gaston)
Date: Wed, 4 Nov 2009 13:12:10 -0400
Subject: [Bioperl-l] SwissProt and Subcellular localization information
Message-ID: <50c615ba0911040912pfd2483fwe44cd098beed73c7@mail.gmail.com>

Sorry folks, it appears I was just being a bonehead and didn't look closely
enough into the Bio::Annotation and Bio::Species objects that store all of
this data.
Dan On Wed, Nov 4, 2009 at 1:00 PM, wrote: > Send Bioperl-l mailing list submissions to > bioperl-l at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/bioperl-l > or, via email, send a message with subject or body 'help' to > bioperl-l-request at lists.open-bio.org > > You can reach the person managing the list at > bioperl-l-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Bioperl-l digest..." > > Today's Topics: > > 1. SwissProt and Subcellular localization information > (Daniel Gaston) > > > ---------- Forwarded message ---------- > From: Daniel Gaston > To: bioperl-l at lists.open-bio.org > Date: Wed, 4 Nov 2009 10:45:04 -0400 > Subject: [Bioperl-l] SwissProt and Subcellular localization information > Hi Everyone, > > I have recently been playing around with SwissProt format flatfiles and > want > to extract sequences based on subcellular localization. I notice in going > through the code for swiss.pm and swissdriver.pm that in both (more so in > swissdriver.pm) there are several steps where organelle information based > on > the OG line could be extracted and added to data structure but isn't. It > seems that in both cases the OG line is being added in to the generic > lumping of data from the OC, OS, and OX lines in order to extract species > names and taxonomy information but getting rid of everything else. Is there > a particular reason for this or just a simple oversight? On the surface at > least it looks like a relatively simple modification to make although I > admit that I am not terribly adept at manipulating these SeqIO > datastructures. 
> > Thanks for your time, > > Dan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jluis.lavin at unavarra.es Thu Nov 5 10:28:23 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 16:28:23 +0100 (CET) Subject: [Bioperl-l] A question about iBio::Index: and its correct use Message-ID: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Hello to all, I?m trying to write a script to retrieve a list of sequences from a local FASTA file (for example a fasta archive where all the protein models of an organism are stored). This file would be used by me as some kind "local database" (sorry if I mistake a few concepts...) I?ve been reading the BioPerl HOWTOs and I came across the Bio::Index::Fasta tool. If I didn?t misunderstood what I read (which can be easy because my low level on programming) this Indexing tool should do the job. I wrote a couple of scripts based on the documentation i read about this tool, but I don?t seem to be able to create the index file to be used later (to retrieve the sequences from). -First of all, I want to ask the people in this forum if the Bio::Index::Fasta is the right one to chose for this tasks. -Then I?ll beg you to take a look at my scripts, because I don?t seem to catch the bug... Best wishes to you all and thanks in advance ;) -- Jos? Luis Lav?n Trueba, PhD Dpto. de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Thu Nov 5 10:39:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 5 Nov 2009 10:39:05 -0500 Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Message-ID: Jos? 
-- It looks like this is a good solution to your problem. Please send you script so we can look at it- cheers Mark ----- Original Message ----- From: To: Sent: Thursday, November 05, 2009 10:28 AM Subject: [Bioperl-l] A question about iBio::Index: and its correct use Hello to all, I?m trying to write a script to retrieve a list of sequences from a local FASTA file (for example a fasta archive where all the protein models of an organism are stored). This file would be used by me as some kind "local database" (sorry if I mistake a few concepts...) I?ve been reading the BioPerl HOWTOs and I came across the Bio::Index::Fasta tool. If I didn?t misunderstood what I read (which can be easy because my low level on programming) this Indexing tool should do the job. I wrote a couple of scripts based on the documentation i read about this tool, but I don?t seem to be able to create the index file to be used later (to retrieve the sequences from). -First of all, I want to ask the people in this forum if the Bio::Index::Fasta is the right one to chose for this tasks. -Then I?ll beg you to take a look at my scripts, because I don?t seem to catch the bug... Best wishes to you all and thanks in advance ;) -- Jos? Luis Lav?n Trueba, PhD Dpto. 
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From jluis.lavin at unavarra.es Thu Nov 5 10:46:36 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 16:46:36 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] Message-ID: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> ---------------------------- Original message ---------------------------- Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use From: jluis.lavin at unavarra.es Date: Thu, November 5, 2009, 16:46 To: "Mark A. Jensen" -------------------------------------------------------------------------- Hi Mark, I've actually got two scripts, the first one is to create the index and the second one is to retrieve the sequence lis from the indexed file.

1)Here is the Index creation script:

#!/c:/Perl -w
use strict;
use Bio::Index::Fasta;
use strict;

print "Enter file for indexing: \n";
my $Index_File_Name = ;
my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx",
                                 -write_flag => 1);
$inx->make_index(my $File_Name);

2)And here is the sequence retrieval script:

#!/c:/Perl -w
use Bio::Index::Fasta;
use strict;
#PC9.fasta is my genomic file
my $Index_File_Name ="PC9.fasta";
my $inx = Bio::Index::Fasta->new($Index_File_Name);
#LCS.txt is my sequences list
@ARGV = ;
foreach my $id (@ARGV) {
    if ($id eq ''){
        die ("empty list")
    }
    else {
        my $seqobj = $inx->fetch($id);
        my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
                                  -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;
}

I hope this code is not a total scum... Thanks in advance ;) On Thu, November 5, 2009, 16:39, Mark A. Jensen wrote: > José
-- It looks like this is a good solution to your problem. Please send > you > script so we can look at it- > cheers Mark > ----- Original Message ----- > From: > To: > Sent: Thursday, November 05, 2009 10:28 AM > Subject: [Bioperl-l] A question about iBio::Index: and its correct use > > > > Hello to all, > > I'm trying to write a script to retrieve a list of sequences from a local > FASTA file (for example a fasta archive where all the protein models of an > organism are stored). This file would be used by me as some kind "local > database" (sorry if I mistake a few concepts...) > I've been reading the BioPerl HOWTOs and I came across the > Bio::Index::Fasta tool. > If I didn't misunderstood what I read (which can be easy because my low > level on programming) this Indexing tool should do the job. > I wrote a couple of scripts based on the documentation i read about this > tool, but I don't seem to be able to create the index file to be used > later (to retrieve the sequences from). > -First of all, I want to ask the people in this forum if the > Bio::Index::Fasta is the right one to chose for this tasks. > -Then I'll beg you to take a look at my scripts, because I don't seem to > catch the bug... > > Best wishes to you all and thanks in advance ;) > > -- > José Luis Lavín Trueba, PhD > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Thu Nov 5 10:37:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 5 Nov 2009 10:37:53 -0500 Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI ina single list query In-Reply-To: <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com> References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es> <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com> Message-ID: <49075FDFF6764EE48E932D95EB994221@NewLife> True, Dave, you compete only with crazed east coast core developers who're doing "just one more thing" at 2am.... ----- Original Message ----- From: "Dave Messina" To: Cc: Sent: Wednesday, November 04, 2009 9:11 AM Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI ina single list query > Aw shucks, José, glad I could be of help. There are plenty of people who > answer questions around here, but my timezone sometimes gives me an > advantage for the European ones. :) > > > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Thu Nov 5 11:02:48 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Thu, 05 Nov 2009 17:02:48 +0100 Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Message-ID: Jluis > -Then I'll beg you to take a look at my scripts, because I don't seem to > catch the bug... you haven't attached/included any scripts, have you?
Anyway, have you considered using BLAST indices (created with the additional flag "-o") together with the tool 'fastacmd' (which also included in the NCBI blast binaries) as a simple (and very fast) alternative for fetching sequences. Regards, Hans From maj at fortinbras.us Thu Nov 5 11:02:09 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 5 Nov 2009 11:02:09 -0500 Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] In-Reply-To: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> Message-ID: <1984ED07F36C446284B25F617964B6C6@NewLife> Hey José, The first thing that jumps out it the index file name. Looks like you create it as PC9.fasta.idx But you read it as PC9.fasta Not an unusual mistake. Do my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); and see if it works. MAJ ----- Original Message ----- From: To: Sent: Thursday, November 05, 2009 10:46 AM Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] ---------------------------- Original message ---------------------------- Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use From: jluis.lavin at unavarra.es Date: Thu, November 5, 2009, 16:46 To: "Mark A. Jensen" -------------------------------------------------------------------------- Hi Mark, I've actually got two scripts, the first one is to create the index and the second one is to retrieve the sequence lis from the indexed file.
1)Here is the Index creation script: #!/c:/Perl -w use strict; use Bio::Index::Fasta; use strict; print "Enter file for indexing: \n"; my $Index_File_Name = ; my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx", -write_flag => 1); $inx->make_index(my $File_Name); 2)And here is the sequence retrieval script: #!/c:/Perl -w use Bio::Index::Fasta; use strict; #PC9.fasta is my genomic file my $Index_File_Name ="PC9.fasta"; my $inx = Bio::Index::Fasta->new($Index_File_Name); #LCS.txt is my sequences list @ARGV = ; foreach my $id (@ARGV) { if ($id eq ''){ die ("empty list") } else { my $seqobj = $inx->fetch($id); my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", -format => 'fasta'); $out->write_seq($seqobj); } } exit; } I hope this code is not a total scum... Thanks in advance ;) On Thu, November 5, 2009, 16:39, Mark A. Jensen wrote: > José -- It looks like this is a good solution to your problem. Please send > you > script so we can look at it- > cheers Mark > ----- Original Message ----- > From: > To: > Sent: Thursday, November 05, 2009 10:28 AM > Subject: [Bioperl-l] A question about iBio::Index: and its correct use > > > > Hello to all, > > I'm trying to write a script to retrieve a list of sequences from a local > FASTA file (for example a fasta archive where all the protein models of an > organism are stored). This file would be used by me as some kind "local > database" (sorry if I mistake a few concepts...) > I've been reading the BioPerl HOWTOs and I came across the > Bio::Index::Fasta tool. > If I didn't misunderstood what I read (which can be easy because my low > level on programming) this Indexing tool should do the job. > I wrote a couple of scripts based on the documentation i read about this > tool, but I don't seem to be able to create the index file to be used > later (to retrieve the sequences from).
> -First of all, I want to ask the people in this forum if the > Bio::Index::Fasta is the right one to chose for this tasks. > -Then I'll beg you to take a look at my scripts, because I don't seem to > catch the bug... > > Best wishes to you all and thanks in advance ;) > > -- > José Luis Lavín Trueba, PhD > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From jluis.lavin at unavarra.es Thu Nov 5 11:21:57 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 17:21:57 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] In-Reply-To: <1984ED07F36C446284B25F617964B6C6@NewLife> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> <1984ED07F36C446284B25F617964B6C6@NewLife> Message-ID: <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> Thank you very much Mark, that's a good point :$ I guess your correction is referred to the second script, isn't it?
If it is so, there is still a problem with the first script, it doesn't create the PC9.fasta.idx file, instead it creates two files named: -PC9.fasta.idx.pag -PC9.fasta.idx.dir which seem to be clearly related with some kind of indexing process...but, unless the PC9.fasta.idx file is only virtual or remains hidden, I can't find it anywhere... Forgive me if I'm talking nosense... Thank you very much again for your help ;) On Thu, November 5, 2009, 17:02, Mark A. Jensen wrote: > Hey José, > The first thing that jumps out it the index file name. Looks > like you create it as > PC9.fasta.idx > But you read it as > PC9.fasta > Not an unusual mistake. Do > my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > and see if it works. > MAJ > ----- Original Message ----- > From: > To: > Sent: Thursday, November 05, 2009 10:46 AM > Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its > correct > use] > > > > > ---------------------------- Original message ---------------------------- > Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use > From: jluis.lavin at unavarra.es > Date: Thu, November 5, 2009, 16:46 > To: "Mark A. Jensen" > -------------------------------------------------------------------------- > > Hi Mark, > > I've actually got two scripts, the first one is to create the index and > the second one is to retrieve the sequence lis from the indexed file.
> > 1)Here is the Index creation script: > > #!/c:/Perl -w > use strict; > use Bio::Index::Fasta; > use strict; > > print "Enter file for indexing: \n"; > my $Index_File_Name = ; > my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx", > -write_flag => 1); > $inx->make_index(my $File_Name); > > 2)And here is the sequence retrieval script: > > #!/c:/Perl -w > use Bio::Index::Fasta; > use strict; > #PC9.fasta is my genomic file > my $Index_File_Name ="PC9.fasta"; > my $inx = Bio::Index::Fasta->new($Index_File_Name); > #LCS.txt is my sequences list > @ARGV = ; > foreach my $id (@ARGV) { > if ($id eq ''){ > die ("empty list") > } > else { > my $seqobj = $inx->fetch($id); > my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", > -format => 'fasta'); > $out->write_seq($seqobj); > } > } > exit; > } > > I hope this code is not a total scum... > > Thanks in advance ;) > > > > On Thu, November 5, 2009, 16:39, Mark A. Jensen wrote: >> José -- It looks like this is a good solution to your problem. Please >> send >> you >> script so we can look at it- >> cheers Mark >> ----- Original Message ----- >> From: >> To: >> Sent: Thursday, November 05, 2009 10:28 AM >> Subject: [Bioperl-l] A question about iBio::Index: and its correct use >> >> >> >> Hello to all, >> >> I'm trying to write a script to retrieve a list of sequences from a >> local >> FASTA file (for example a fasta archive where all the protein models of >> an >> organism are stored). This file would be used by me as some kind "local >> database" (sorry if I mistake a few concepts...) >> I've been reading the BioPerl HOWTOs and I came across the >> Bio::Index::Fasta tool. >> If I didn't misunderstood what I read (which can be easy because my low >> level on programming) this Indexing tool should do the job.
>> I wrote a couple of scripts based on the documentation i read about this >> tool, but I don't seem to be able to create the index file to be used >> later (to retrieve the sequences from). >> -First of all, I want to ask the people in this forum if the >> Bio::Index::Fasta is the right one to chose for this tasks. >> -Then I'll beg you to take a look at my scripts, because I don't seem to >> catch the bug... >> >> Best wishes to you all and thanks in advance ;) >> >> -- >> José Luis Lavín Trueba, PhD >> >> Dpto. de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > > -- > Dr. José Luis Lavín Trueba > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > > > -- > Dr. José Luis Lavín Trueba > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Thu Nov 5 11:39:09 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Thu, 5 Nov 2009 11:39:09 -0500 Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] In-Reply-To: <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> <1984ED07F36C446284B25F617964B6C6@NewLife> <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> Message-ID: Yes, these are files created by the SDBM, Perl's internal db manager. You should be able to open the index by simply $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); and the dbm will know what to do-- cheers MAJ ----- Original Message ----- From: To: "Mark A. Jensen" Cc: ; Sent: Thursday, November 05, 2009 11:21 AM Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] > Thank you very much Mark, that's a good point :$ > I guess your correction is referred to the second script, isn't it? > > If it is so, there is still a problem with the first script, it doesn't > create the PC9.fasta.idx file, instead it creates two files named: > -PC9.fasta.idx.pag > -PC9.fasta.idx.dir > > which seem to be clearly related with some kind of indexing process...but, > unless the PC9.fasta.idx file is only virtual or remains hidden, I can't > find it anywhere... > Forgive me if I'm talking nosense... > > Thank you very much again for your help ;) > > > On Thu, November 5, 2009, 17:02, Mark A. Jensen wrote: >> Hey José, >> The first thing that jumps out it the index file name. Looks >> like you create it as >> PC9.fasta.idx >> But you read it as >> PC9.fasta >> Not an unusual mistake. Do >> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >> and see if it works.
>> MAJ >> ----- Original Message ----- >> From: >> To: >> Sent: Thursday, November 05, 2009 10:46 AM >> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its >> correct >> use] >> >> >> >> >> ---------------------------- Original message ---------------------------- >> Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use >> From: jluis.lavin at unavarra.es >> Date: Thu, November 5, 2009, 16:46 >> To: "Mark A. Jensen" >> -------------------------------------------------------------------------- >> >> Hi Mark, >> >> I've actually got two scripts, the first one is to create the index and >> the second one is to retrieve the sequence lis from the indexed file. >> >> 1)Here is the Index creation script: >> >> #!/c:/Perl -w >> use strict; >> use Bio::Index::Fasta; >> use strict; >> >> print "Enter file for indexing: \n"; >> my $Index_File_Name = ; >> my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx", >> -write_flag => 1); >> $inx->make_index(my $File_Name); >> >> 2)And here is the sequence retrieval script: >> >> #!/c:/Perl -w >> use Bio::Index::Fasta; >> use strict; >> #PC9.fasta is my genomic file >> my $Index_File_Name ="PC9.fasta"; >> my $inx = Bio::Index::Fasta->new($Index_File_Name); >> #LCS.txt is my sequences list >> @ARGV = ; >> foreach my $id (@ARGV) { >> if ($id eq ''){ >> die ("empty list") >> } >> else { >> my $seqobj = $inx->fetch($id); >> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >> -format => 'fasta'); >> $out->write_seq($seqobj); >> } >> } >> exit; >> } >> >> I hope this code is not a total scum... >> >> Thanks in advance ;) >> >> >> >> On Thu, November 5, 2009, 16:39, Mark A. Jensen wrote: >>> José -- It looks like this is a good solution to your problem.
Please >>> send >>> you >>> script so we can look at it- >>> cheers Mark >>> ----- Original Message ----- >>> From: >>> To: >>> Sent: Thursday, November 05, 2009 10:28 AM >>> Subject: [Bioperl-l] A question about iBio::Index: and its correct use >>> >>> >>> >>> Hello to all, >>> >>> I'm trying to write a script to retrieve a list of sequences from a >>> local >>> FASTA file (for example a fasta archive where all the protein models of >>> an >>> organism are stored). This file would be used by me as some kind "local >>> database" (sorry if I mistake a few concepts...) >>> I've been reading the BioPerl HOWTOs and I came across the >>> Bio::Index::Fasta tool. >>> If I didn't misunderstood what I read (which can be easy because my low >>> level on programming) this Indexing tool should do the job. >>> I wrote a couple of scripts based on the documentation i read about this >>> tool, but I don't seem to be able to create the index file to be used >>> later (to retrieve the sequences from). >>> -First of all, I want to ask the people in this forum if the >>> Bio::Index::Fasta is the right one to chose for this tasks. >>> -Then I'll beg you to take a look at my scripts, because I don't seem to >>> catch the bug... >>> >>> Best wishes to you all and thanks in advance ;) >>> >>> -- >>> José Luis Lavín Trueba, PhD >>> >>> Dpto. de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> >> >> -- >> Dr. José Luis Lavín Trueba >> >> Dpto. de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> >> >> -- >> Dr. José Luis Lavín Trueba >> >> Dpto.
de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > > -- > Dr. José Luis Lavín Trueba > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > > > From jluis.lavin at unavarra.es Thu Nov 5 12:48:12 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Thu, 5 Nov 2009 18:48:12 +0100 (CET) Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Message-ID: <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> Thanks a lot for your help Hans, It's a little bit to hard to understand and turn into script this awesome information you've just given me...I hope I can use it in a near future anyway ;) The issue here is that the sequences I'm indexing are not generated by the NCBI nor stored there...although I belive you're just refering to the tool itself and not to a retrieval from the NCBI. Thanks again you're all great giving advice to newbies like me ;) Best wishes to you all On Thu, November 5, 2009, 17:02, Hotz, Hans-Rudolf wrote: > > > > Jluis > >> -Then I'll beg you to take a look at my scripts, because I don't seem to >> catch the bug... > > you haven't attached/included any scripts, have you? > > > Anyway, have you considered using BLAST indices (created with the > additional > flag "-o") together with the tool 'fastacmd' (which also included in the > NCBI blast binaries) as a simple (and very fast) alternative for fetching > sequences. > > > Regards, Hans > > > -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From florent.angly at gmail.com Thu Nov 5 13:00:19 2009 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 05 Nov 2009 10:00:19 -0800 Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> Message-ID: <4AF312B3.9060009@gmail.com> Hans-Rudolf was talking about a way to retrieve sequences from a BLAST database. If you use BLAST locally, then your database is local too. More info here: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html Florent jluis.lavin at unavarra.es wrote: > Thanks a lot for your help Hans, > It's a little bit to hard to understand and turn into script this awesome > information you've just given me...I hope I can use it in a near future > anyway ;) > The issue here is that the sequences I'm indexing are not generated by the > NCBI nor stored there...although I belive you're just refering to the tool > itself and not to a retrieval from the NCBI. > > Thanks again you're all great giving advice to newbies like me ;) > > Best wishes to you all > > > On Thu, November 5, 2009, 17:02, Hotz, Hans-Rudolf wrote: > >> >> Jluis >> >> >>> -Then I'll beg you to take a look at my scripts, because I don't seem to >>> catch the bug... >>> >> you haven't attached/included any scripts, have you? >> >> >> Anyway, have you considered using BLAST indices (created with the >> additional >> flag "-o") together with the tool 'fastacmd' (which also included in the >> NCBI blast binaries) as a simple (and very fast) alternative for fetching >> sequences.
>> >> >> Regards, Hans >> >> >> >> > > > From valiente at lsi.upc.edu Fri Nov 6 03:06:48 2009 From: valiente at lsi.upc.edu (valiente at lsi.upc.edu) Date: Fri, 6 Nov 2009 09:06:48 +0100 (CET) Subject: [Bioperl-l] Bio::SeqIO::genbank.pm Message-ID: <45737.147.83.59.225.1257494808.squirrel@webmail.lsi.upc.edu> There is a line in Bio::SeqIO::genbank.pm to convert data in classification lines into a classification array by splitting only on ';' or '.' so that a classification that is 2 or more words will still get matched,my @class = map { s/^\s+//; s/\s+$//; s/\s{2,}/ /g; $_; } split /(? References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> < C718B5B8.5561%hrh@fmi.ch> <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> <4AF312B3.9060009@gmail.com> Message-ID: <1222.130.206.164.153.1257497085.squirrel@webmail.unavarra.es> Thank you for the info Florent! I'll try to read al the information on the link you provided and try to figure out how to make it work and if it is worthy for me, I mean, I work with several sequence files that come from multiple databases (JGI, BROAD, Genolevures or NCBI). Protein IDs from each of those databases is different from NCBI. Maybe it could be easier to write a script that allows me to enter a fasta file with all the protein models of a single organism, parse it and then extract the sequences of a given list (using the "ID style" of the particular database) than creating a BLAST index for each organism I need to work with...Did I explain the issue correctly? Anyway, since I don't know anything about this tool Hans and you provided me, I can easily be wrong... Thank you for showing me the local BLAST Index tool, I'll read the documentation carefully and study all its possibilities. Best wishes JL On Thu, November 5, 2009, 19:00, Florent Angly wrote: > Hans-Rudolf was talking about a way to retrieve sequences from a BLAST > database. If you use BLAST locally, then your database is local too.
> More info here: > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html > Florent > > > jluis.lavin at unavarra.es wrote: >> Thanks a lot for your help Hans, >> It's a little bit to hard to understand and turn into script this >> awesome >> information you've just given me...I hope I can use it in a near future >> anyway ;) >> The issue here is that the sequences I'm indexing are not generated by >> the >> NCBI nor stored there...although I belive you're just refering to the >> tool >> itself and not to a retrieval from the NCBI. >> >> Thanks again you're all great giving advice to newbies like me ;) >> >> Best wishes to you all >> >> >> On Thu, November 5, 2009, 17:02, Hotz, Hans-Rudolf wrote: >> >>> >>> Jluis >>> >>> >>>> -Then I'll beg you to take a look at my scripts, because I don't seem >>>> to >>>> catch the bug... >>>> >>> you haven't attached/included any scripts, have you? >>> >>> >>> Anyway, have you considered using BLAST indices (created with the >>> additional >>> flag "-o") together with the tool 'fastacmd' (which also included in >>> the >>> NCBI blast binaries) as a simple (and very fast) alternative for >>> fetching >>> sequences. >>> >>> >>> Regards, Hans >>> >>> >>> >>> >> >> >> > > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Fri Nov 6 07:45:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 6 Nov 2009 07:45:01 -0500 Subject: [Bioperl-l] Bioperl In-Reply-To: <16842715.26316.1257510446095.JavaMail.root@durga.amrita.ac.in> References: <16842715.26316.1257510446095.JavaMail.root@durga.amrita.ac.in> Message-ID: Hi Resmi- You should look at http://bioperl.org/ under "Installation" for information on getting and installing BioPerl.
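For the tree-construction part of Resmi's question, here is a minimal sketch of distance-based tree building in BioPerl. It assumes a BioPerl release that ships `Bio::Align::DNAStatistics` and `Bio::Tree::DistanceFactory` (1.6 or later). The script computes pairwise Kimura distances from a DNA alignment, builds a neighbor-joining tree from the resulting distance matrix, and prints the tree in Newick format. The four aligned sequences are invented toy data read from an in-memory filehandle, and Kimura plus NJ is only one illustrative choice of methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;
use Bio::Align::DNAStatistics;
use Bio::Tree::DistanceFactory;
use Bio::TreeIO;

# Invented toy alignment (four aligned DNA sequences of equal length),
# read from an in-memory filehandle so no input file is needed.
my $fasta = <<'END';
>seqA
ACGTACGTACGTACGTACGT
>seqB
ACGTACGTACGAACGTACGT
>seqC
ACGTTCGTACGAACGTACCT
>seqD
TCGTTCGAACGAACGTACCC
END
open my $fh, '<', \$fasta or die "Cannot open in-memory alignment: $!";

my $alnio = Bio::AlignIO->new(-fh => $fh, -format => 'fasta');
my $aln   = $alnio->next_aln;

# Pairwise Kimura (2-parameter) distances, returned as a distance matrix
my $stats  = Bio::Align::DNAStatistics->new;
my $matrix = $stats->distance(-align => $aln, -method => 'Kimura');

# Neighbor-joining tree built from the distance matrix
my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
my $tree     = $dfactory->make_tree($matrix);

# Print the tree in Newick format on STDOUT
my $treeout = Bio::TreeIO->new(-format => 'newick', -fh => \*STDOUT);
$treeout->write_tree($tree);

# Collect the leaves (one per input sequence)
my @leaves = $tree->get_leaf_nodes;
print "Leaves: ", join(', ', map { $_->id } @leaves), "\n";
```

Passing `-method => 'UPGMA'` to `Bio::Tree::DistanceFactory->new` should give a UPGMA tree instead. For real data, replace `-fh => $fh` with `-file => 'your_alignment.fasta'` (a hypothetical file name).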
An introduction to working with trees in BioPerl is at this link: http://www.bioperl.org/wiki/HOWTO:Trees cheers, Mark ----- Original Message ----- From: Resmi S. To: maj at fortinbras.us Sent: Friday, November 06, 2009 7:27 AM Subject: Bioperl Respected Sir, I am Resmi S studying II MSc Bioinformatics.Now am doing my project in Phylogenetic Tree Construction using BioPerl.I am not much familiar on BioPerl modules.So could please send me the names of the Bioperl modules needed for my project.I also need to know , from where i will get these modules.If that is from CPAN,then send me the location or link.I kindly request you to send me the details soon. Yours Sincerely, Resmi S, II MSc Bioinformatics, School of Biotechnology, Amrita Vishwa Vidyapeetham, Email : amm08bi019 at students.amrita.ac.in ------------------------------------------------------------------------------ ------------------------------------------------------------------- This mail has been scanned by Amrita GAV Server, Amrita Vishwa Vidyapeetham, Amritapuri Campus From robert.bradbury at gmail.com Fri Nov 6 12:35:22 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 6 Nov 2009 12:35:22 -0500 Subject: [Bioperl-l] Function that determines serious mutations Message-ID: Is there a function in the library (or has someone written one) that can take a genbank entry and determine which mutations are harmful? It would be used to produce a table summary of: GENE # SNP # BadSNP One kind of gets this from NCBI if you lookup in the "GENE" db a gene name and then go to the "GeneView" om dbSNP page it has the information I want but largely in a graphical format while I simply want numbers I can dump into a spreadsheet. I don't think it would be hard, fetch the gene, run through the features for the SNP database, figure out whether they are good or bad SNPs, accumulate the statistics and dump it. 
I think the functions available are flexible enough to do it but I can't believe nobody has already done it. It could be a bit more complex in that one could do an analysis to see if the mutations are in a conserved domain or mutations that code for Cysteine or Methionine (or othe potentially "critical" amino acids) but since "critical" is in the eye of the beholder there would have to be some kind of callback to a scoring function. Thanks, Robert From nevoband at igb.uiuc.edu Fri Nov 6 15:58:05 2009 From: nevoband at igb.uiuc.edu (kleenix) Date: Fri, 6 Nov 2009 12:58:05 -0800 (PST) Subject: [Bioperl-l] StandAloneBlast Unallowed parameter Message-ID: <26230896.post@talk.nabble.com> I'm not sure if i'm doing this wrong. I am trying to use the -m parameter in blastall using the StandAloneBlast bioperl class. when i add 'm'=>0 to @params i get Unallowed parameter: error. Am I adding the parameter wrong? i'm using StandAloneBlast version 1.51 Thanks -Nevo -- View this message in context: http://old.nabble.com/StandAloneBlast-Unallowed-parameter-tp26230896p26230896.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From veronica.xiaoyu at gmail.com Fri Nov 6 17:25:04 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Fri, 6 Nov 2009 17:25:04 -0500 Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change the description's name of each hit? Message-ID: Hi, I'm using Bio::SearchIO::Writer HTMLResultWriter help me parse BLAST out file into HTML. Anybody knows how to parse and change the description name of each hit? By using hit->description can call hits' description, but it is not allowed to be modified. Thank you very much, Xiaoyu From maj at fortinbras.us Fri Nov 6 19:40:17 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 6 Nov 2009 19:40:17 -0500 Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change thedescription's name of each hit? 
In-Reply-To: References: Message-ID: <11592B31D9924FA7A8638D90AE4A3F4A@NewLife> Xiaoyu- That method should work to change the description; are you doing something like this:
$hit->description('This is my new description');
This method returns the old description when you change the value:
$hit->description('old');
$str = $hit->description('new'); # $str eq 'old'
$str = $hit->description;        # $str eq 'new'
MAJ ----- Original Message ----- From: "Xiaoyu Liang" To: Sent: Friday, November 06, 2009 5:25 PM Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change thedescription's name of each hit? > Hi, > > I'm using Bio::SearchIO::Writer HTMLResultWriter help me parse BLAST out > file into HTML. > > Anybody knows how to parse and change the description name of each hit? > > By using hit->description can call hits' description, but it is not allowed > to be modified. > > Thank you very much, > Xiaoyu From Daniel.Lang at biologie.uni-freiburg.de Sun Nov 8 09:50:48 2009 From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang) Date: Sun, 08 Nov 2009 15:50:48 +0100 Subject: [Bioperl-l] arguments to call back functions in GBrowse2 Message-ID: <4AF6DAC8.8070204@biologie.uni-freiburg.de> Hi Lincoln, a while back (May 29, 2009; 09:08pm) you replied to an even older thread ("Re: Access the parent of a Bio::DB::SeqFeature within a gbrowse config callback function"). I missed your reply and did not follow it up back then, sorry! I'm currently facing the same issue again with gbrowse2. I have a callback function for "balloon click". Following your last reply I expected 5 arguments, but I am getting only three: $feature, $panel, $track. In principle, I am using the latest releases/checkouts... Which modules do I need to look at/update for this functionality?
Furthermore, is there a possibility to share global variables between gbrowse2 and the slaves? Should this work via init_code? Should modules initialized in a conf file be in the scope of a slave? If not, can I introduce modules via the slave config files, or do I need to alter the slave scripts? Thanks again! Cheers, Daniel PS: gbrowse2 rocks! From maj at fortinbras.us Sun Nov 8 11:09:43 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 8 Nov 2009 11:09:43 -0500 Subject: [Bioperl-l] GuessSeqFormat: fastq? Message-ID: Hi All- Any plans in the works for a _possibly_fastq sequence guesser? MAJ From maj at fortinbras.us Sun Nov 8 11:20:55 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 8 Nov 2009 11:20:55 -0500 Subject: [Bioperl-l] GuessSeqFormat: fastq? In-Reply-To: References: Message-ID: Never mind; got it covered-- MAJ ----- Original Message ----- From: "Mark A. Jensen" To: "bioperl-l" Sent: Sunday, November 08, 2009 11:09 AM Subject: [Bioperl-l] GuessSeqFormat: fastq? > Hi All- > Any plans in the works for a _possibly_fastq sequence guesser? > MAJ From saikari78 at gmail.com Mon Nov 9 10:47:10 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 15:47:10 +0000 Subject: [Bioperl-l] Retrieving link to protein from PubChem Message-ID: Hi, I'm using Bioperl to retrieve records from PubChem. I'm trying to find a way (so far unsuccessfully) to retrieve, from a compound record, the reference to the protein(s) that can synthesize the compound. Thanks very much.
saikari From saikari78 at gmail.com Mon Nov 9 11:05:57 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 16:05:57 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem Message-ID: Hi, I'm using Bioperl to retrieve records from PubChem. I'm trying to find a way (so far unsuccessfully) to retrieve, from a compound record, the reference to the protein(s) that can synthesize the compound. Thanks very much. saikari From cjfields at illinois.edu Mon Nov 9 11:27:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 10:27:10 -0600 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: Message-ID: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: > Hi, > > I'm using Bioperl to retrieve records from PubChem. > I'm trying to find a way-but have been unsuccessful- to retrieve > from a > compound record, the reference to the protein(s) that can synthesize > the > compound. > Thanks very much. > > saikari The BioPerl script below returns the GIs of proteins that correspond to the substance passed on the command line; invoke it using 'perl pc_substance.pl substance_requested'. It probably needs more fiddling to catch everything, but it should get you started.
For other bits and pieces (such as how to retrieve the raw sequence files), please see the EUtilities HOWTO: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook chris ----------------------------------------
#!/usr/bin/perl -w

use 5.010;
use strict;
use warnings;
use Bio::DB::EUtilities;

my $substance = shift;

my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'pcsubstance',
                                     -term       => $substance,
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die;

$eutil->reset_parameters(-eutil   => 'elink',
                         -history => $hist,
                         -db      => 'protein',
                         -dbfrom  => 'pcsubstance',
                         -retmax  => 1000);

say join(',',$eutil->get_ids);
From saikari78 at gmail.com Mon Nov 9 11:41:20 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 16:41:20 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: Fabulous! Huge help. saikari On Mon, Nov 9, 2009 at 4:27 PM, Chris Fields wrote: > On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: > > Hi, >> >> I'm using Bioperl to retrieve records from PubChem. >> I'm trying to find a way-but have been unsuccessful- to retrieve from a >> compound record, the reference to the protein(s) that can synthesize the >> compound. >> Thanks very much. >> >> saikari >> > > The below bioperl script returns the GI for proteins that correspond to the > substance passed on the command line; invoke using 'perl pc_substance.pl substance_requested'. It probably needs more fiddling to catch everything > but it should get you started.
> > For other bits and pieces (such as how to retrieve the raw sequence files), > please see the EUtilities HOWTO: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > chris > > ---------------------------------------- > > #!/usr/bin/perl -w > > use 5.010; > use strict; > use warnings; > use Bio::DB::EUtilities; > > my $substance = shift; > > my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', > -db => 'pcsubstance', > -term => $substance, > -usehistory => 'y'); > > my $hist = $eutil->next_History || die; > > $eutil->reset_parameters(-eutil => 'elink', > -history => $hist, > -db => 'protein', > -dbfrom => 'pcsubstance', > -retmax => 1000); > > say join(',',$eutil->get_ids); > From gc11song at gmail.com Mon Nov 9 13:08:48 2009 From: gc11song at gmail.com (Guangchun Song) Date: Mon, 9 Nov 2009 12:08:48 -0600 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? Message-ID: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Hello, I'm a new BioPerl user. I'm working on a project to determine the status of all putative SNPs (non-synonymous vs. synonymous) and to predict the translational effect of non-synonymous mutations as benign or malicious. I'm trying to use BioPerl to get the DNA sequence and translate it to a protein sequence for the SNPs that are in a gene's coding region. Could someone tell me how to do it? Thanks, -Guangchun Song From robert.bradbury at gmail.com Mon Nov 9 16:15:33 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 9 Nov 2009 16:15:33 -0500 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song wrote: > > I'm new bioperl user. I' working on a project: To determine the > status of all tutative SNPs such as non-synonymous vs.
synonymous, and > predict the tranlational effect of non-synonymous mutations as benign > or malicious. I'm trying to use bioperl to get the DNA sequence and > translate to protein sequence for the SNPs that are in gene's coding > region. Could someone tell me how to do it? > > I too would like to know if this information is available. I've recently been working with the dbSNP results from NCBI, but they display the results in a graphical format rather than as data that one can play with and ask questions of, like "What is the most disease-causing gene in the Human Genome?" or "What are the critical proteins damaged by gene defects in the Human Genome?" ... "In terms of premature deaths, extended health care requirements, loss of quality of life, etc.?" The same types of questions can be applied to the dog and cat genomes, where there is emotional value, or to the cow, horse, pig, etc. genomes, where there is economic value. The value of BioPerl would increase significantly if there were functionality that would allow easy access to "these mutations may have negative/positive impact" (which means you need a function that qualifies mutations by degree) and allow for impact to be subjectively determined (implying there must be some callback function to provide a user quality/impact rating). For example:
$/@differences = protein_compare($mygene, $refseq_gene, @critical_aa, @critical_domain, $callback)
where $callback could "rate" differences in the protein by position and by the "type of interest" (e.g. metal-binding amino acids, structure-changing amino acids, critical catalytic amino acids, etc.). A default callback could be based on some evolving definition of "critical" changes, for example those which result in human disease. This is a "required" capability for determining things like the "adaptability" of a species -- those with the fewest critical mutation points may adapt better to mutation-increasing circumstances.
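For the translation step Guangchun asks about, the core logic can be sketched in plain Perl. This is a minimal sketch that assumes you already have the CDS as a string and the SNP position relative to it. `classify_snp` and the toy sequence are hypothetical. In BioPerl proper, the CDS would more likely come from a sequence object's CDS feature and be translated with the sequence object's `translate` method. The sketch only labels a substitution as synonymous, non-synonymous or nonsense. Rating a non-synonymous change as benign or damaging is the separate, user-defined scoring step discussed in this thread.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1); codons are enumerated
# with the first, second and third base each running over T, C, A, G.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = substr( $aas, $i++, 1 );
        }
    }
}

# Classify one base substitution in a coding sequence (CDS).
#   $cds - CDS string, starting at the first base of the start codon
#   $pos - 1-based position of the SNP within the CDS
#   $alt - the variant base
# Returns (ref codon, alt codon, ref aa, alt aa, class); a change to a stop
# codon is reported as 'nonsense' (a special case of non-synonymous).
sub classify_snp {
    my ( $cds, $pos, $alt ) = @_;
    $cds = uc $cds;
    $alt = uc $alt;
    die "position $pos is outside the CDS\n" if $pos < 1 || $pos > length($cds);
    my $offset      = $pos - 1;
    my $codon_start = $offset - $offset % 3;
    my $ref_codon   = substr( $cds, $codon_start, 3 );
    die "incomplete codon at position $pos\n" if length($ref_codon) < 3;
    my $alt_codon = $ref_codon;
    substr( $alt_codon, $offset % 3, 1 ) = $alt;    # apply the substitution
    my $ref_aa = $codon_table{$ref_codon} // 'X';   # X: ambiguous base(s)
    my $alt_aa = $codon_table{$alt_codon} // 'X';
    my $class  = $ref_aa eq $alt_aa ? 'synonymous'
               : $alt_aa eq '*'     ? 'nonsense'
               :                      'non-synonymous';
    return ( $ref_codon, $alt_codon, $ref_aa, $alt_aa, $class );
}

# Toy CDS (ATG TGT CTG TAA = M C L stop), not a real gene.
my $cds = 'ATGTGTCTGTAA';
for my $snp ( [ 5, 'A' ], [ 9, 'A' ], [ 6, 'A' ] ) {
    my ( $pos, $alt ) = @$snp;
    printf "pos %d: %s>%s %s>%s %s\n", $pos, classify_snp( $cds, $pos, $alt );
}
```

A rating callback like the one Robert proposes could then be applied only to the records whose class is not 'synonymous'.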
Please pardon any errors in Perl syntax/usage; it's been a while since I've written Perl and I'd really rather be coding in C. Robert From maj at fortinbras.us Mon Nov 9 16:56:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 9 Nov 2009 16:56:24 -0500 Subject: [Bioperl-l] how to get the protein sequences from DNA sequencesaround novel SNPs? In-Reply-To: References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: <3ED3D387B5DE4248A218D42882369925@NewLife> I agree that BioPerl would significantly increase in value with such a module; in fact, the BioTeam would probably buy us out. My opinion is that the entire GWAS enterprise is the search for such a callback function, for humans anyway. For those engaged in this quest, if BioPerl doesn't provide a Maserati, it at least provides good Italian-made (among others) parts. MAJ ----- Original Message ----- From: "Robert Bradbury" To: "Guangchun Song" Cc: Sent: Monday, November 09, 2009 4:15 PM Subject: Re: [Bioperl-l] how to get the protein sequences from DNA sequencesaround novel SNPs? > On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song wrote: >> >> I'm new bioperl user. I' working on a project: To determine the >> status of all tutative SNPs such as non-synonymous vs. synonymous, and >> predict the tranlational effect of non-synonymous mutations as benign >> or malicious. I'm trying to use bioperl to get the DNA sequence and >> translate to protein sequence for the SNPs that are in gene's coding >> region. Could someone tell me how to do it? >> >> > I too would like to know if this information is available. I've recently > been working with the dbSNP results from NCBI but they display the results > in a graphical format rather than data that one can play with and ask > questions of like "What is the most disease causing gene in the Human > Genome?" or "What are the critical proteins damaged by gene defects in the > Human Genome?" ...
"In terms of premature deaths, extended health care > requirements, loss of quality of life, etc.?" > > The same types of questions can be applied to the dog and cat genomes where > there is emotional value or the cow, horse, pig, etc. genomes where there is > economic value? > > The value of BioPerl would increase significantly if there were > functionality that would allow easy access to "these mutations may have > negative/positive impact" (which means you need a function that qualifies > mutations by degree) and allow for impact to be subjectively determined > (implying there must be some callback function to provide a user > quality/impact rating). > > For example: > $/@differences = protein_compare($mygene, $refseq_gene, @critical_aa, > @critical_domain, $callback) > Where $callback could "rate" differences about the protein and position and > the "type of interest" (e.g. metal binding amino acids, structural changing > amino acids, critical catalysis amino acids, etc.). > > A default callback would be based on some evolving definition of "critical" > changes which result in human disease for example. > > This is a "required" capability to be able to determine things like the > "adaptability" of a species -- those with fewest critical mutation points > may have better adaptability to mutation increasing circumstances. > > Please pardon any errors in perl syntax/usage its been a while since I've > written perl and I'd really rather be coding in C. > > Robert From alexl at users.sourceforge.net Mon Nov 9 18:44:07 2009 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Mon, 09 Nov 2009 18:44:07 -0500 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict?
In-Reply-To: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> (Chris Fields's message of "Wed, 4 Nov 2009 07:53:35 -0600") References: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Message-ID: >>>>> Chris Fields writes: > Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate > perl package alone. It is part of perl core but it's also available > on CPAN separately from perl itself: > http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm Hi Chris, Yes, in principle it would be possible to have this split out as a separate package (currently it's a "subpackage" under the main perl package), unfortunately that's just not the way it's currently done in Fedora (probably because it's part of the core set and they like to update all relevant packages in one step) and I have little control over that. As I suspected, the perl maintainer is not at all enthusiastic for updating the whole of perl just for that package (except for rawhide which would mean that bioperl 1.6.1 would not be available until F-13, about 6 months from now). See: http://bugzilla.redhat.com/show_bug.cgi?id=533562#c1 Obviously I am not happy with this situation either, because it will freeze bioperl on Fedora at 1.6.0 for about 6 months, so can you recommend any temporary workarounds in the meantime? > This is the commit message for that BTW. This allows spaces in file > names for the MANIFEST. v1.52 is a bug fix and is required. > http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 Perhaps I could create a patch that renamed files with spaces in them to ones with no spaces and then rename them again upon installation. Can you point me to which files are the problematic ones that triggered the dependency for 1.52? Perhaps I can figure a workaround. Meanwhile I will press the maintainer of perl in Fedora to perhaps reconsider his position (e.g. 
if another update for perl is going out for another reason, like a security update, perhaps he could roll in the 1.52 update at the same time). Cheers, Alex From cjfields at illinois.edu Mon Nov 9 19:50:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 18:50:00 -0600 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict? In-Reply-To: References: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Message-ID: <29EA2398-F60B-48F2-AFE7-39A44011C451@illinois.edu> On Nov 9, 2009, at 5:44 PM, Alex Lancaster wrote: >>>>>> Chris Fields writes: > >> Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate >> perl package alone. It is part of perl core but it's also available >> on CPAN separately from perl itself: > >> http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm > > Hi Chris, > > Yes, in principle it would be possible to have this split out as a > separate package (currently it's a "subpackage" under the main perl > package), unfortunately that's just not the way it's currently done in > Fedora (probably because it's part of the core set and they like to > update all relevant packages in one step) and I have little control > over > that. > > As I suspected, the perl maintainer is not at all enthusiastic for > updating the whole of perl just for that package (except for rawhide > which would mean that bioperl 1.6.1 would not be available until F-13, > about 6 months from now). See: > > http://bugzilla.redhat.com/show_bug.cgi?id=533562#c1 > > Obviously I am not happy with this situation either, because it will > freeze bioperl on Fedora at 1.6.0 for about 6 months, so can you > recommend any temporary workarounds in the meantime? Well, if you don't absolutely require the MANIFEST for the final package you can forego the requirement. The file in question that triggered the requirement is a data file used only for testing: t/data/test 2.txt >> This is the commit message for that BTW. 
This allows spaces in file >> names for the MANIFEST. v1.52 is a bug fix and is required. > >> http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 > > Perhaps I could create a patch that renamed files with spaces in > them to > ones with no spaces and then rename them again upon installation. > > Can you point me to which files are the problematic ones that > triggered > the dependency for 1.52? Perhaps I can figure a workaround. > > Meanwhile I will press the maintainer of perl in Fedora to perhaps > reconsider his position (e.g. if another update for perl is going out > for another reason, like a security update, perhaps he could roll in > the > 1.52 update at the same time). > > Cheers, > Alex I would point out that this is a fairly significant bug fix for ExtUtils::Manifest. A newer point release of perl (5.10.1) is now available that contains the fix and also fixes a performance regression that popped up in 5.10.0. chris From jay at jays.net Mon Nov 9 19:05:51 2009 From: jay at jays.net (Jay Hannah) Date: Mon, 9 Nov 2009 18:05:51 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? Message-ID: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> Many thanks to Ewan Birney et al. for Bio::Index::* I can throw away my awful grep-based index-by-accession stuff. :) Any chance someone has also written an organism-based index mechanism? Something like...
while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) {
    print $seq->display_id . "\n";
}
Thanks, j From cjfields at illinois.edu Mon Nov 9 22:55:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 21:55:01 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> Message-ID: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote: > Many thanks to Ewan Birney et al.
for Bio::Index::* > > I can throw away my awful grep based index-by-accession stuff. :) > > Any chance someone has also written an organism based index > mechanism? Something like... > > while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { > print $seq->display_id . "\n"; > } > > Thanks, > > j It should work via id_parser(); from Bio::Index::GenBank:
$inx->id_parser(\&get_id);
# make the index
$inx->make_index($file_name);

# here is where the retrieval key is specified
sub get_id {
    my $line = shift;
    $line =~ /clone="(\S+)"/;
    $1;
}
Change the code ref to deal with the line you want and parse the name out. Caveat: this may not be absolutely perfect (it only passes in a line at a time, and some species lines will wrap). I'm also not sure how this would work in cases where multiple sequences from the same species are present. The other option is to preparse everything and tie a hash to store a species->UID map, then use that along with your Bio::Index index to grab what you need. chris From cjfields at illinois.edu Mon Nov 9 23:58:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 22:58:32 -0600 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: <435BA1A8-2CCB-4D7A-8909-84F8135C439F@illinois.edu> On Nov 9, 2009, at 3:15 PM, Robert Bradbury wrote: > On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song > wrote: >> >> I'm new bioperl user. I' working on a project: To determine the >> status of all tutative SNPs such as non-synonymous vs. synonymous, >> and >> predict the tranlational effect of non-synonymous mutations as benign >> or malicious. I'm trying to use bioperl to get the DNA sequence and >> translate to protein sequence for the SNPs that are in gene's coding >> region. Could someone tell me how to do it? >> >> > I too would like to know if this information is available.
I've > recently > been working with the dbSNP results from NCBI but they display the > results > in a graphical format rather than data that one can play with and ask > questions of like "What is the most disease causing gene in the Human > Genome?" or "What are the critical proteins damaged by gene defects > in the > Human Genome?" ... "In terms of premature deaths, extended health care > requirements, loss of quality of life, etc.?" > > The same types of questions can be applied to the dog and cat > genomes where > there is emotional value or the cow, horse, pig, etc. genomes where > there is > economic value? > > The value of BioPerl would increase significantly if there were > functionality that would allow easy access to "these mutations may > have > negative/positive impact" (which means you need a function that > qualifies > mutations by degree) and allow for impact to be subjectively > determined > (implying there must be some callback function to provide a user > quality/impact rating). > > For example: > $/@differences = protein_compare($mygene, $refseq_gene, > @critical_aa, > @critical_domain, $callback) > Where $callback could "rate" differences about the protein and > position and > the "type of interest" (e.g. metal binding amino acids, structural > changing > amino acids, critical catalysis amino acids, etc.). > > A default callback would be based on some evolving definition of > "critical" > changes which result in human disease for example. > > This is a "required" capability to be able to determine things like > the > "adaptability" of a species -- those with fewest critical mutation > points > may have better adaptability to mutation increasing circumstances. > > Please pardon any errors in perl syntax/usage its been a while since > I've > written perl and I'd really rather be coding in C. 
> > Robert I will say that most of the information from the SNP database is available in various formats (see the following link, under 'Retrieval Types'): http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html You can access this information, as well as the full XML, using something like the following script. chris ------------------------------------------------
#!/usr/bin/perl -w

use 5.010;
use strict;
use warnings;
use Bio::DB::EUtilities;

my $term = shift;

my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'snp',
                                     -term       => $term,
                                     -usehistory => 'y',
                                     -retmax     => 100);

my $hist = $eutil->next_History || die "No history returned";

# for SNP XML, change retmode to 'xml'
$eutil->set_parameters(-eutil   => 'efetch',
                       -history => $hist,
                       -retmode => 'text',
                       -rettype => 'flt');

# dumps to STDOUT
say $eutil->get_Response->content;
From jluis.lavin at unavarra.es Tue Nov 10 05:43:40 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Tue, 10 Nov 2009 11:43:40 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and itscorrect use] In-Reply-To: References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1 984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> Message-ID: <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Hello again, I tried what Mark suggested, modifying the code line he pointed out, but there's still a problem that I believe must be due to the sequence names.
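On José's question about doing the extraction in plain Perl: a minimal sketch, assuming headers shaped like `>PleosPC9_1_103820|fgenesh1_pg.3_#_1` and an ID list with one protein name per line. The sample data and the helper names (`read_fasta_by_prefix`, `write_wanted`) are made up for illustration. It loads the FASTA file into a hash keyed on the text before the first `|`, so the suffix added by the gene-model program is ignored. It also strips the trailing newline from each ID read from the list; an ID that still ends in a newline would not match any key. For a large genome file, the same key-extraction idea could instead be handed to Bio::Index::Fasta through its id_parser() method, the mechanism Chris describes for Bio::Index::GenBank earlier in this archive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle into a hash keyed on the part of
# each header before the first '|' (e.g. "PleosPC9_1_103820"), so the
# gene-model suffix after the pipe is ignored when looking records up.
sub read_fasta_by_prefix {
    my ($fh) = @_;
    my ( %seqs, $id );
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\r$//;    # tolerate Windows (CRLF) line endings
        if ( $line =~ /^>\s*([^|\s]+)/ ) {
            $id = $1;
            $seqs{$id} = { header => $line, seq => '' };
        }
        elsif ( defined $id ) {
            $seqs{$id}{seq} .= $line;
        }
    }
    return \%seqs;
}

# Write the requested records in FASTA format (60 residues per line) and
# return the IDs that were not found.
sub write_wanted {
    my ( $seqs, $wanted, $out ) = @_;
    my @missing;
    for my $id (@$wanted) {
        my $rec = $seqs->{$id};
        unless ($rec) { push @missing, $id; next }
        print {$out} $rec->{header}, "\n";
        print {$out} "$1\n" while $rec->{seq} =~ /(.{1,60})/g;
    }
    return @missing;
}

# Demo with in-memory data; in practice open the genome FASTA file and
# the ID list (e.g. PC9.fasta and LCS.txt) instead.
my $fasta = <<'END';
>PleosPC9_1_103820|fgenesh1_pg.3_#_1
MKLV
AAGT
>PleosPC9_1_123413|genemark.2731_g
MSTQ
END
my $list = "PleosPC9_1_123413\nPleosPC9_1_52065\n\n";

open my $fasta_fh, '<', \$fasta or die "cannot read FASTA: $!";
my $seqs = read_fasta_by_prefix($fasta_fh);
close $fasta_fh;

open my $ids_fh, '<', \$list or die "cannot read ID list: $!";
my @wanted;
while ( my $id = <$ids_fh> ) {
    $id =~ s/\s+$//;    # strip the trailing newline, or the lookup fails
    push @wanted, $id if length $id;
}
close $ids_fh;

my @missing = write_wanted( $seqs, \@wanted, \*STDOUT );
print STDERR "not found: @missing\n" if @missing;
```

Holding every record in memory is fine for a single proteome-sized file. For much larger files, an on-disk index such as Bio::Index::Fasta is the better fit.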
My sequence headers in the FASTA file have this format: >PleosPC9_1_103820|fgenesh1_pg.3_#_1 The part to the right of the pipe changes depending on the program used to create the gene model, for example: >PleosPC9_1_103820|fgenesh1_pg.3_#_1 >PleosPC9_1_123413|genemark.2731_g >PleosPC9_1_52065|e_gw1.3.64.1 So I guess I need to parse my IDs somehow so that the program detects only the first part of the FASTA header (the "protein name") and isn't confused by the other side of the pipe... This is the corrected code I wrote following Mark's suggestions, but I still don't have any idea about the parsing issue:
#!/c:/Perl -w
use Bio::Index::Fasta;
use strict;
#PC9.fasta is my genomic file
my $Index_File_Name ="PC9.fasta";
my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
#LCS.txt is my sequences list
@ARGV = ;
foreach my $id (@ARGV) {
  if ($id eq ''){
    die ("empty list")
  }
  else {
    my $seqobj = $inx->fetch($id);
    my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta",
                              -format => 'fasta');
    $out->write_seq($seqobj);
  }
}
exit;
}
Thanks in advance. P.S. Might there be a faster way of extracting those sequences using plain Perl? On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote: > Yes, these are files created by the SDBM, Perl's internal db manager. You > should > be able to > open the index by simply > $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > and the dbm will know what to do-- > cheers MAJ > ----- Original Message ----- > From: > To: "Mark A. Jensen" > Cc: ; > Sent: Thursday, November 05, 2009 11:21 AM > Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its > correct > use] > > >> Thank you very much Mark, that's a good point :$ >> I guess your correction is referred to the second script, isn't it?
>> >> If it is so, there is still a problem with the first script, it doesn't >> create the PC9.fasta.idx file, instead it creates two files named: >> -PC9.fasta.idx.pag >> -PC9.fasta.idx.dir >> >> which seem to be clearly related with some kind of indexing >> process...but, >> unless the PC9.fasta.idx file is only virtual or remains hidden, I can't >> find it anywhere... >> Forgive me if I'm talking nosense... >> >> Thank you very much again for your help ;) >> >> >> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>> Hey José, >>> The first thing that jumps out it the index file name. Looks >>> like you create it as >>> PC9.fasta.idx >>> But you read it as >>> PC9.fasta >>> Not an unusual mistake. Do >>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>> and see if it works. >>> MAJ >>> ----- Original Message ----- >>> From: >>> To: >>> Sent: Thursday, November 05, 2009 10:46 AM >>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its >>> correct >>> use] >>> >>> >>> >>> >>> ---------------------------- Original Message >>> ---------------------------- >>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct >>> use >>> From: jluis.lavin at unavarra.es >>> Date: Thu, 5 November 2009, 16:46 >>> To: "Mark A. Jensen" >>> -------------------------------------------------------------------------- >>> >>> Hi Mark, >>> >>> I've actually got two scripts, the first one is to create the index and >>> the second one is to retrieve the sequence lis from the indexed file.
>>> >>> 1)Here is the Index creation script: >>> >>> #!/c:/Perl -w >>> use strict; >>> use Bio::Index::Fasta; >>> use strict; >>> >>> print "Enter file for indexing: \n"; >>> my $Index_File_Name = ; >>> my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx", >>> -write_flag => 1); >>> $inx->make_index(my $File_Name); >>> >>> 2)And here is the sequence retrieval script: >>> >>> #!/c:/Perl -w >>> use Bio::Index::Fasta; >>> use strict; >>> #PC9.fasta is my genomic file >>> my $Index_File_Name ="PC9.fasta"; >>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>> #LCS.txt is my sequences list >>> @ARGV = ; >>> foreach my $id (@ARGV) { >>> if ($id eq ''){ >>> die ("empty list") >>> } >>> else { >>> my $seqobj = $inx->fetch($id); >>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>> -format => 'fasta'); >>> $out->write_seq($seqobj); >>> } >>> } >>> exit; >>> } >>> >>> I hope this code is not a total scum... >>> >>> Thanks in advance ;) >>> >>> >>> >>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote: >>>> José -- It looks like this is a good solution to your problem. Please >>>> send >>>> you >>>> script so we can look at it- >>>> cheers Mark >>>> ----- Original Message ----- >>>> From: >>>> To: >>>> Sent: Thursday, November 05, 2009 10:28 AM >>>> Subject: [Bioperl-l] A question about iBio::Index: and its correct use >>>> >>>> >>>> >>>> Hello to all, >>>> >>>> I'm trying to write a script to retrieve a list of sequences from a >>>> local >>>> FASTA file (for example a fasta archive where all the protein models >>>> of >>>> an >>>> organism are stored). This file would be used by me as some kind >>>> "local >>>> database" (sorry if I mistake a few concepts...) >>>> I've been reading the BioPerl HOWTOs and I came across the >>>> Bio::Index::Fasta tool. >>>> If I didn't misunderstood what I read (which can be easy because my >>>> low >>>> level on programming) this Indexing tool should do the job.
>>>> I wrote a couple of scripts based on the documentation i read about >>>> this >>>> tool, but I don't seem to be able to create the index file to be used >>>> later (to retrieve the sequences from). >>>> -First of all, I want to ask the people in this forum if the >>>> Bio::Index::Fasta is the right one to chose for this tasks. >>>> -Then I'll beg you to take a look at my scripts, because I don't seem >>>> to >>>> catch the bug... >>>> >>>> Best wishes to you all and thanks in advance ;) >>>> >>>> -- >>>> José Luis Lavín Trueba, PhD >>>> >>>> Dpto. de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>> >>> >>> -- >>> Dr. José Luis Lavín Trueba >>> >>> Dpto. de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >>> >>> -- >>> Dr. José Luis Lavín Trueba >>> >>> Dpto. de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >> >> >> -- >> Dr. José Luis Lavín Trueba >> >> Dpto. de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> > -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From saikari78 at gmail.com Tue Nov 10 06:41:11 2009 From: saikari78 at gmail.com (saikari keitele) Date: Tue, 10 Nov 2009 11:41:11 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: Thanks again very much for your help and the script. I've been trying it; however, I fail to find any protein record linked to a record in the pcsubstance database. Do you think that it is because no links have been defined between the 2 databases, or that I am just unlucky and no link exists for the particular records I'm testing? Thanks again saikari On Mon, Nov 9, 2009 at 4:41 PM, saikari keitele wrote: > Fabulous!. Huge help. > saikari > > On Mon, Nov 9, 2009 at 4:27 PM, Chris Fields wrote: > >> On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: >> >> Hi, >>> >>> I'm using Bioperl to retrieve records from PubChem. >>> I'm trying to find a way-but have been unsuccessful- to retrieve from a >>> compound record, the reference to the protein(s) that can synthesize the >>> compound. >>> Thanks very much. >>> >>> saikari >>> >> >> The below bioperl script returns the GI for proteins that correspond to >> the substance passed on the command line; invoke using 'perl >> pc_substance.pl substance_requested'. It probably needs more fiddling to >> catch everything but it should get you started.
>> >> For other bits and pieces (such as how to retrieve the raw sequence >> files), please see the EUtilities HOWTO: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook >> >> chris >> >> ---------------------------------------- >> >> #!/usr/bin/perl -w >> >> use 5.010; >> use strict; >> use warnings; >> use Bio::DB::EUtilities; >> >> my $substance = shift; >> >> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -db => 'pcsubstance', >> -term => $substance, >> -usehistory => 'y'); >> >> my $hist = $eutil->next_History || die; >> >> $eutil->reset_parameters(-eutil => 'elink', >> -history => $hist, >> -db => 'protein', >> -dbfrom => 'pcsubstance', >> -retmax => 1000); >> >> say join(',',$eutil->get_ids); >> > > From heyne at informatik.uni-freiburg.de Tue Nov 10 07:55:06 2009 From: heyne at informatik.uni-freiburg.de (Steffen Heyne) Date: Tue, 10 Nov 2009 13:55:06 +0100 Subject: [Bioperl-l] problem with alignments and sequence locations Message-ID: <4AF962AA.7060908@informatik.uni-freiburg.de> Hi, I'm using Bioperl for my research and it is very useful! Thank you! Currently I have a problem with the location tags of sequences. I read in seed alignments of Rfam (in Stockholm format, but I think it is similar for other formats). If the location is like: AB194432.1/908-846 the start/end values are changed to $seq->start = 846 $seq->end = 908 and therefore the new location (e.g. $seq->get_nse) is: AB194432.1/846-908 The $seq->strand tag is correctly set to -1 in this case, but if the alignment is written out again (clustal, stockholm,...) this strand info is lost and the sequences have this "wrong" location. But this information is important with respect to the sequence accession number. Is there a way to set the location back to the original one, or is this behavior desired? Any attempt to set it manually with $seq->start($val) failed due to automatic checking. I'm using bioperl 1.6.1 Thanks! steffen -- --- Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik Institut für Informatik Albert-Ludwigs-Universität Freiburg Georges-Köhler-Allee 106 79110 Freiburg, Germany Tel: (+49) 761 203 8239 Fax: (+49) 761 203 7462 Mail: heyne at informatik.uni-freiburg.de From cjfields at illinois.edu Tue Nov 10 08:58:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Nov 2009 07:58:52 -0600 Subject: [Bioperl-l] problem with alignments and sequence locations In-Reply-To: <4AF962AA.7060908@informatik.uni-freiburg.de> References: <4AF962AA.7060908@informatik.uni-freiburg.de> Message-ID: On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: > Hi, > > I'm using Bioperl for my research and it is very useful! Thank you! > > Currently I have a problem with locations tags of sequences. I read > in seed alignments of Rfam (in stockholm format, but I think it is > similar to other formats). > > If the location is like: > > AB194432.1/908-846 > > the start/end values are changed to > > $seq->start = 846 > $seq->end = 908 > > and therefore the new location (e.g.$seq->get_nse) is: > > AB194432.1/846-908 > > The $seq->strand tag is correctly set to -1 in this case, but if the > alignment is written out again (clustal, stockholm,...) this strand > info is lost and the sequences have this "wrong" location. But this > information is important in respect to the sequence accession number. > > Is there a way to set the location back to the original one or is > this behavior desired? Any manually setting with $seq->start($val) > failed due to automatic checking. > > I'm using bioperl 1.6.1 > > Thanks! > > steffen This is a definite bug. We recently discussed amending the NSE format because of this (the subject came up over the last few months or so); it's fallen through the cracks. Fortunately it is very easy to fix (the relevant method is in LocatableSeq). Does anyone have a problem with me adding this in?
It will change output only for those instances where the strand is -1, so: AB194432.1/908-846 would be start = 846, end = 908, strand = -1; AB194432.1/846-908 would be start = 846, end = 908, strand = 1. chris From cjfields at illinois.edu Tue Nov 10 09:05:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Nov 2009 08:05:51 -0600 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: <738F6320-B87A-4541-B9FA-20273ABA96B9@illinois.edu> On Nov 10, 2009, at 5:41 AM, saikari keitele wrote: > Thanks again very much for your help and the script. > i've been trying it, however I fail to find any protein record > linked to a > record in the pcsubstance database. > Do you think that its is because no links have been defined between > the 2 > databases, or that I am just unlucky and that no link exists for the > particular records I'm testing? > Thanks again > > saikari It's probably that no links have been defined. I have found similar problems with PubChem in the past, in that not all substances have proteins associated with them. Most linked proteins are those with a deposited structure. There are a few other databases to check out; KEGG and the BioCyc dbs (like EcoCyc) come to mind. I don't think we have a generic remote query engine set up for any of those, unfortunately (unless there is one I'm unaware of), but I know BioCyc comes with its own set of tools (including Perl- and Java-based query tools) and can be set up locally, which is likely much faster and more in line with what you need. chris ...
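To make the strand convention from the alignment-location thread above concrete, here is a minimal plain-Perl sketch (no BioPerl required) of how a reversed name/start-end string such as AB194432.1/908-846 could be parsed and written back out without losing its orientation. The helpers `parse_nse` and `format_nse` and the hash-based record are hypothetical and are not the Bio::LocatableSeq API; the sketch assumes only what Chris describes: start <= end is kept internally, strand records the orientation, and the end coordinate is printed first when strand is -1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): split "name/start-end" into a record.
# Coordinates are stored with start <= end; a reversed range such as
# 908-846 is recorded as strand -1 instead.
sub parse_nse {
    my ($nse) = @_;
    my ($id, $first, $second) = $nse =~ m{^(.+)/(\d+)-(\d+)\z}
        or die "Cannot parse name/start-end string: $nse\n";
    return $first > $second
        ? { id => $id, start => $second, end => $first,  strand => -1 }
        : { id => $id, start => $first,  end => $second, strand => 1 };
}

# Hypothetical helper: rebuild "name/start-end", putting the end coordinate
# first when strand is -1 so the original orientation survives a round trip.
sub format_nse {
    my ($seq) = @_;
    my ($from, $to) = $seq->{strand} == -1
        ? ($seq->{end},   $seq->{start})
        : ($seq->{start}, $seq->{end});
    return "$seq->{id}/$from-$to";
}

for my $name ('AB194432.1/908-846', 'AB194432.1/846-908') {
    my $seq = parse_nse($name);
    printf "%s start=%d end=%d strand=%d -> %s\n",
        $name, @{$seq}{qw(start end strand)}, format_nse($seq);
}
# prints:
# AB194432.1/908-846 start=846 end=908 strand=-1 -> AB194432.1/908-846
# AB194432.1/846-908 start=846 end=908 strand=1 -> AB194432.1/846-908
```

Because orientation lives only in the strand field, only the output step has to change in this sketch; a forward-strand name such as AB194432.1/846-908 is written out exactly as before.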
From vebaev at gmail.com Tue Nov 10 12:38:54 2009 From: vebaev at gmail.com (Vesselin Baev) Date: Tue, 10 Nov 2009 09:38:54 -0800 (PST) Subject: [Bioperl-l] Invitation to connect on LinkedIn Message-ID: <1983273212.597925.1257874734811.JavaMail.app@ech3-cdn07.prod> LinkedIn ------------ Vesselin Baev requested to add you as a connection on LinkedIn: ------------------------------------------ Bolotin,, I'd like to add you to my professional network on LinkedIn. - Vesselin From jason at bioperl.org Tue Nov 10 13:47:02 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Nov 2009 10:47:02 -0800 Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and itscorrect use] In-Reply-To: <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1 984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Message-ID: Page 44 has the custom ID info or look at documentation for Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if you read the perldoc for the module.
http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf Don't re-open SeqIO each time; just do it once at the beginning, outside of the loop, and then call write_seq within the loop. This is one nuance of OO programming versus procedural programming: there is some outside state information that can persist in an object, and conceptually you want to open a filehandle once and just keep writing to it. -jason On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote: > Hello again, > > I tried what Mark told me, modifying the code line he pointed out, but there's still a problem that I believe must be due to the sequence names. > My sequence headers in the FASTA file have this format: > >> PleosPC9_1_103820|fgenesh1_pg.3_#_1 > > The part on the right of the pipe changes depending on the program used to create the gene model, for example: > >> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >> PleosPC9_1_123413|genemark.2731_g >> PleosPC9_1_52065|e_gw1.3.64.1 > > So I guess I need to parse my IDs somehow so that the program detects only the first part of the FASTA header (the "protein name") and doesn't get confused by the other side of the pipe... > > This is the corrected code I wrote following Mark's indications, but I still don't have any idea about the parsing issue... > > #!/c:/Perl -w > use Bio::Index::Fasta; > use strict; > #PC9.fasta is my genomic file > my $Index_File_Name ="PC9.fasta"; > my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > #LCS.txt is my sequences list > @ARGV = ; > foreach my $id (@ARGV) { > if ($id eq ''){ > die ("empty list") > } > else { > my $seqobj = $inx->fetch($id); > my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", > -format => 'fasta'); > $out->write_seq($seqobj); > } > } > exit; > } > > Thanks in advance > > P.S. Might there be a faster way of extracting those sequences using plain Perl? > > > > > On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote: >> Yes, these are files created by SDBM, Perl's internal db manager. You should be able to open the index by simply >> $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >> and the dbm will know what to do-- >> cheers MAJ >> ----- Original Message ----- >> From: >> To: "Mark A. Jensen" >> Cc: ; >> Sent: Thursday, November 05, 2009 11:21 AM >> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] >> >> >>> Thank you very much Mark, that's a good point :$ >>> I guess your correction refers to the second script, doesn't it? >>> >>> If so, there is still a problem with the first script: it doesn't create the PC9.fasta.idx file; instead it creates two files named: >>> -PC9.fasta.idx.pag >>> -PC9.fasta.idx.dir >>> >>> which seem to be clearly related to some kind of indexing process...but, unless the PC9.fasta.idx file is only virtual or remains hidden, I can't find it anywhere... >>> Forgive me if I'm talking nonsense... >>> >>> Thank you very much again for your help ;) >>> >>> >>> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>>> Hey José, >>>> The first thing that jumps out is the index file name. Looks like you create it as >>>> PC9.fasta.idx >>>> But you read it as >>>> PC9.fasta >>>> Not an unusual mistake. Do >>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>>> and see if it works. >>>> MAJ >>>> ----- Original Message ----- >>>> From: >>>> To: >>>> Sent: Thursday, November 05, 2009 10:46 AM >>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] >>>> >>>> >>>> >>>> >>>> ---------------------------- Original message ---------------------------- >>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use >>>> From: jluis.lavin at unavarra.es >>>> Date: Thu, 5 November 2009, 16:46 >>>> To: "Mark A. Jensen" >>>> -------------------------------------------------------------------------- >>>> >>>> Hi Mark, >>>> >>>> I've actually got two scripts: the first one creates the index and the second one retrieves the sequence list from the indexed file. -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Tue Nov 10 13:50:00 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Nov 2009 10:50:00 -0800 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> Message-ID: <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> You might also look at what mygenbank does: http://homepage.mac.com/iankorf/mygenbank.html On Nov 9, 2009, at 7:55 PM, Chris Fields wrote: > On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote: > >> Many thanks to Ewan Birney et al. for Bio::Index::* >> >> I can throw away my awful grep-based index-by-accession stuff. :) >> >> Any chance someone has also written an organism-based index >> mechanism? Something like... >> >> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { >> print $seq->display_id . "\n"; >> } >> >> Thanks, >> >> j > > It should work via id_parser(); from Bio::Index::GenBank: > > $inx->id_parser(\&get_id); > # make the index > $inx->make_index($file_name); > > # here is where the retrieval key is specified > sub get_id { > my $line = shift; > $line =~ /clone="(\S+)"/; > $1; > } > > Change the code ref to deal with the line you want and parse the name > out. Caveat: this may not be absolutely perfect (it only passes in > a line at a time, and some species lines will wrap). Also not sure > how this would work in cases where multiple sequences from the same > species are present. > > The other option is to preparse everything and tie a hash to store a > species->UID map, then use that along with your Bio::Index index to > grab what you need. > > chris -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jluis.lavin at unavarra.es Wed Nov 11 10:01:18 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Wed, 11 Nov 2009 16:01:18 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: anditscorrect use] In-Reply-To: References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1 984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.sq uirrel@webmail.unavarra.es><3471. 130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Message-ID: <2979.130.206.164.153.1257951678.squirrel@webmail.unavarra.es> Hi once again, I have modified the script following the instructions Jason gave me (at least what I understood; remember it is my first time trying to learn a programming language...and I'm not the smartest guy in the class, hehe), but it seems I didn't fix the problem...
Here's the new code I wrote: #!/c:/Perl -w use strict; use Bio::Index::Fasta; use Bio::DB::Fasta; use Bio::SeqIO; use IO::File; # assign files to scalars my $index_file = 'PC91.fasta'; my $id_list = 'LCS2.txt'; # open index file my $db = Bio::DB::Fasta->new($index_file) or die; # open the id list my $in = IO::File->new($id_list) or die; # open FASTA to write my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", -format => 'fasta'); # retrieve ids loop foreach my $id ($in) { if ($id eq ''){ die ("empty list") } else { my $seqobj = my $inx->fetch($id); $out->write_seq($seqobj); } } # parse fasta headers sub my_makeid { my $id = shift; if ( $id =~ /^>[^:]+:(\S+)/ ) { return $1; } elsif ($id =~ /^>(\S+)/) { return $1; } else { warn("cannot parse ID for $id\n"); } } exit; Would anyone please take a look at it... Thanks in advance ;) -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Wed Nov 11 18:48:33 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 11 Nov 2009 18:48:33 -0500 Subject: [Bioperl-l] Maq assembly wrapper ready for beta testing Message-ID: <4057E5A862B845EA8BB153888075590C@NewLife> Hi All- New modules are available in the core and in bioperl-run for working with Heng Li's short read assembler "maq" (http://maq.sourceforge.net/maq-man.shtml). Bio::Tools::Run::Maq allows a quick assembly call with a canned maq pipeline, and also allows individual maq commands to be called separately. It uses Bio::Assembly::IO::maq (a read-only module) to deliver a Bio::Assembly::Scaffold from maq output. If you're interested, see http://www.bioperl.org/wiki/HOWTO:Short-read_assemblies_with_maq and update your core and bioperl-run. The code inherits from Florent's excellent new Bio::Tools::Run::AssemblerBase -- kudos to him!!
Tests are in bioperl-run/trunk/t/Maq.t; see them for myriad examples. Send me the bugs. MAJ From clarsen at vecna.com Thu Nov 12 12:22:26 2009 From: clarsen at vecna.com (Chris Larsen) Date: Thu, 12 Nov 2009 12:22:26 -0500 Subject: [Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses? In-Reply-To: <320fb6e00910271029m26f07564l727fb78adae81c11@mail.gmail.com> References: <320fb6e00910271029m26f07564l727fb78adae81c11@mail.gmail.com> Message-ID: <7BBAE077-4D76-46C2-BF66-363F5A017278@vecna.com> All, This is a short follow-up on the prior thread of discussion regarding computing mature peptide sequences for viruses. The topic has gone underwater for the time being as we solve some problems with source data. The biopython effort and contributors on this board have given good guidance, and we now have scripts that function (thanks mostly to pcock); however, the source data on which everything relies is suspect: mat_peptide 15118..16914 <=== /product="nsp13" /note="helicase" I can tell you the virus community does not want to rely heavily on those position numbers. Furthermore, we have found fewer complete source genomes for viruses than for bacteria, and more virus-to-virus variation in the data fields annotated in the GBK file (Gene, CDS, ORF, Protein, Polyprotein, mat_peptide, db_xref). In fact, the community will have to come together significantly on how these molecules are defined in public repositories before a mature scripting effort becomes reliable, public and well received. Because of the variation in viruses, it's not even clear at this point what a 'gene' is. I will let you know how we proceed when more sequence data has been fully analyzed, and we can think about making any Perl-based solution a new viral protein module. Thanks, Chris -- Christopher Larsen, Ph.D. Sr.
Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From David.Messina at sbc.su.se Thu Nov 12 14:20:54 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 12 Nov 2009 20:20:54 +0100 Subject: [Bioperl-l] highest PAML version supported? Message-ID: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Hi everyone, What is the latest version of PAML (specifically codeml) that I can use with bioperl-live and bioperl-run? I looked around and couldn't find where (or if) this is documented. With PAML version 4.3a against the current trunk of both -live and -run I see this: ------------- EXCEPTION Bio::Root::NotImplemented ------------- MSG: Unknown format of PAML output did not see seqtype STACK Bio::Tools::Phylo::PAML::_parse_summary /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:461 STACK Bio::Tools::Phylo::PAML::next_result /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:270 STACK toplevel ../bin/cluster_kaks:251 --------------------------------------------------------------- ...which I suspect (but haven't confirmed) is due to a change in the file format. Dave From jason at bioperl.org Thu Nov 12 14:29:22 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 12 Nov 2009 11:29:22 -0800 Subject: [Bioperl-l] highest PAML version supported? In-Reply-To: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> References: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Message-ID: prolly 3.15 or so. it really needs a maintainer!!! On Nov 12, 2009, at 11:20 AM, Dave Messina wrote: > Hi everyone, > > What is the latest version of PAML (specifically codeml) that I can > use with > bioperl-live and bioperl-run? > > I looked around and couldn't find where (or if) this is documented. 
> > > With PAML version 4.3a against the current trunk of both -live and -run I see this: > ------------- EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Unknown format of PAML output did not see seqtype > STACK Bio::Tools::Phylo::PAML::_parse_summary > /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:461 > STACK Bio::Tools::Phylo::PAML::next_result > /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:270 > STACK toplevel ../bin/cluster_kaks:251 > --------------------------------------------------------------- > > ...which I suspect (but haven't confirmed) is due to a change in the file format. > > > Dave -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From scott at scottcain.net Fri Nov 13 09:48:43 2009 From: scott at scottcain.net (Scott Cain) Date: Fri, 13 Nov 2009 09:48:43 -0500 Subject: [Bioperl-l] January GMOD meeting announcement Message-ID: <4536f7700911130648j40eb2d82g2594adaccf476d73@mail.gmail.com> Hello, I am pleased to announce that the January GMOD meeting will be taking place on January 14 and 15 in San Diego at the Best Western Seven Seas (the same location as last year). Please see this page for registration information: http://gmod.org/wiki/January_2010_GMOD_Meeting When you go to that page, please take a moment to add suggestions for the agenda. There is no registration fee for this meeting; however, space is limited, so please register early. The proprietors of the Best Western have given us an excellent room rate, and extended it to the previous week, so that people attending the GMOD meeting and the Plant and Animal Genome meeting before it may stay at the Best Western the entire time.
Please direct follow-up questions to the gmod-devel mailing list: https://lists.sourceforge.net/lists/listinfo/gmod-devel Thanks, and I look forward to seeing you in San Diego! Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From j.inoue at ucl.ac.uk Sat Nov 14 14:20:29 2009 From: j.inoue at ucl.ac.uk (Jun Inoue) Date: Sat, 14 Nov 2009 19:20:29 +0000 Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths Message-ID: Dear All, I just started to learn BioPerl for phylogenetics. I usually use Perl v5.10.0 on Mac OS X 10.5.8. I would like to ask for a hint on calculating the branch lengths from root to tip for all species in a Newick tree. Please see the following web site. I am explaining what I want to do and showing my simple script (not yet complete). http://www.geocities.jp/ancientfishtree/BioPerl_BLRootTip.html Thank you for your help. Best, Jun Inoue http://www.geocities.jp/ancientfishtree/index_eng.html From maj at fortinbras.us Sat Nov 14 16:47:37 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 14 Nov 2009 16:47:37 -0500 Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths In-Reply-To: References: Message-ID: <3BC179984D5E49868C4F12D181D82B8D@NewLife> Hi Jun, Some hints: incorporate @leaves = $tree->get_leaf_nodes; and use Bio::Tree::TreeFunctionsI; $distance = $tree->distance( $node_a, $node_b ); cheers, Mark ----- Original Message ----- From: "Jun Inoue" To: Cc: "?? ?" Sent: Saturday, November 14, 2009 2:20 PM Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths > Dear All, > > I just started to learn BioPerl for phylogenetics. > I usually use Perl v5.10.0 on Mac OS X 10.5.8. > I would like to ask for a hint on calculating the branch lengths > from root to tip for all species in a Newick tree. > > Please see the following web site.
> I am explaining what I want to do and > showing my simple script (not yet complete). > http://www.geocities.jp/ancientfishtree/BioPerl_BLRootTip.html > > Thank you for your help. > > Best, > Jun Inoue > http://www.geocities.jp/ancientfishtree/index_eng.html From jay at jays.net Sun Nov 15 20:23:38 2009 From: jay at jays.net (Jay Hannah) Date: Sun, 15 Nov 2009 19:23:38 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> Message-ID: On Nov 9, 2009, at 9:55 PM, Chris Fields wrote: > It should work via id_parser(); from Bio::Index::GenBank: > > $inx->id_parser(\&get_id); > # make the index > $inx->make_index($file_name); > > # here is where the retrieval key is specified > sub get_id { > my $line = shift; > $line =~ /clone="(\S+)"/; > $1; > } This worked great for me today (tackling a different problem than the original). Thanks!! j From veronica.xiaoyu at gmail.com Fri Nov 13 15:35:48 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Fri, 13 Nov 2009 15:35:48 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel question Message-ID: Hi, I'm using Bio::Graphics to parse the BLAST result and generate images. But sometimes, in the middle of the output image, the hit's color is white, even though I set it to other colors. I attached the picture here as an example. This doesn't occur all the time; usually it works well. I'm wondering whether I did something wrong, or whether it depends on the BLAST result. Thank you, Xiaoyu -------------- next part -------------- A non-text attachment was scrubbed...
Name: BLAST_problem.jpg Type: image/jpeg Size: 51888 bytes Desc: not available From ryan_bogard at hms.harvard.edu Sun Nov 15 22:30:22 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Sun, 15 Nov 2009 19:30:22 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) Message-ID: <26366421.post@talk.nabble.com> In advance, any advice would be greatly appreciated! I have installed bioperl-588pm via fink but I am having difficulties calling the modules in a script. The following is added to .profile (bash): PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB If I change this to /sw/lib/perl5 then I get an @INC error, as Bio::Perl cannot be located. The environment variables are as follows: MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin INFOPATH=/sw/share/info:/sw/info:/usr/share/info This is the perl script I'm attempting to run: #!/sw/bin/perl5.8.8 use strict; use Bio::Perl; my $seq_object = get_sequence('swiss',"ROA1_HUMAN"); write_sequence(">roa1.fasta",'fasta',$seq_object); Here is the error output: dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr Referenced from: /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle Expected in: dynamic lookup dyld: Symbol not found: _Perl_Tstack_sp_ptr Referenced from: /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle Expected in: dynamic lookup Trace/BPT trap I have looked through many forum postings and attempted the solutions offered in those instances, but none seem to work in my case. I'm not sure if it's because I have perl 5.10.0 installed while attempting to call bioperl 5.8.8; however, others seem to have it working just fine.
Thank you, Ryan -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From e.osimo at gmail.com Mon Nov 16 02:04:40 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Mon, 16 Nov 2009 08:04:40 +0100 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26366421.post@talk.nabble.com> References: <26366421.post@talk.nabble.com> Message-ID: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Hello Ryan, unfortunately, if you upgraded to 10.6 without formatting, I have to tell you that you'll be in big trouble with perl and with everything you installed from the command line... Because in the upgrade process everything in the system folders, perl and bioperl being some of these things, is erased without being uninstalled, so you'll find a lot of folders with the same name but no contents. I suggest that you, as I did, format your pc and reinstall 10.6 from scratch. Then you'll be able to install mysql (I had to install mysql-5.4.3-beta-osx10.5, the only one that worked on 10.6), and, working with the perl 5.10 that is already installed, you'll install bioperl with no effort. Bye Emanuele On Mon, Nov 16, 2009 at 04:30, rbogard wrote: > > In advance, any advice would be greatly appreciated! I have installed > bioperl-588pm via fink but I am having difficulties calling the modules in a > script. The following is added to .profile (bash): > PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB > > If I change this to /sw/lib/perl5 then I get an @INC error, as > Bio::Perl > cannot be located.
> > The environment variables are as follows: > > > MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man > > PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 > > PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin > INFOPATH=/sw/share/info:/sw/info:/usr/share/info > > > This is the perl script I'm attempting to run: > #!/sw/bin/perl5.8.8 > use strict; > use Bio::Perl; > my $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > write_sequence(">roa1.fasta",'fasta',$seq_object); > > Here is the error output: > > dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > dyld: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > Trace/BPT trap > > I have looked through many forum postings and attempted the solutions > offered in those instances, but none seem to work in my case. I'm not sure > if it's because I have perl 5.10.0 installed while attempting to call > bioperl 5.8.8; however, others seem to have it working just fine. > > Thank you, Ryan
From ryan_bogard at hms.harvard.edu Mon Nov 16 08:43:19 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 05:43:19 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Message-ID: <26372079.post@talk.nabble.com> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if I will have the same issues, but it's worth a shot as I have little on my computer and reinstalling to start over wouldn't be too difficult. What method did you use to install bioperl? I used fink and I am not sure the available stable version is the one I need. I will install from the command line this time around, and let you know how it turns out. Thank you! Emanuele Osimo wrote: > > Hello Ryan, > unfortunately, if you upgraded to 10.6 without formatting, I have to tell > you that you'll be in big trouble with perl and with everything you > installed from the command line... Because in the upgrade process > everything > in the system folders, perl and bioperl being some of these things, is > erased without being uninstalled, so you'll find a lot of folders with the > same name but no contents. > I suggest that you, as I did, format your pc and reinstall 10.6 from > scratch. > Then you'll be able to install mysql (I had to install > mysql-5.4.3-beta-osx10.5, the only one that worked on 10.6), and, working with > the perl > 5.10 that is already installed, you'll install bioperl with no effort. > Bye > Emanuele > > On Mon, Nov 16, 2009 at 04:30, rbogard > wrote: > >> >> In advance, any advice would be greatly appreciated!
I have installed >> bioperl-588pm via fink but I am having difficulties calling the modules >> in a >> script. The following is added to .profile (bash): >> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >> >> If I change this to /sw/lib/perl5 then I get an @INC error, as >> Bio::Perl >> cannot be located. >> >> The environment variables are as follows: >> >> >> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >> >> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >> >> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >> >> >> This is the perl script I'm attempting to run: >> #!/sw/bin/perl5.8.8 >> use strict; >> use Bio::Perl; >> my $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >> write_sequence(">roa1.fasta",'fasta',$seq_object); >> >> Here is the error output: >> >> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> dyld: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> Trace/BPT trap >> >> I have looked through many forum postings and attempted the solutions >> offered in those instances, but none seem to work in my case. I'm not >> sure >> if it's because I have perl 5.10.0 installed while attempting to call >> bioperl 5.8.8; however, others seem to have it working just fine. >> >> Thank you, Ryan
-- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Mon Nov 16 08:48:17 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 16 Nov 2009 08:48:17 -0500 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26372079.post@talk.nabble.com> References: <26366421.post@talk.nabble.com><2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> Message-ID: <8D822081B13F49C2A37677D3A47F38B4@NewLife> Ryan, I'm not a mac person, but Koen has said (see http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) to use the unstable tree to get BioPerl 1.6.1, which is likely to be what you want. cheers Mark ----- Original Message ----- From: "rbogard" To: Sent: Monday, November 16, 2009 8:43 AM Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) > > The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if I > will have the same issues, but it's worth a shot as I have little on my > computer and reinstalling to start over wouldn't be too difficult. What > method did you use to install bioperl? I used fink and I am not sure the > available stable version is the one I need. I will install from the command > line this time around, and let you know how it turns out. > > Thank you!
> > > > Emanuele Osimo wrote: >> >> Hello Ryan, >> unfortunately, if you upgraded to 10.6 without formatting, I have to tell >> you that you'll be in big trouble with perl and with everything you >> installed from the command line... Because in the upgrade process >> everything >> in the system folders, perl and bioperl being some of these things, is >> erased without being uninstalled, so you'll find a lot of folders with the >> same name but no contents. >> I suggest that you, as I did, format your pc and reinstall 10.6 from >> scratch. >> Then you'll be able to install mysql (I had to install >> mysql-5.4.3-beta-osx10.5, the only one that worked on 10.6), and, working with >> the perl >> 5.10 that is already installed, you'll install bioperl with no effort. >> Bye >> Emanuele >> >> On Mon, Nov 16, 2009 at 04:30, rbogard >> wrote: >> >>> >>> In advance, any advice would be greatly appreciated! I have installed >>> bioperl-588pm via fink but I am having difficulties calling the modules >>> in a >>> script. The following is added to .profile (bash): >>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>> >>> If I change this to /sw/lib/perl5 then I get an @INC error, as >>> Bio::Perl >>> cannot be located.
>>> >>> The environment variables are as follows: >>> >>> >>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>> >>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>> >>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>> >>> >>> This is the perl script I'm attempting to run: >>> #!/sw/bin/perl5.8.8 >>> use strict; >>> use Bio::Perl; >>> my $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>> >>> Here is the error output: >>> >>> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >>> Referenced from: >>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>> Expected in: dynamic lookup >>> >>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>> Referenced from: >>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>> Expected in: dynamic lookup >>> >>> Trace/BPT trap >>> >>> I have looked through many forum postings and attempted the solutions >>> offered in those instances, but none seem to work in my case. I'm not >>> sure >>> if it's because I have perl 5.10.0 installed while attempting to call >>> bioperl 5.8.8; however, others seem to have it working just fine. >>> >>> Thank you, Ryan
From cjfields at illinois.edu Mon Nov 16 10:00:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Nov 2009 09:00:09 -0600 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Message-ID: <49681E01-E95D-4FC6-AE42-6E57ED43AAA2@illinois.edu> On Nov 16, 2009, at 1:04 AM, Emanuele Osimo wrote: > Hello Ryan, > unfortunately, if you upgraded to 10.6 without formatting, I have to tell > you that you'll be in big trouble with perl and with everything you > installed from the command line... Because in the upgrade process everything > in the system folders, perl and bioperl being some of these things, is > erased without being uninstalled, so you'll find a lot of folders with the > same name but no contents. > I suggest that you, as I did, format your pc and reinstall 10.6 from scratch. > Then you'll be able to install mysql (I had to install > mysql-5.4.3-beta-osx10.5, the only one that worked on 10.6), and, working with the perl > 5.10 that is already installed, you'll install bioperl with no effort.
> Bye > Emanuele Just starting from scratch isn't always the best solution (though it is the cleanest). In this case I don't think anything you mention applies, as there are conflicting symbols being reported. My guess is conflicting perl builds, probably between your system 5.10.0 (Snow Leopard) and your fink-installed perl 5.8.8 (they are binary incompatible). Also, remember that Snow Leopard is primarily 64-bit, so it might be best to work out whether your fink is compiling 64-bit or 32-bit binaries. In this case, I would just uninstall the fink-based perl and either use the system one (Snow Leopard = 5.10.0), or roll your own and install 5.10.1 locally or in /usr/local. Do NOT replace the system one, as that will likely break your OS. In my experience, and not to bash on fink or MacPorts, I have never had much luck with their perl installs. Unless I plan on only using fink or MacPorts for my OS (not likely in my case), I find they tend to cause problems in the long term unless one uses them to install packages with very few dependencies, and even then you need to make sure fink is configured to compile the correct binary. For instance, they're fairly good for gd, libxml2, etc., but beyond that one may get into issues with odd, version-specific dependencies with some packages, such as relying on perl 5.8.8 (but not perl 5.10.x), db42 (instead of db44), etc. I've ended up in the past with 2-3 different perl versions, Berkeley DB versions, etc. chris > On Mon, Nov 16, 2009 at 04:30, rbogard wrote: > >> >> In advance, any advice would be greatly appreciated! I have installed >> bioperl-588pm via fink but I am having difficulties calling the modules in a >> script. The following is added to .profile (bash): >> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >> >> If I change this to /sw/lib/perl5 then I get an @INC error, as >> Bio::Perl >> cannot be located.
>> >> The environment variables are as follows: >> >> >> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >> >> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >> >> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >> >> >> This is the perl script I'm attempting to run: >> #!/sw/bin/perl5.8.8 >> use strict; >> use Bio::Perl; >> my $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >> write_sequence(">roa1.fasta",'fasta',$seq_object); >> >> Here is the error output: >> >> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> dyld: Symbol not found: _Perl_Tstack_sp_ptr >> Referenced from: >> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >> Expected in: dynamic lookup >> >> Trace/BPT trap >> >> I have looked through many forum postings and attempted the solutions >> offered in those instances, but none seem to work in my case. I'm not sure >> if it's because I have perl 5.10.0 installed while attempting to call >> bioperl 5.8.8; however, others seem to have it working just fine. >> >> Thank you, Ryan
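Chris's guess above is a clash between two perl builds, and the dyld error points at an IO.bundle under a 5.8.8 library directory. One way to test that guess is to print which interpreter is actually running and flag PERL5LIB entries whose path names a different Perl version. The script below is a minimal sketch under that assumption: it treats a path segment such as /5.8.8 as a hint of which perl a directory was built for (a heuristic, true of the Fink paths quoted here but not guaranteed in general), and `mismatched_dirs` is a hypothetical helper, not a BioPerl or Fink tool.

```perl
#!/usr/bin/perl
# Sketch: report which perl is running and flag PERL5LIB entries that
# appear to belong to a different perl version (heuristic only).
use strict;
use warnings;
use Config;

# Version of the running interpreter, e.g. "5.10.0".
my $running = sprintf '%vd', $^V;

# Hypothetical helper: return the directories whose path contains a
# version-like segment (e.g. /5.8.8) that differs from $version.
sub mismatched_dirs {
    my ($version, @dirs) = @_;
    my @suspect;
    for my $dir (@dirs) {
        # Collect every "/5.x.y" path segment in this entry.
        my @versions = $dir =~ m{/(5\.\d+\.\d+)(?=/|\z)}g;
        push @suspect, $dir if grep { $_ ne $version } @versions;
    }
    return @suspect;
}

my $perl5lib = defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '';

print "interpreter: $^X\n";
print "version:     $running\n";
print "archname:    $Config{archname}\n";
print "PERL5LIB:    ", (length $perl5lib ? $perl5lib : '(unset)'), "\n";

for my $dir (mismatched_dirs($running, split /:/, $perl5lib)) {
    print "suspect (built for another perl?): $dir\n";
}
```

Running it once with /sw/bin/perl5.8.8 and once with /usr/bin/perl shows what each interpreter sees. A 5.10.0 interpreter whose PERL5LIB still points into /sw/lib/perl5/5.8.8 would have that entry flagged. The printed archname can also be compared by eye with the darwin-thread-multi-2level directory named in the error.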
From cjfields at illinois.edu Mon Nov 16 10:01:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Nov 2009 09:01:01 -0600 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <8D822081B13F49C2A37677D3A47F38B4@NewLife> References: <26366421.post@talk.nabble.com><2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> <8D822081B13F49C2A37677D3A47F38B4@NewLife> Message-ID: <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> Actually, why not just install via CPAN? Any particular reason? chris On Nov 16, 2009, at 7:48 AM, Mark A. Jensen wrote: > Ryan, > I'm not a mac person, but Koen has said (see http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) > to use the unstable tree to get BioPerl 1.6.1, which is likely to be what you want. > cheers > Mark > ----- Original Message ----- From: "rbogard" > To: > Sent: Monday, November 16, 2009 8:43 AM > Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) > > >> >> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if I >> will have the same issues, but it's worth a shot as I have little on my >> computer and reinstalling to start over wouldn't be too difficult. What >> method did you use to install bioperl? I used fink and I am not sure the >> available stable version is the one I need. I will install from the command >> line this time around, and let you know how it turns out. >> >> Thank you!
>> >> >> >> Emanuele Osimo wrote: >>> >>> Hello Ryan, >>> unfortunately, if you upgraded to 10.6 without formatting, I have to tell >>> you that you'll be in big trouble with perl and with everything you >>> installed from the command line... Because in the upgrade process >>> everything >>> in the system folders, perl and bioperl being some of these things, is >>> erased without being uninstalled, so you'll find a lot of folders with the >>> same name but no contents. >>> I suggest that you, as I did, format your pc and reinstall 10.6 from >>> scratch. >>> Then you'll be able to install mysql (I had to install >>> mysql-5.4.3-beta-osx10.5, the only one that worked on 10.6), and, working with >>> the perl >>> 5.10 that is already installed, you'll install bioperl with no effort. >>> Bye >>> Emanuele >>> >>> On Mon, Nov 16, 2009 at 04:30, rbogard >>> wrote: >>> >>>> >>>> In advance, any advice would be greatly appreciated! I have installed >>>> bioperl-588pm via fink but I am having difficulties calling the modules >>>> in a >>>> script. The following is added to .profile (bash): >>>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>>> >>>> If I change this to /sw/lib/perl5 then I get an @INC error, as >>>> Bio::Perl >>>> cannot be located.
>>>> >>>> The environment variables are as follows: >>>> >>>> >>>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>>> >>>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>>> >>>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>>> >>>> >>>> This is the perl script I'm attempting to run: >>>> #!/sw/bin/perl5.8.8 >>>> use strict; >>>> use Bio::Perl; >>>> my $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>>> >>>> Here is the error output: >>>> >>>> dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr >>>> Referenced from: >>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>> Expected in: dynamic lookup >>>> >>>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>>> Referenced from: >>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>> Expected in: dynamic lookup >>>> >>>> Trace/BPT trap >>>> >>>> I have looked through many forum postings and attempted the solutions >>>> offered in those instances, but none seem to work in my case. I'm not >>>> sure >>>> if it's because I have perl 5.10.0 installed while attempting to call >>>> bioperl 5.8.8; however, others seem to have it working just fine. >>>> >>>> Thank you, Ryan
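After an install via CPAN, as suggested above, a quick check is whether the interpreter you intend to use can load BioPerl at all. What follows is a minimal sketch. It assumes that BioPerl's `Bio::Root::Version` module sets `$Bio::Root::Version::VERSION`, as it does in the 1.6 series. The `bioperl_version` name is a hypothetical helper, not part of BioPerl.

```perl
#!/usr/bin/perl
# Sketch: report whether this interpreter can load BioPerl, and which version.
use strict;
use warnings;

# Hypothetical helper: return BioPerl's version string, or undef if
# Bio::Root::Version cannot be loaded from this interpreter's @INC.
sub bioperl_version {
    my $loaded = eval { require Bio::Root::Version; 1 };
    return unless $loaded;
    no warnings 'once';    # the package variable is referenced only here
    return $Bio::Root::Version::VERSION;
}

my $version = bioperl_version();
if (defined $version) {
    print "BioPerl $version loaded by $^X\n";
}
else {
    print "BioPerl not found in \@INC of $^X\n";
}
```

The script loads the module with `require` inside `eval`, so a missing or broken install produces a message instead of a crash. That makes it safe to run under each candidate perl in turn.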
From Kevin.M.Brown at asu.edu Mon Nov 16 10:49:13 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Nov 2009 08:49:13 -0700 Subject: [Bioperl-l] Bio::Graphics::Panel question In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40663EDB9@EX02.asurite.ad.asu.edu> To really be able to tell if this was a bug, I (and probably the real devs) would need to see that part of your code and the BLAST file that is having this issue, as the cause could be your color-choice callback rather than the BLAST object (e.g. your color picker is missing an option that the data comes in with, and so returns a blank value). -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Xiaoyu Liang Sent: Friday, November 13, 2009 1:36 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Graphics::Panel question Hi, I'm using Bio::Graphics to parse the BLAST result and generate images.
But sometimes, in the middle of the output image, the hit's color is white, even though I set it to other colors. I attached the picture here as an example. This doesn't happen all the time; usually it works well. Am I doing something wrong, or does it depend on the BLAST result? Thank you, Xiaoyu From ryan_bogard at hms.harvard.edu Mon Nov 16 11:57:16 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 08:57:16 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> <8D822081B13F49C2A37677D3A47F38B4@NewLife> <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> Message-ID: <26375418.post@talk.nabble.com> I read that posting by Koen and used the unstable tree after the first attempt; however, the errors still persisted. I just finished a fresh install and I will just follow Mr. Fields' advice and use CPAN. Thank you all for the help! Chris Fields-5 wrote: > > Actually, why not just install via CPAN? Any particular reason? > > chris > > On Nov 16, 2009, at 7:48 AM, Mark A. Jensen wrote: > >> Ryan, >> I'm not a mac person, but Koen has said (see >> http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) >> to use the unstable tree to get BioPerl 1.6.1, which is likely to be what >> you want. >> cheers >> Mark >> ----- Original Message ----- From: "rbogard" >> >> To: >> Sent: Monday, November 16, 2009 8:43 AM >> Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl >> 5.10.0) >> >> >>> >>> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if >>> I >>> will have the same issues, but it's worth a shot as I have little on my >>> computer and reinstalling to start over wouldn't be too difficult. What >>> method did you use to install bioperl?
I used fink and I am not sure the >>> available stable version is the one I need. I will install from the >>> command >>> line this time around, and let you know how it turns out. >>> >>> Thank you! >>> >>> >>> >>> Emanuele Osimo wrote: >>>> >>>> Hello Ryan, >>>> unfortunately, if you upgraded to 10.6 without formatting, I have to >>>> tell >>>> you that you'll be in big trouble with perl and with everything you >>>> installed from the commandline... Because in the upgrade process >>>> everything >>>> in the system folders, perl and bioperl being some of these things, is >>>> erased without being uninstalled, so you'll find a lot of folders with >>>> the >>>> same name but no contents. >>>> I suggest that you, as I did, format your pc and reinstall 10.6 from >>>> scratch. >>>> Then you'll be able to install mysql (I had to install >>>> mysql-5.4.3-beta-osx10.5, the only one that works on 10.6), and, working with >>>> perl >>>> 5.10 that is already installed, you'll install bioperl with no effort. >>>> Bye >>>> Emanuele >>>> >>>> On Mon, Nov 16, 2009 at 04:30, rbogard >>>> wrote: >>>> >>>>> >>>>> In advance, any advice would be greatly appreciated! I have installed >>>>> bioperl-588pm via fink but I am having difficulties calling the >>>>> modules >>>>> in >>>>> script. The following is added to .profile (bash): >>>>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>>>> >>>>> If I change this to /sw/lib/perl5 then I get an @INC error, as use >>>>> Bio::Perl >>>>> cannot be located.
From krishna.aneesh at gmail.com Mon Nov 16 02:00:15 2009 From: krishna.aneesh at gmail.com (Aneesh K) Date: Mon, 16 Nov 2009 12:30:15 +0530 Subject: [Bioperl-l] Regarding Bio::TreeIO Object Message-ID: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Hi, I just started to use Bioperl modules. They are really useful and interesting. Now I am stuck with "Tree objects and phylogenetic trees". I couldn't find any documentation/examples about reading/parsing phylip tree files. Please tell me where I can get some sample code for this. Waiting for your reply. Thanks Aneesh.K Mob.
09646181517 From David.Messina at sbc.su.se Mon Nov 16 12:33:36 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 16 Nov 2009 18:33:36 +0100 Subject: [Bioperl-l] highest PAML version supported? In-Reply-To: References: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Message-ID: Hi everyone, I just committed support for parsing codeml 4.3a (August 2009) to bioperl-live. I added new tests and all PAML-related tests pass, but please report any problems you have to the list. Note that I haven't tested the other PAML 4.3a executables to see if there are format changes with those. If you get the chance to try any and it doesn't work, let me know and I'll try to add support for them. (Note that these changes are only to the PAML parsing code; Bio::Tools::Run already appears to handle 4.3a just fine.) Dave From jason at bioperl.org Mon Nov 16 12:34:57 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 16 Nov 2009 09:34:57 -0800 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Message-ID: Is this at all helpful for your question? http://www.bioperl.org/wiki/HOWTO:Trees The trees are in 'newick' (New Hampshire) format, though I don't think there is a phylip format for trees. -jason On Nov 15, 2009, at 11:00 PM, Aneesh K wrote: > Hi, > > I just started to use Bioperl modules. It's really useful and > interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing > phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob.
09646181517 -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From roy.chaudhuri at gmail.com Mon Nov 16 12:31:49 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 16 Nov 2009 17:31:49 +0000 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Message-ID: <4B018C85.6020801@gmail.com> Hi Aneesh, See the Bioperl trees howto: http://www.bioperl.org/wiki/HOWTO:Trees Roy. Aneesh K wrote: > Hi, > > I just started to use Bioperl modules. It's really useful and interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob. 09646181517 -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. From Kevin.M.Brown at asu.edu Mon Nov 16 13:22:07 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Nov 2009 11:22:07 -0700 Subject: [Bioperl-l] FW: Bio::Graphics::Panel question Message-ID: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> Please keep your responses on the list for more timely help. Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University ________________________________ From: Xiaoyu Liang [mailto:veronica.xiaoyu at gmail.com] Sent: Monday, November 16, 2009 9:34 AM To: Kevin Brown Subject: Re: [Bioperl-l] Bio::Graphics::Panel question Hi Kevin, Thank you for your quick response. I attached the BLAST .out file here, and below is the relevant part of my code.
I have an array keeping the color for each hit, and I printed out the array; nothing is missing.
my $track = $panel->add_track(
    -glyph => 'graded_segments',
    -label => 1,
    -connector => 'dashed',
    -font2color => 'red',
    -sort_order => 'high_score',
    -description => sub {
        $feature = shift;
        #print "--".$feature."\n";
        return unless $feature->has_tag('description');
        my ($description) = $feature->each_tag_value('description');
        my ($id) = $feature->display_name;
        my @records= split(/\|/,$description);
        my $score = $feature->score;
        #print $id.":".$score."\n";
        if($score >=200){
            push (@color_array,1);
        }elsif($score >=80){
            push (@color_array,2);
        }elsif($score >=50){
            push (@color_array,3);
        }elsif($score >= 40){
            push (@color_array,4);
        }else{
            push (@color_array,5);
        }
        if($type == 1){
            "Species:Arabidopsis TF Family:$records[1] Score=$score";
        }elsif($type == 2){
            if(scalar(@records)==5){
                "Species:$records[1] TF Family:$records[2] Accepted Name:$records[3] Score=$score";
            }else{
                "Species:$records[1] TF Family:$records[2] Score=$score";
            }
        }else{
            "";
        }
    },
    -bgcolor => sub{
        return unless $feature->has_tag('description');
        if($color_array[$index] == 1 ){
            $color = 'red';
        }
        if($color_array[$index]== 2){
            $color = 'orange';
        }
        if($color_array[$index]== 3){
            $color = 'green';
        }
        if($color_array[$index]== 4){
            $color = 'blue';
        }
        if($color_array[$index]== 5){
            $color = 'black';
        }
        #if ($index == 20){
        #    $color = 'black';
        #}
        #print $index."--".$color_array[$index]."\n";
        $index++;
        #print $feature."\n";
        #print $feature->display_name."\n";
        return $color;
    },
);
Best regards, Xiaoyu On Mon, Nov 16, 2009 at 10:49 AM, Kevin Brown wrote: To really be able to tell if this was a bug, I (and probably the real devs) would need to see that part of your code and the Blast file that is having this issue as it could be your callback for color choice vs the blast object (e.g. your color picker is missing an option that the data comes in with and so returns with a blank value).
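Following up on Kevin's point that the color callback may be returning no value: in the code above, the -bgcolor callback ignores the feature it is given and instead relies on the $feature, @color_array and $index left behind by the -description callback, so colors stay correct only if Bio::Graphics calls the two callbacks in lockstep, once per feature. A minimal sketch of an alternative, assuming (as the Bio::Graphics::Panel documentation describes for option callbacks) that the feature being drawn is passed as the callback's first argument, derives the color from that feature's own score. The thresholds are copied from the -description callback; the names score_to_color and $bgcolor_callback are hypothetical, and this has not been tried against the attached BLAST file.

```perl
use strict;
use warnings;

# Map a BLAST hit score to a color, using the same thresholds as the
# original -description callback. score_to_color is a hypothetical
# helper name, not part of Bio::Graphics.
sub score_to_color {
    my ($score) = @_;
    return 'black' unless defined $score;    # no score: lowest class
    return 'red'    if $score >= 200;
    return 'orange' if $score >= 80;
    return 'green'  if $score >= 50;
    return 'blue'   if $score >= 40;
    return 'black';
}

# Callback for -bgcolor. Assumes Bio::Graphics passes the feature being
# drawn as the first argument, so no shared @color_array or $index
# counter is needed and the result does not depend on call order.
my $bgcolor_callback = sub {
    my ($feature) = @_;
    return score_to_color( $feature->score );
};

# Hypothetical use inside the existing add_track() call:
#   my $track = $panel->add_track(
#       -glyph   => 'graded_segments',
#       -bgcolor => $bgcolor_callback,
#       # ... other options as before ...
#   );
```

Because each call depends only on the feature it receives, a hit gets the same color no matter how many times, or in what order, the glyph asks for it.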
-------------- next part -------------- A non-text attachment was scrubbed... Name: 1258388779.out Type: application/octet-stream Size: 32599 bytes Desc: 1258388779.out URL: From paolo.pavan at gmail.com Mon Nov 16 14:06:06 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Mon, 16 Nov 2009 20:06:06 +0100 Subject: [Bioperl-l] bioperl-ext installation issue Message-ID: <56be91b60911161106w69e20fd9k133a465e8d4f8a3f@mail.gmail.com> Hi everybody, I'm having problems installing the bioperl-ext package; any help is much appreciated. 1) - I started by trying with cpan: i /bioperl-ext/ the only resource available is /B/BI/BIRNEY/bioperl-ext-1.4 (is that OK?) - I installed Inline::MakeMaker and Inline::C, then - i/BIRNEY/bioperl-ext-1.4/ fails because I don't have the staden package 2) I tried to install io_lib-1.8.10.tar as suggested by the README ( ftp://ftp.mrc-lmb.cam.ac.uk/pub/staden/io_lib/), but the installation fails after: ...
gcc -g -O2 -o makeSCF makeSCF.o ../read/.libs/libread.a -lz -lm ../read/.libs/libread.a(compress.o): In function `fopen_compressed': /root/Download/staden/io_lib-1.8.10/utils/compress.c:321: warning: the use of `tempnam' is dangerous, better use `mkstemp' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../read -I../alf -I../abi -I../ctf -I../ztr -I../plain -I../scf -I../exp_file -I../utils -I/usr/local/include -g -O2 -c -o extract_seq.o `test -f extract_seq.c || echo './'`extract_seq.c /bin/sh ../libtool --mode=link gcc -g -O2 -o extract_seq extract_seq.o ../read/libread.la gcc -g -O2 -o extract_seq extract_seq.o ../read/.libs/libread.a -lz -lm ../read/.libs/libread.a(compress.o): In function `fopen_compressed': /root/Download/staden/io_lib-1.8.10/utils/compress.c:321: warning: the use of `tempnam' is dangerous, better use `mkstemp' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../read -I../alf -I../abi -I../ctf -I../ztr -I../plain -I../scf -I../exp_file -I../utils -I/usr/local/include -g -O2 -c -o index_tar.o `test -f index_tar.c || echo './'`index_tar.c index_tar.c: In function 'main': index_tar.c:12: error: two or more data types in declaration specifiers make[2]: *** [index_tar.o] Error 1 make[2]: Leaving directory `/home/root/Download/staden/io_lib-1.8.10/progs' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/root/Download/staden/io_lib-1.8.10' make: *** [all-recursive-am] Error 2 3) I gave up on staden, because I actually need pSW, and tried to install from Makefile.PL in Bio/Ext/Align, but the installation fails after: ... Align.xs:18: warning: 'not_here'
defined but not used Running Mkbootstrap for Bio::Ext::Align () chmod 644 Align.bs rm -f ../blib/arch/auto/Bio/Ext/Align/Align.so gcc -shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic Align.o -o ../blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a \ -lm \ /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make[1]: *** [../blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 make[1]: Leaving directory `/home/root/.cpan/sources/authors/id/B/BI/BIRNEY/bioperl-ext-1.4/Bio/Ext/Align' make: *** [subdirs] Error 2 I have also tried other things, such as force install Bio::Ext::Align, without success, but I'm sure I'm missing something trivial that I just can't spot. Can someone help me? Thank you, Paolo From lincoln.stein at gmail.com Mon Nov 16 15:08:20 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 16 Nov 2009 15:08:20 -0500 Subject: [Bioperl-l] FW: Bio::Graphics::Panel question In-Reply-To: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> Message-ID: <6dce9a0b0911161208q2f826d83s319184f0cacca097@mail.gmail.com> Hi, I think you should modify your color selection code as follows: if($color_array[$index] == 1 ){ $color = 'red'; } elsif($color_array[$index]== 2){ $color = 'orange'; } elsif($color_array[$index]== 3){ $color = 'green'; } elsif($color_array[$index]== 4){ $color = 'blue'; } elsif($color_array[$index]== 5){ $color = 'black'; } else { die "unexpected color array value $color_array[$index]" } Lincoln On Mon, Nov 16, 2009 at 1:22 PM, Kevin Brown wrote: > Please keep your responses on the list for more timely help.
-- Lincoln D.
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From ryan_bogard at hms.harvard.edu Mon Nov 16 16:44:25 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 13:44:25 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26366421.post@talk.nabble.com> References: <26366421.post@talk.nabble.com> Message-ID: <26379710.post@talk.nabble.com> Thank you all for your help! I was able to get bioperl working via manual download and install. It was a combination of permissions issues and X86_64 vs. X86_32 compatibility issues. Using fink to download and install seems to have given me a combination of 32 and 64 associated files (I probably did something wrong in config). From jay at jays.net Mon Nov 16 17:02:10 2009 From: jay at jays.net (Jay Hannah) Date: Mon, 16 Nov 2009 16:02:10 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism?
In-Reply-To: <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> Message-ID: <60ADD3A9-D38B-4A39-A5CE-C8118DEC1242@jays.net> On Nov 10, 2009, at 12:50 PM, Jason Stajich wrote: > You might also look at what mygenbank does: > http://homepage.mac.com/iankorf/mygenbank.html It appears, perhaps, that BioSQL can provide *foo* searching like so: http://www.biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME SELECT DISTINCT include.ncbi_taxon_id FROM taxon INNER JOIN taxon AS include ON (include.left_value BETWEEN taxon.left_value AND taxon.right_value) WHERE taxon.taxon_id IN (SELECT taxon_id FROM taxon_name WHERE name LIKE '%fungi%') So I think we're going to chase that for a while. I didn't see a *foo* search in MyGenBank? Thanks, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From roy.chaudhuri at gmail.com Tue Nov 17 06:24:07 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 17 Nov 2009 11:24:07 +0000 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> <4B018C85.6020801@gmail.com> <9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> Message-ID: <4B0287D7.5050702@gmail.com> Hi Aneesh, Please keep your replies on the mailing list, that way someone else can respond, which would be particularly useful in this case since I know nothing about MapIO. Roy. Aneesh K wrote: > Thanks for your reply. > > I would like to know about "Genetic Maps" also. I would like to > use MapIO object. > But I'm not aware about genetic maps and the mapmaker format. > > Please tell me from where I can get some examples for mapmaker format > and some example scripts to use MapIO object. > > Hoping your reply. > > Aneesh.K > Mob. 
09646181517 From maj at fortinbras.us Tue Nov 17 07:50:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 17 Nov 2009 07:50:06 -0500 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <4B0287D7.5050702@gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com><4B018C85.6020801@gmail.com><9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> <4B0287D7.5050702@gmail.com> Message-ID: <394F62D51F15405BBCF8BB50DA0FF336@NewLife> Aneesh, Have a look in the t/Map directory of the BioPerl distribution. These are test scripts that are also examples of usage. The t/data directory will contain the datafiles that the tests use; these will provide example data. cheers Mark From veronica.xiaoyu at gmail.com Wed Nov 18 12:18:33 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Wed, 18 Nov 2009 12:18:33 -0500 Subject: [Bioperl-l] how to visualize multiple sequences alignments Message-ID: Hi, I'm wondering whether there are any modules that can be used to visualize multiple sequence alignments, like the result from ClustalW? Thank you very much, Xiaoyu From jason at bioperl.org Wed Nov 18 13:23:05 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 18 Nov 2009 10:23:05 -0800 Subject: [Bioperl-l] how to visualize multiple sequences alignments In-Reply-To: References: Message-ID: try jalview http://www.jalview.org/ On Nov 18, 2009, at 9:18 AM, Xiaoyu Liang wrote: > Hi, > > I'm wondering Is there any modules that can be used for visualizing > multiple > sequences alignments? like the result from ClustalW?
> > Thank you very much, > Xiaoyu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From andrew.j.grimm at gmail.com Wed Nov 18 21:52:31 2009 From: andrew.j.grimm at gmail.com (Andrew Grimm) Date: Thu, 19 Nov 2009 13:52:31 +1100 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? Message-ID: Caution: read the whole email before visiting the bioperl wiki I was doing some bioinformatics-related searching using google, and one of the hits was to the bio dot perl dot org wiki (the FAQ in particular). When I did that, I was redirected to a ferdax dot com web site (a typo-squatting of fedex?). Some people reckon that ferdax hacks web sites and redirects google hits from the victim web site to their own web site. For example, this thread at google's webmaster central http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all (it's talking about zencart, but presumably they've since found other victims) Just going to the website without using google may not trigger the redirect. Apologies if this is a false alarm, but I don't think it is. I won't be in contact between Friday and Monday Australian time (I'll be at railscamp 6 in Melbourne), so I won't be able to answer any replies. Thanks, Andrew Grimm From maj at fortinbras.us Wed Nov 18 22:14:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 18 Nov 2009 22:14:44 -0500 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? In-Reply-To: References: Message-ID: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> Andrew-- thanks!! We're on it. MAJ ----- Original Message ----- From: "Andrew Grimm" To: Sent: Wednesday, November 18, 2009 9:52 PM Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? 
> Caution: read the whole email before visiting the bioperl wiki
>
> I was doing some bioinformatics-related searching using google, and
> one of the hits was to the bio dot perl dot org wiki (the FAQ in
> particular).
>
> When I did that, I was redirected to a ferdax dot com web site (a
> typo-squatting of fedex?).
>
> Some people reckon that ferdax hacks web sites and redirects google
> hits from the victim web site to their own web site. For example, this
> thread at google's webmaster central
> http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all
> (it's talking about zencart, but presumably they've since found other
> victims)
>
> Just going to the website without using google may not trigger the redirect.
>
> Apologies if this is a false alarm, but I don't think it is.
>
> I won't be in contact between Friday and Monday Australian time (I'll
> be at railscamp 6 in Melbourne), so I won't be able to answer any
> replies.
>
> Thanks,
>
> Andrew Grimm

From sandipan.chowdhury at physiology.wisc.edu Thu Nov 19 01:49:45 2009
From: sandipan.chowdhury at physiology.wisc.edu (Sandipan Chowdhury)
Date: Thu, 19 Nov 2009 00:49:45 -0600
Subject: [Bioperl-l] accessing EMBL database
Message-ID: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>

Hi,

I have 3 questions, all related to the retrieval of sequences from online databases.

(1) I have been trying to download a protein sequence from the EMBL database and trying to write the sequence into a text file, as a string. I am using the following code:

use Bio::DB::EMBL;
open b,">","s.txt";
$em_obj = Bio::DB::EMBL->new;
$seq_obj = $em_obj->get_Seq_by_acc("CAB95729");
$s_str = $seq_obj->seq;
print b "$s_str\n";
close b;

The script is not working and gives the message:
"MSG: EMBL stream with no ID.
Not embl in my book
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
STACK: trial2.pl"

I am not sure what this means. A similar version of the script works for the Swissprot, GenBank and RefSeq databases but not for EMBL. What is the way around this so that I can download the EMBL sequence?

(2) Also, is there any way I can download sequences from DDBJ (database of Japan)?

(3) Can GI numbers be used to retrieve the sequences? If so, how?

Answers to these questions would be greatly appreciated. I am very new to Perl/Bioperl and am not really familiar with the advanced programming features, so I would need your help to find my way out of this situation.

Many Thanks
Sandipan

From maj at fortinbras.us Thu Nov 19 08:10:07 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 19 Nov 2009 08:10:07 -0500
Subject: [Bioperl-l] accessing EMBL database
In-Reply-To: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>
References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>
Message-ID:

Sandipan--
That id (CAB95729) returns "No entries" from EMBL.
I would agree that the error message is not really informative.
The module documentation warns:

# remember that EMBL_ID does not equal GenBank_ID!

so I would check that.
MAJ

----- Original Message -----
From: "Sandipan Chowdhury"
To:
Sent: Thursday, November 19, 2009 1:49 AM
Subject: [Bioperl-l] accessing EMBL database

> Hi,
>
> I have 3 questions, all related to the retrieval of sequences from online
> databases.
>
> (1) I have been trying to download a protein sequence from the EMBL database
> and trying to write the sequence into a text file, as a string.
> I am using the
> following code:
>
> use Bio::DB::EMBL;
> open b,">","s.txt";
> $em_obj = Bio::DB::EMBL->new;
> $seq_obj = $em_obj->get_Seq_by_acc("CAB95729");
> $s_str = $seq_obj->seq;
> print b "$s_str\n";
> close b;
>
> The script is not working and gives the message:
> "MSG: EMBL stream with no ID. Not embl in my book
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
> STACK: trial2.pl"
>
> I am not sure what this means. A similar version of the script works for the
> Swissprot, GenBank and RefSeq databases but not for EMBL. What is the way
> around this so that I can download the EMBL sequence?
>
> (2) Also, is there any way I can download sequences from DDBJ (database of
> Japan)?
>
> (3) Can GI numbers be used to retrieve the sequences? If so, how?
>
> Answers to these questions would be greatly appreciated. I am very new to
> Perl/Bioperl and am not really familiar with the advanced programming
> features, so I would need your help to find my way out of this situation.
>
> Many Thanks
> Sandipan

From hrh at fmi.ch Thu Nov 19 08:23:29 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Thu, 19 Nov 2009 14:23:29 +0100
Subject: [Bioperl-l] accessing EMBL database
In-Reply-To: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>
Message-ID:

Sandipan

> I have 3 questions, all related to the retrieval of sequences from online
> databases.
>
> (1) I have been trying to download a protein sequence from the EMBL database
> and trying to write the sequence into a text file, as a string.
> I am using the
> following code:
>
> use Bio::DB::EMBL;
> open b,">","s.txt";
> $em_obj = Bio::DB::EMBL->new;
> $seq_obj = $em_obj->get_Seq_by_acc("CAB95729");
> $s_str = $seq_obj->seq;
> print b "$s_str\n";
> close b;
>
> The script is not working and gives the message:
> "MSG: EMBL stream with no ID. Not embl in my book
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
> STACK: trial2.pl"
>
> I am not sure what this means. A similar version of the script works for the
> Swissprot, GenBank and RefSeq databases but not for EMBL. What is the way
> around this so that I can download the EMBL sequence?

"CAB95729" is a protein sequence, i.e. a translation of the CDS of
'AJ277028.1'.

As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, i.e. the
nucleotide sequences.

> (2) Also, is there any way I can download sequences from DDBJ (database of
> Japan)?

Unless it is for network/speed reasons, why do you want to download data from
DDBJ? It contains the same data as GenBank and EMBL. Those three databases
exchange their data on a daily basis.

> (3) Can GI numbers be used to retrieve the sequences? If so, how?

Have you looked at Bio::DB::EUtilities? See the 'HOWTOs' page in the
Bioperl Wiki.

Regards, Hans

> Answers to these questions would be greatly appreciated. I am very new to
> Perl/Bioperl and am not really familiar with the advanced programming
> features, so I would need your help to find my way out of this situation.
>
> Many Thanks
> Sandipan

From cjfields at illinois.edu Thu Nov 19 08:47:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Nov 2009 07:47:16 -0600
Subject: [Bioperl-l] accessing EMBL database
In-Reply-To:
References:
Message-ID: <95D416ED-7630-40A1-ABA5-A3C3525D25B1@illinois.edu>

On Nov 19, 2009, at 7:23 AM, Hotz, Hans-Rudolf wrote:

>
> Sandipan
>
>> I have 3 questions, all related to the retrieval of sequences from online
>> databases.
>>
>> (1) I have been trying to download a protein sequence from the EMBL database
>> and trying to write the sequence into a text file, as a string. I am using the
>> following code:
>>
>> use Bio::DB::EMBL;
>> open b,">","s.txt";
>> $em_obj = Bio::DB::EMBL->new;
>> $seq_obj = $em_obj->get_Seq_by_acc("CAB95729");
>> $s_str = $seq_obj->seq;
>> print b "$s_str\n";
>> close b;
>>
>> The script is not working and gives the message:
>> "MSG: EMBL stream with no ID. Not embl in my book
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
>> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
>> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
>> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
>> STACK: trial2.pl"
>>
>> I am not sure what this means. A similar version of the script works for the
>> Swissprot, GenBank and RefSeq databases but not for EMBL. What is the way
>> around this so that I can download the EMBL sequence?
>
> "CAB95729" is a protein sequence, i.e. a translation of the CDS of
> 'AJ277028.1'.
>
> As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, i.e. the
> nucleotide sequences.
>
>> (2) Also, is there any way I can download sequences from DDBJ (database of
>> Japan)?
>
> Unless it is for network/speed reasons, why do you want to download data from
> DDBJ?
> It contains the same data as GenBank and EMBL. Those three databases
> exchange their data on a daily basis.
>
>> (3) Can GI numbers be used to retrieve the sequences? If so, how?
>
> Have you looked at Bio::DB::EUtilities? See the 'HOWTOs' page in the
> Bioperl Wiki.
>
> Regards, Hans
>
>> Answers to these questions would be greatly appreciated. I am very new to
>> Perl/Bioperl and am not really familiar with the advanced programming
>> features, so I would need your help to find my way out of this situation.
>>
>> Many Thanks
>> Sandipan

To add to that, if you want the protein sequences as a Bio::Seq you can use Bio::DB::GenPept (Bio::DB::EUtilities will retrieve raw data only).

chris

From David.Messina at sbc.su.se Thu Nov 19 09:04:55 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 19 Nov 2009 15:04:55 +0100
Subject: [Bioperl-l] accessing EMBL database
In-Reply-To:
References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>
Message-ID:

> I would agree that the error message is not really informative.

Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump.

I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them.

Perhaps the stack dump should be turned off by default?

Wouldn't this:

ERROR: EMBL stream with no ID. Not embl in my book

Be a lot clearer than this?:

MSG: EMBL stream with no ID. Not embl in my book
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
STACK: trial2.pl

Just a thought. This has probably been discussed before.

Dave

From maj at fortinbras.us Thu Nov 19 09:17:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 19 Nov 2009 09:17:05 -0500
Subject: [Bioperl-l] accessing EMBL database
In-Reply-To:
References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>
Message-ID:

I'm inclined to agree. Lots of responses to questions here that begin
"Well, as the error message said, you need to check...", which means
people tend towards "I broke it! Write the list!". I do find it hairy when
my errors are way down in the object tree.

----- Original Message -----
From: "Dave Messina"
To: "Mark A. Jensen"
Cc:
Sent: Thursday, November 19, 2009 9:04 AM
Subject: Re: [Bioperl-l] accessing EMBL database

> I would agree that the error message is not really informative.

Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump.

I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them.

Perhaps the stack dump should be turned off by default?

Wouldn't this:

ERROR: EMBL stream with no ID. Not embl in my book

Be a lot clearer than this?:

MSG: EMBL stream with no ID. Not embl in my book
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
STACK: trial2.pl

Just a thought. This has probably been discussed before.

Dave

From rtbio.2009 at gmail.com Thu Nov 19 09:55:27 2009
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Thu, 19 Nov 2009 15:55:27 +0100
Subject: [Bioperl-l] Remote blast
Message-ID:

Hello everybody,

I have a problem. I would like to use remote blast to find sequences
matching an input sequence.

E.g., I would like to search for sequences which match a Trypanosoma brucei
sequence.
I want the output to be only Trypanosoma brucei sequences matching my
query. When I tried to use remoteblast against the nr database, I got sequences from
different organisms like E. coli, Pseudomonas etc.

Could you please tell me how this can be solved?

My code is as follows.

use Bio::Tools::Run::RemoteBlast;
use strict;
my $prog = 'blastn';
my $db = 'nr';
my $e_val= '1e-10';
my $organism= 'Trypanosoma Brucei';

my @params = ( '-prog' => $prog,
               '-data' => $db,
               '-expect' => $e_val,
               '-readmethod' => 'SearchIO',
               '-Organism' => $organism );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#change a parameter
#$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]'

#remove a parameter
#delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};

my $v = 1;
#$v is just to turn on and off the messages

my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' ,
                          '-organism' => 'Trypanosoma Brucei' );

while (my $input = $str->next_seq()){
    #Blast a sequence against a database:
    my $r = $factory->submit_blast($input);
    #my $r = $factory->submit_blast('amino.fa');

    print STDERR "waiting..." if( $v > 0 );
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                print STDERR "."
                    if ( $v > 0 );
                sleep 5;
            }
            else {
                my $result = $rc->next_result();
                #save the output
                my $filename = $result->query_name()."\.out";
                $factory->save_output($filename);
                $factory->remove_rid($rid);
                print "\nQuery Name: ", $result->query_name(), "\n";
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v > 0);
                    print "\thit name is ", $hit->name, "\n";
                    while( my $hsp = $hit->next_hsp ) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
}

My input sequence is

>ref|NC_009512.1|:385-1902
GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA
CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT
TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT
GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG
TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA
ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG
GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC
TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT
CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC
GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG
CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT
CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC
AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC
TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG
CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG
GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC
TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT
TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC
GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC
CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT
CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG
GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA

Please mail me regarding any queries.

Regards,
Roopa.

From cjfields at illinois.edu Thu Nov 19 10:30:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 19 Nov 2009 09:30:34 -0600
Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database
In-Reply-To:
References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>
Message-ID:

Mark, Dave,

This could be based on verbose().

Level          w       t   d   st
verbose < 0    -       +   -   -/+
verbose 0      +       +   -   -/+
verbose 1      +       +   +   +/+
verbose > 1    +* ->   +   +   +/+

* converts to throw()
w = warn
t = throw
d = debug
st = stack trace

warn() is set up that way now; you don't get a stack trace unless verbose() is > 0. throw() could be the same; would be a simple fix, really.

My only problem with the current state of things is (I think we've delved down this path before) verbosity level is tied to exception strictness as seen above, and they're really two separate concepts, at least to me. Verbosity of 1 or more doesn't necessarily mean I want an elevated level of strictness along with it. For instance, one might want very strict exceptions w/o the noise, or (conversely) lots of debugging output but no warnings.

(aside: another small nit, but I haven't exactly liked that the global level of strictness is designated by an env. variable with DEBUG in the name, but that's just me).

I've been thinking it would be nice to have simple separate verbose/strict switches (this is the way it's implemented in Biome). This would allow some finer grained control over output:

Level        d   st
verbose 0    -   -
verbose 1    +   +
Default = BIOPERLDEBUG || 0    # current situation

Level        w       t
strict -1    -       +
strict 0     +       +
strict 1     +* ->   +
* converts to throw()
Default = BIOPERLSTRICT || 0

We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness.

chris

On Nov 19, 2009, at 8:17 AM, Mark A. Jensen wrote:

> I'm inclined to agree.
> Lots of responses to questions here that begin
> "Well, as the error message said, you need to check...", which means
> people tend towards "I broke it! Write the list!". I do find it hairy when
> my errors are way down in the object tree.
> ----- Original Message ----- From: "Dave Messina"
> To: "Mark A. Jensen"
> Cc:
> Sent: Thursday, November 19, 2009 9:04 AM
> Subject: Re: [Bioperl-l] accessing EMBL database
>
>
>> I would agree that the error message is not really informative.
>
> Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump.
>
> I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them.
>
> Perhaps the stack dump should be turned off by default?
>
> Wouldn't this:
>
> ERROR: EMBL stream with no ID. Not embl in my book
>
> Be a lot clearer than this?:
>
> MSG: EMBL stream with no ID. Not embl in my book
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
> STACK: trial2.pl
>
> Just a thought. This has probably been discussed before.
>
> Dave

From roy.chaudhuri at gmail.com Thu Nov 19 11:10:28 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Thu, 19 Nov 2009 16:10:28 +0000
Subject: [Bioperl-l] Remote blast
In-Reply-To:
References:
Message-ID: <4B056DF4.2030502@gmail.com>

Hi Roopa,

I think that the -Organism parameter that you specify for
Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to
it in the documentation:
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm

You have the correct approach in your code - limiting the search to the
Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out.
If you uncomment the line (and add a semicolon afterwards), the program
runs correctly, but no hits are reported below your threshold e-value.
If you change the value of $e_val to 10 then some T. brucei hits are
reported.

Roy.

Roopa Raghuveer wrote:
> Hello everybody,
>
> I have a problem. I would like to use remote blast to find sequences
> matching an input sequence.
>
> E.g., I would like to search for sequences which match a Trypanosoma brucei
> sequence.
>
> I want the output to be only Trypanosoma brucei sequences matching my
> query. When I tried to use remoteblast against the nr database, I got sequences from
> different organisms like E. coli, Pseudomonas etc.
>
> Could you please tell me how this can be solved?
>
> My code is as follows.
>
> use Bio::Tools::Run::RemoteBlast;
> use strict;
> my $prog = 'blastn';
> my $db = 'nr';
> my $e_val= '1e-10';
> my $organism= 'Trypanosoma Brucei';
>
> my @params = ( '-prog' => $prog,
>                '-data' => $db,
>                '-expect' => $e_val,
>                '-readmethod' => 'SearchIO',
>                '-Organism' => $organism );
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
> #change a parameter
> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]'
>
> #remove a parameter
> #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
>
> my $v = 1;
> #$v is just to turn on and off the messages
>
> my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' ,
>                           '-organism' => 'Trypanosoma Brucei' );
>
> while (my $input = $str->next_seq()){
>     #Blast a sequence against a database:
>     my $r = $factory->submit_blast($input);
>     #my $r = $factory->submit_blast('amino.fa');
>
>     print STDERR "waiting..." if( $v > 0 );
>     while ( my @rids = $factory->each_rid ) {
>         foreach my $rid ( @rids ) {
>             my $rc = $factory->retrieve_blast($rid);
>             if( !ref($rc) ) {
>                 if( $rc < 0 ) {
>                     $factory->remove_rid($rid);
>                 }
>                 print STDERR "."
>                     if ( $v > 0 );
>                 sleep 5;
>             }
>             else {
>                 my $result = $rc->next_result();
>                 #save the output
>                 my $filename = $result->query_name()."\.out";
>                 $factory->save_output($filename);
>                 $factory->remove_rid($rid);
>                 print "\nQuery Name: ", $result->query_name(), "\n";
>                 while ( my $hit = $result->next_hit ) {
>                     next unless ( $v > 0);
>                     print "\thit name is ", $hit->name, "\n";
>                     while( my $hsp = $hit->next_hsp ) {
>                         print "\t\tscore is ", $hsp->score, "\n";
>                     }
>                 }
>             }
>         }
>     }
> }
>
> My input sequence is
>
> >ref|NC_009512.1|:385-1902
> GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA
> CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT
> TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT
> GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG
> TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA
> ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG
> GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC
> TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT
> CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC
> GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG
> CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT
> CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC
> AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC
> TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG
> CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG
> GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC
> TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT
> TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC
> GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC
> CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT
> CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG
> GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA
>
> Please mail me regarding any queries.
>
> Regards,
> Roopa.

From clements at nescent.org Thu Nov 19 12:46:32 2009
From: clements at nescent.org (Dave Clements)
Date: Thu, 19 Nov 2009 18:46:32 +0100
Subject: [Bioperl-l] how to visualize multiple sequences alignments
In-Reply-To:
References:
Message-ID:

Hi Xiaoyu,

I would also take a look at GBrowse_syn, a perl based solution built with
the GBrowse genome browser framework. See
http://gmod.org/wiki/GBrowse_syn.

Cheers,

Dave C.

On Wed, Nov 18, 2009 at 7:23 PM, Jason Stajich wrote:

> try jalview http://www.jalview.org/
>
>
> On Nov 18, 2009, at 9:18 AM, Xiaoyu Liang wrote:
>
>> Hi,
>>
>> I'm wondering: are there any modules that can be used for visualizing
>> multiple sequence alignments, like the result from ClustalW?
>>
>> Thank you very much,
>> Xiaoyu
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org

--
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/January_2010_GMOD_Meeting

From maj at fortinbras.us Thu Nov 19 18:37:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 19 Nov 2009 18:37:05 -0500
Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database
In-Reply-To:
References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>
Message-ID:

I like this verbose/strict separability a lot. Should we go for it?
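The separate verbose/strict switches proposed in Chris's tables can be sketched as a small stand-alone Perl class. Treat this as an illustration only: the package name My::Reporter and its -verbose/-strict constructor arguments are hypothetical, not the Bio::Root::Root API, and they are not how Biome implements the idea. The BIOPERLDEBUG and BIOPERLSTRICT defaults follow the tables. Verbosity controls only debug output and stack traces. Strictness controls only whether warnings are silenced, printed, or escalated to exceptions.

```perl
#!/usr/bin/perl
# Sketch of separate verbosity and strictness switches.
# My::Reporter is a hypothetical class used only for illustration.
package My::Reporter;
use strict;
use warnings;
use Carp ();

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # Defaults follow the proposal: environment variable, else 0.
    $self->{verbose} = defined $args{-verbose} ? $args{-verbose}
                     : ($ENV{BIOPERLDEBUG}  || 0);
    $self->{strict}  = defined $args{-strict}  ? $args{-strict}
                     : ($ENV{BIOPERLSTRICT} || 0);
    return $self;
}

# verbose 0: no debug output; verbose 1: debug output shown.
sub debug {
    my ($self, $msg) = @_;
    print STDERR "DEBUG: $msg\n" if $self->{verbose} >= 1;
    return;
}

# Always dies; a stack trace is attached only when verbose >= 1.
sub throw {
    my ($self, $msg) = @_;
    die $self->{verbose} >= 1 ? Carp::longmess("MSG: $msg")
                              : "ERROR: $msg\n";
}

# strict -1: silent; strict 0: print a warning; strict 1: escalate to throw().
sub warn {
    my ($self, $msg) = @_;
    return if $self->{strict} < 0;
    $self->throw($msg) if $self->{strict} > 0;
    print STDERR $self->{verbose} >= 1 ? Carp::longmess("WARNING: $msg")
                                       : "WARNING: $msg\n";
    return;
}

package main;

# Strict but quiet: warnings become exceptions, without a stack dump.
my $r = My::Reporter->new(-verbose => 0, -strict => 1);
eval { $r->warn('EMBL stream with no ID') };
print "caught: $@";    # prints "caught: ERROR: EMBL stream with no ID"
```

Because the two settings are independent, this combination gives the "very strict exceptions w/o the noise" case from the proposal. Setting -verbose => 1 with -strict => -1 gives the opposite case: debugging output but no warnings.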
----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc:
Sent: Thursday, November 19, 2009 10:30 AM
Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database

> Mark, Dave,
>
> This could be based on verbose().
>
> Level          w       t   d   st
> verbose < 0    -       +   -   -/+
> verbose 0      +       +   -   -/+
> verbose 1      +       +   +   +/+
> verbose > 1    +* ->   +   +   +/+
> * converts to throw()
> w = warn
> t = throw
> d = debug
> st = stack trace
>
> warn() is set up that way now; you don't get a stack trace unless verbose() is > 0.
> throw() could be the same; would be a simple fix, really.
>
> My only problem with the current state of things is (I think we've delved down
> this path before) verbosity level is tied to exception strictness as seen
> above, and they're really two separate concepts, at least to me. Verbosity of
> 1 or more doesn't necessarily mean I want an elevated level of strictness
> along with it. For instance, one might want very strict exceptions w/o the
> noise, or (conversely) lots of debugging output but no warnings.
>
> (aside: another small nit, but I haven't exactly liked that the global level
> of strictness is designated by an env. variable with DEBUG in the name, but
> that's just me).
>
> I've been thinking it would be nice to have simple separate verbose/strict
> switches (this is the way it's implemented in Biome). This would allow some
> finer grained control over output:
>
> Level        d   st
> verbose 0    -   -
> verbose 1    +   +
> Default = BIOPERLDEBUG || 0    # current situation
>
> Level        w       t
> strict -1    -       +
> strict 0     +       +
> strict 1     +* ->   +
> * converts to throw()
> Default = BIOPERLSTRICT || 0
>
> We could even allow finer-grained control of verbosity (states which cover all
> combinations) w/o affecting strictness.
>
> chris
>
> On Nov 19, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> I'm inclined to agree.
>> Lots of responses to questions here that begin
>> "Well, as the error message said, you need to check...", which means
>> people tend towards "I broke it! Write the list!". I do find it hairy when
>> my errors are way down in the object tree.
>> ----- Original Message ----- From: "Dave Messina"
>> To: "Mark A. Jensen"
>> Cc:
>> Sent: Thursday, November 19, 2009 9:04 AM
>> Subject: Re: [Bioperl-l] accessing EMBL database
>>
>>
>>> I would agree that the error message is not really informative.
>>
>> Agreed that it could be better, but I wonder whether part of the problem with
>> BioPerl error messages is the stack dump.
>>
>> I think a lot of eyes just glaze right over when they see a big wad of
>> complicated stuff, with colons and slashes and line numbers, spewing out at
>> them.
>>
>> Perhaps the stack dump should be turned off by default?
>>
>> Wouldn't this:
>>
>> ERROR: EMBL stream with no ID. Not embl in my book
>>
>> Be a lot clearer than this?:
>>
>> MSG: EMBL stream with no ID. Not embl in my book
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
>> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
>> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
>> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
>> STACK: trial2.pl
>>
>> Just a thought. This has probably been discussed before.
>>
>> Dave

From michael.watson at bbsrc.ac.uk Fri Nov 20 05:07:10 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri, 20 Nov 2009 10:07:10 +0000
Subject: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output
In-Reply-To: <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk>
References: <8D08960C647E64438CE5740657CBBDC501487319AE@iahcexch1.iah.bbsrc.ac.uk> <9994F70B-AE92-4425-9AAC-E9A2DC26964E@bioperl.org> <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk>

Hello

I was just wondering if anyone had had time to look into this?

I posted a bug:
http://bugzilla.open-bio.org/show_bug.cgi?id=2937

Thanks
Mick

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C)
Sent: 27 October 2009 09:01
To: 'Jason Stajich'
Cc: bioperl-l
Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output

Hi Jason

They both print 0 also. A bug report it is

Mick

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich
Sent: 26 October 2009 18:46
To: michael watson (IAH-C)
Cc: bioperl-l
Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output

Is this -m9 -d 0 output or standard default? I think the strand is parsed in the HSP parsing.

Can you double check what $hsp->query->strand and $hsp->hit->strand prints?

A full example report as a bug request will be the next step if that doesn't resolve it.
-jason

On Oct 26, 2009, at 10:04 AM, michael watson (IAH-C) wrote:

> Dear all
>
> Where does this go? Perhaps I am doing something wrong.
>
> Fasta35 output puts the strand in the hit list at the top:
>
> cluster_99033:3 ( 23) [r] 115 37.9 0.0011
> cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 0.963 27
>
> The [r] stands for reverse and the [f] stands for forward.
>
> There is also the text "rev-comp" after the hit line further down.
>
> However, when I parse fasta35 output using SearchIO and output the
> strand of the HSP:
>
> print $hsp->strand('hit'), ",";
> print $hsp->strand('query'), "\n";
>
> This simply prints out 0, 0 (I assume 0 is the default in BioPerl
> for "I don't know which strand it's on").
>
> So the information is there, but it's not getting parsed.
> Alternatively, I've missed something and will feel a bit foolish.
>
> Currently using BioPerl 1.6.0
>
> Thanks
> Mick

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From David.Messina at sbc.su.se Fri Nov 20 05:15:11 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 20 Nov 2009 11:15:11 +0100
Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database
In-Reply-To:
References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu>
Message-ID: <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se>

Chris,

I took a look at how you implemented this in Biome -- very nice!

> I like this verbose/strict separability a lot. Should we go for it?

Me too. So yes, I think so.

> We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness.
Perhaps this is a job for Log::Log4perl or Log::Dispatch? http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl.pm http://search.cpan.org/~drolsky/Log-Dispatch-2.26/lib/Log/Dispatch.pm That might be overkill, though. Dave From roychu at gmail.com Fri Nov 20 05:21:54 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 20 Nov 2009 02:21:54 -0800 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN Message-ID: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Hi, Does anyone use dreamhost as a web hosting service? I'm just curious if anyone has had any luck installing the module, as their daemon seems to kill my process whenever I try to install it. Dreamhost tech support attributes it to either exceeding the allocated memory cache or exceeding the processing time. I tried to nice the process, but that didn't help. Any luck or experience in resolving this would be much appreciated. I suppose my next attempt would be to try installing it directly and hope I don't need root... Thanks, Roy From s.denaxas at gmail.com Fri Nov 20 05:27:42 2009 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Fri, 20 Nov 2009 11:27:42 +0100 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Message-ID: Hello, normally you don't need to be root - http://sial.org/howto/perl/life-with-cpan/non-root/ It's kind of disturbing that their tech support cannot give you a straight answer on why they are killing the process. Good luck Spiros On Fri, Nov 20, 2009 at 11:21 AM, Chu, Roy wrote: > I suppose my next attempt would be to try > installing it directly and hope I don't need root...
> > Thanks, > Roy > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From charles-listes+bioperl at plessy.org Fri Nov 20 05:44:45 2009 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Fri, 20 Nov 2009 19:44:45 +0900 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Message-ID: <20091120104445.GG31318@kunpuu.plessy.org> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: > > Does anyone use dreamhost as a web hosting service? I'm just curious > if anyone has had any luck installing the module as their daemon seems > to kill my process whenever I try to install it. Dreamhost tech > support attributes it to either exceeding the allocated memory cache > or exceeding the processing time. I tried to nice the process, but > that didn't help for me. Any luck or experience in resolving this > would be much appreciated. I suppose my next attempt would be to try > installing it directly and hope I don't need root... Dear Roy, DreamHost uses Debian, so you can suggest that they install the Debian package. If you are in contact with the tech service, do not hesitate to tell them to contact me if they are interested in a backport of the 1.6.0 package. For version 1.6.1, it may be more difficult as it depends on perl 5.10.1.
PS: if you propose installing BioPerl as a feature in the Dreamhost panel, I will vote for it :) Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From cjfields at illinois.edu Fri Nov 20 07:51:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 06:51:39 -0600 Subject: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC501487319AE@iahcexch1.iah.bbsrc.ac.uk> <9994F70B-AE92-4425-9AAC-E9A2DC26964E@bioperl.org> <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> Message-ID: Mick, Short answer, no. It was in the queue to be fixed at some point in 1.6.x, but that queue is quite long. I'm pushing it into the queue specifically for 1.6.2, so it should be addressed soon. chris On Nov 20, 2009, at 4:07 AM, michael watson (IAH-C) wrote: > Hello > > I was just wondering if anyone had had time to look into this? > > I posted a bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2937 > > Thanks > Mick > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) > Sent: 27 October 2009 09:01 > To: 'Jason Stajich' > Cc: bioperl-l > Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output > > Hi Jason > > They both print 0 also. > > A bug report it is > > Mick > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich > Sent: 26 October 2009 18:46 > To: michael watson (IAH-C) > Cc: bioperl-l > Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output > > > Is this -m9 -d 0 output or standard default? I think the strand is > parsed in the HSP parsing.
> > Can you double check what $hsp->query->strand and $hsp->hit->strand > prints? > > A full example report as a bug request will be next step if that > doesn't resolve. > > -jason > On Oct 26, 2009, at 10:04 AM, michael watson (IAH-C) wrote: > >> Dear all >> >> Where does this go? Perhaps I am doing something wrong. >> >> Fasta35 output puts the strand in the hit list at the top: >> >> cluster_99033:3 ( 23) [r] 115 37.9 >> 0.0011 >> cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 >> 0.963 27 >> >> The [r] stands for reverse and the [f] stands for forward. >> >> There is also the text "rev-comp" after the hit line further down. >> >> However, when I parse fasta35 output using SearchIO and output the >> strand of the HSP: >> >> print $hsp->strand('hit'), ","; >> print $hsp->strand('query'), "\n"; >> >> This simply prints out 0, 0 (I assume 0 is the default in BioPerl >> for "I don't know which strand it's on"). >> >> So the information is there, but it's not getting parsed. >> Alternatively, I've missed something and will feel a bit foolish. 
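As a stopgap while SearchIO leaves the HSP strand at 0 for this output, the strand can be read straight from the [f]/[r] flag in the hit list. The sketch below uses a hypothetical helper, strands_from_hit_list, which is not part of BioPerl; it assumes hit-list lines shaped like the two quoted above (hit ID first, then the length in parentheses, then the bracketed flag) and returns BioPerl's strand convention, 1 for forward and -1 for reverse.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read the [f]/[r] flag from
# FASTA3x hit-list lines and return a hash ref of hit ID => strand,
# using 1 for forward and -1 for reverse.
# Assumes each line starts with the hit ID and contains "(length) [f|r]".
sub strands_from_hit_list {
    my @lines = @_;
    my %strand;
    for my $line (@lines) {
        if ($line =~ /^(\S+).*?\(\s*\d+\)\s+\[([fr])\]/) {
            $strand{$1} = $2 eq 'f' ? 1 : -1;
        }
    }
    return \%strand;
}

# Example using the two hit-list lines from the report above.
my $strands = strands_from_hit_list(
    'cluster_99033:3 ( 23) [r] 115 37.9 0.0011',
    'cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 0.963 27',
);
print "$_\t$strands->{$_}\n" for sort keys %$strands;
```

Each value could then be looked up with $hit->name while iterating over the SearchIO results, assuming the hit names SearchIO reports match the IDs printed in the hit list.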
>> >> Currently using BioPerl 1.6.0 >> >> Thanks >> Mick >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Nov 20 08:00:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 07:00:45 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <20091120104445.GG31318@kunpuu.plessy.org> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >> >> Does anyone use dreamhost as a web hosting service? I'm just curious >> if anyone has had any luck installing the module as their daemon seems >> to kill my process whenever I try to install it. Dreamhost tech >> support attributes it to either exceeding the allocated memory cache >> or exceeding the processing time. I tried to nice the process, but >> that didn't help for me. Any luck or experience in resolving this >> would be much appreciated. I suppose my next attempt would be to try >> installing it directly and hope I don't need root... > > Dear Roy, > > DreamHost uses Debian, so you can suggest them to install the Debian package. > If you are in contact with the tech service, do not hesitate to tell them to > contact me if they are interested by a backport of the 1.6.0 package.
For > version 1.6.1, it may be more difficult as it depends on perl 5.10.1. Any reason why this is so? We specify compatibility back to 5.6.1. Alex mentioned the reliance on the specific ExtUtils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post. > PS: if you propse to install BioPerl as a feature in the Dreamhost panel, I > will vote for it :) > > Have a nice day, > > -- > Charles Plessy > Debian Med packaging team, > http://www.debian.org/devel/debian-med > Tsurumi, Kanagawa, Japan chris From rtbio.2009 at gmail.com Fri Nov 20 10:52:09 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 20 Nov 2009 16:52:09 +0100 Subject: [Bioperl-l] Remote blast In-Reply-To: References: <4B056DF4.2030502@gmail.com> Message-ID: Hello everybody, I have tried to use Remote blast on Trypanosoma brucei sequences and could get certain hits. But I am unable to retrieve the complete sequences of the hits, i.e., I am unable to parse the blast output file to get the complete sequences of the hits. Here is my code. #!/usr/bin/perl -w use Bio::SearchIO; my $blast_report = new Bio::SearchIO ('-format' => 'blast', '-file' => $ARGV[0]); my $result = $blast_report->next_result; my $level = $ARGV[1]; while( my $hit = $result->next_hit) { print $hit->name; push(@arr1,$hit->name); while( my $hsp = $hit->next_hsp()) { if ($hsp->frac_identical() >= $level) { #print $hsp->hit_string, "\n"; push(@arr,$hsp->hit_string); } } } $k=@arr1; for($i=0;$i<$k;$i++){ push(@arr2,split(/\|/,$arr1[$i])); #print "$arr[$i]\n"; } #$t=@arr2; Here, I am trying to use the blast output file to get the complete sequence of each hit, but I could not get the complete sequence.
i/p:- Last login: Mon Nov 16 11:57:22 on console Welcome to Darwin! lmbicip-mac1:~ cip$ ssh admin at 141.84.66.66 The authenticity of host '141.84.66.66 (141.84.66.66)' can't be established. RSA key fingerprint is 2d:4a:09:1d:2e:f3:51:c7:ba:8b:29:37:36:f6:44:db. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '141.84.66.66' (RSA) to the list of known hosts. Password: Last login: Fri Nov 20 13:52:57 2009 from 10.153.189.239 Have a lot of fun... admin at BosLinux:~> clear admin at BosLinux:~> cd Documents/ admin at BosLinux:~/Documents> clear admin at BosLinux:~/Documents> vim blast.pl admin at BosLinux:~/Documents> clear admin at BosLinux:~/Documents> vim nnn.pl admin at BosLinux:~/Documents> vim other.pl admin at BosLinux:~/Documents> vim amino.fa admin at BosLinux:~/Documents> vim Tb09.211.2410.out admin at BosLinux:~/Documents> vim Tb09.211.2410.out |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 661 TTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCC 720 Query 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG 780 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG 780 Query 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT 840 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT 840 Query 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG 900 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG 900 Query 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT 960 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT 960 Query 961 CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005 ||||||||||||||||||||||||||||||||||||||||||||| Sbjct 961 
CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005 >ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A catalytic subunit isoform 2 (Tb09.211.2360) partial mRNA Length=1011 Score = 1622 bits (1798), Expect = 0.0 Identities = 944/974 (96%), Gaps = 0/974 (0%) Strand=Plus/Plus Query 32 TGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC 91 |||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 38 TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC 97 Query 92 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT 151 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 98 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT 157 Query 152 ATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGA 211 |||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||| Sbjct 158 ATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGA 217 Query 212 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT 271 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 218 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT 277 uery 272 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT 331 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 278 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT 337 Query 332 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG 391 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 338 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG 397 Query 392 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC 451 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 398 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC 457 Query 452 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA 511 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 458 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA 517 Query 512 
AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG 571 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 518 AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG 577 Query 572 TAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGT 631 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| It follows like this. The output I got is ATGACGACAACTCCCACTGGTGATGGCCAACTGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCCAATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCTCCTCCACTAACCCCTTCGCAACAGG TTGCATTCCGTGGTTTTTAG 
TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGTTCAAATTCCCCAATTGGTTTGACTCCCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATCACGCTCCCATTCCTGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGGGATAAGCGGTTGCCCCCGTTAGCACCATCACAACAATTGGAGTTCCGTGGGTTTTAG GGATGATGACCGATTGTACCTCCTCCTCGAGTATGTGGTGGGTGGCGAGCTGT TCTCCCACCTCCGGAAGGCGGGAAAATTCCCTAATGATGTAGCCAAGTTCTACTCCGCAGAAGTGGTTTTGGCGTTTGAATATATTCATGAGTGCGGCATCGTATACCGTGACTTGAAGCCAGAAAATGTGCTTTTGGACAAGCAGGGAAACATTAAGATTACGGACTTTGGGTTCGCGAAACGCGTTAGGGACAGAACGTACACGCTATGTGGGACTCCAGAGTATCTTGCGCCGGAGATAATCCAAAGTAAAGGTCACGATCGGGCTGTGGATTGGTGGACACTCGGAATTCTTCTCTATGAGATGCTTGTCGGTTATCCTCCTTTTTTCGACGAGAGTCCTTTTAGAACATACGAAAAAATTTTAGAGGGGAAACTTCAGTTTCCAAAGTGGGTGGAGATGCGGGCGAAGGACCTCATAAAGAGTTTTTTAACAATTGAACCAACGAAACG i.e.,It is only giving the region where it could find the best alignment i.e., the best hit ones. I want the complete sequence i.e., sequences corresponding to the accession numbers XM_822292.1 XM_822286.1 XM_822694.1 Database used in Remote blast was RefSeq i.e.,(refseq_rna),organism used :Trypanasoma brucei. Can any one please help me in solving this problem Regards, Roopa. 
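A BLAST report only carries the aligned region of each hit, so one way to get the complete records is to pull the accession out of each hit name and fetch it from GenBank. The following is a minimal sketch, not a tested recipe: accession_from_hit_name and fetch_full_sequences are hypothetical helper names, and it assumes hit names of the form ref|XM_822286.1| and that Bio::DB::GenBank's get_Seq_by_acc accepts a versioned accession.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the accession from a BLAST hit name such as
# "ref|XM_822286.1|". The pipe must be escaped in the regex; an unescaped
# /|/ matches the empty string and splits the name into single characters.
sub accession_from_hit_name {
    my ($name) = @_;
    my @fields = split /\|/, $name;
    return @fields > 1 ? $fields[1] : $fields[0];
}

# Hypothetical helper: fetch complete records from GenBank and print them as
# FASTA. BioPerl is loaded at run time, so the helper above works without it.
sub fetch_full_sequences {
    my @accessions = @_;
    require Bio::DB::GenBank;
    require Bio::SeqIO;
    my $gb  = Bio::DB::GenBank->new;
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
    for my $acc (@accessions) {
        # get_Seq_by_acc throws on failure; catch it and move on.
        my $seq = eval { $gb->get_Seq_by_acc($acc) };
        unless ($seq) {
            warn "Could not retrieve $acc\n";
            next;
        }
        $out->write_seq($seq);
    }
    return;
}

# Hit names are taken from the command line; nothing is fetched without them.
if (@ARGV) {
    fetch_full_sequences(map { accession_from_hit_name($_) } @ARGV);
}
```

Run it as, for example, perl fetch_hits.pl 'ref|XM_822292.1|' 'ref|XM_822286.1|' 'ref|XM_822694.1|' > full_hits.fa (the script name is illustrative). BioPerl is only loaded inside fetch_full_sequences, so the name parsing can be checked without network access.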
On Fri, Nov 20, 2009 at 12:30 PM, Roopa Raghuveer wrote: > > Hello Roy, > > Thanks a lot for your reply.My code is working for my sequence now. > > Thanks alot. > > Regards, > Roopa. > > On Thu, Nov 19, 2009 at 5:10 PM, Roy Chaudhuri wrote: > >> Hi Roopa, >> >> I think that the -Organism parameter that you specify for >> Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it >> in the documentation: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm >> >> You have the correct approach in your code - limiting the search to the >> Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If >> you uncomment the line (and add a semicolon afterwards), the program runs >> correctly, but no hits are reported below your threshold e-value. If you >> change the value of $e_val to 10 then some T.brucei hits are reported. >> >> Roy. >> >> Roopa Raghuveer wrote: >> >>> Hello everybody, >>> >>> I have a problem. I would like to use remote blast to find sequences >>> matching for an input sequence. >>> >>> Ex:-I would like to search sequences which match Trypanosoma Brucei >>> sequence. >>> >>> I want the output to be only Trypanosoma Brucei sequences matching with >>> my >>> query.When i tried to use remoteblast to nr database,I got sequences from >>> different organisms like E.coli,Pseudomonas etc., >>> >>> Could you please tell me how can this be solved...? >>> >>> My code is as follows. 
>>> >>> use Bio::Tools::Run::RemoteBlast; >>> use strict; >>> my $prog = 'blastn'; >>> my $db = 'nr'; >>> my $e_val= '1e-10'; >>> my $organism= 'Trypanosoma Brucei'; >>> >>> my @params = ( '-prog' => $prog, >>> '-data' => $db, >>> '-expect' => $e_val, >>> '-readmethod' => 'SearchIO', >>> '-Organism' => $organism ); >>> >>> my $factory = Bio::Tools::Run::RemoteBlast-> >>> new(@params); >>> >>> #change a paramter >>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>> brucei[ORGN]' >>> >>> #remove a parameter >>> #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; >>> >>> my $v = 1; >>> #$v is just to turn on and off the messages >>> >>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , >>> '-organism' => 'Trypanosoma Brucei' ); >>> >>> while (my $input = $str->next_seq()){ >>> #Blast a sequence against a database: >>> my $r = $factory->submit_blast($input); >>> #my $r = $factory->submit_blast('amino.fa'); >>> >>> print STDERR "waiting..." if( $v > 0 ); >>> while ( my @rids = $factory->each_rid ) { >>> foreach my $rid ( @rids ) { >>> my $rc = $factory->retrieve_blast($rid); >>> if( !ref($rc) ) { >>> if( $rc < 0 ) { >>> $factory->remove_rid($rid); >>> } >>> print STDERR "." 
if ( $v > 0 ); >>> sleep 5; >>> } >>> else { >>> my $result = $rc->next_result(); >>> #save the output >>> my $filename = $result->query_name()."\.out"; >>> $factory->save_output($filename); >>> $factory->remove_rid($rid); >>> print "\nQuery Name: ", $result->query_name(), "\n"; >>> while ( my $hit = $result->next_hit ) { >>> next unless ( $v > 0); >>> print "\thit name is ", $hit->name, "\n"; >>> while( my $hsp = $hit->next_hsp ) { >>> print "\t\tscore is ", $hsp->score, "\n"; >>> } >>> } >>> } >>> } >>> } >>> } >>> >>> My input sequence is >>> >>> ref|NC_009512.1|:385-1902 >>>> >>> GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA >>> CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT >>> TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT >>> GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG >>> TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA >>> ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG >>> GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC >>> TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT >>> CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC >>> GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG >>> CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT >>> CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC >>> AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC >>> TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG >>> CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG >>> GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC >>> TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT >>> TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC >>> 
GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC >>> CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT >>> CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG >>> GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA >>> >>> Please mail me regarding any queries. >>> >>> Regards, >>> Roopa. >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From mauricio at open-bio.org Fri Nov 20 11:15:22 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Fri, 20 Nov 2009 10:15:22 -0600 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? In-Reply-To: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> References: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> Message-ID: <4B06C09A.8060708@open-bio.org> All OBF wikis and blogs have been upgraded and cleaned from the hack. Thanks for the heads up! Mauricio. Mark A. Jensen wrote: > Andrew-- thanks!! We're on it. > MAJ > ----- Original Message ----- From: "Andrew Grimm" > > To: > Sent: Wednesday, November 18, 2009 9:52 PM > Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? > > >> Caution: read the whole email before visiting the bioperl wiki >> >> I was doing some bioinformatics-related searching using google, and >> one of the hits was to the bio dot perl dot org wiki (the FAQ in >> particular). >> >> When I did that, I was redirected to a ferdax dot com web site (a >> typo-squatting of fedex?). >> >> Some people reckon that ferdax hacks web sites and redirects google >> hits from the victim web site to their own web site. For example, this >> thread at google's webmaster central >> http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all >> >> (it's talking about zencart, but presumably they've since found other >> victims) >> >> Just going to the website without using google may not trigger the >> redirect. 
>> >> Apologies if this is a false alarm, but I don't think it is. >> >> I won't be in contact between Friday and Monday Australian time (I'll >> be at railscamp 6 in Melbourne), so I won't be able to answer any >> replies. >> >> Thanks, >> >> Andrew Grimm >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Nov 20 11:39:53 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Nov 2009 17:39:53 +0100 Subject: [Bioperl-l] Remote blast In-Reply-To: References: <4B056DF4.2030502@gmail.com> Message-ID: <7ECF627D-3DBF-4575-89CF-FA6348C88E8E@sbc.su.se> Hi Roopa, As far as I know, a BLAST report never contains the complete sequences of the hits. If it includes any part of the hit's sequence, it will be the part that matches the query. You'll have to use the hit's ID or accession to get its complete sequence from somewhere else. You can use Bio::DB::GenBank to do that, for example. See http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Dave From alessandra.bilardi at gmail.com Fri Nov 20 12:44:18 2009 From: alessandra.bilardi at gmail.com (Alessandra) Date: Fri, 20 Nov 2009 18:44:18 +0100 Subject: [Bioperl-l] Bio::DB::EUtilities question Message-ID: Hi all, I'm testing Bio::DB::EUtilities, the web agent that interacts with and retrieves data from NCBI's eUtils. My Perl script works, but only if I call get_Response fewer than ~450 times; otherwise I get this error message: ------------- EXCEPTION ------------- MSG: Response Error Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) STACK Bio::DB::GenericWebAgent::get_Response /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 STACK toplevel ./wget4gbk.pl:77 ------------------------------------- wget4gbk.pl lines 76-77 are: my $req = Bio::DB::EUtilities->new(-db => 'genome', -eutil => 'esummary', -retmode => $mode, -rettype => $type, -id => $id); my $entry = $req->get_Response; I ran the script more than ten times, and the error appears at a random point, somewhere between 300 and 600 requests. If I use another system to request the data, I can make ~10000 requests without errors. Do I have to set particular parameters on the EUtilities object? Can you help me with this random exception? Best, -- Alessandra Bilardi, Ph. D. ---- CRIBI, University of Padova, Italy http://www.linkedin.com/in/bilardi ---- From maj at fortinbras.us Fri Nov 20 13:42:38 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 13:42:38 -0500 Subject: [Bioperl-l] gravatars on the wiki Message-ID: <94431678F3764E8C9A49EA4D2FCD0DBD@NewLife> Hi all, You can now reveal your Gravatar (http://www.gravatar.com) on the wiki, by including the following markup on the page: {{#gravatar|youremail -at- yourplace -dot- tld}} You can use the antispam form above, or a regular email address. Invalid emails throw an error. http://bioperl.org/wiki/Gravatars Happy coding, MAJ From roychu at gmail.com Fri Nov 20 15:23:21 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 20 Nov 2009 12:23:21 -0800 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> "sounds very much like you process was killed for prolonged execution time, or memory usage.
We have a daemon in place that monitors for processes that take up too much of a shared web server's resources, and this may have kicked in (and often does when trying to install packages on a shared server)." This was the explanation they had. As for asking their admins to install it, it seems to be a "they'll try to get to it but don't hold your breath" situation. Hmm, I tried some other approaches: installing 1.4.0 posed no problems. I'm not a perl guru, so I tried to increase the build cache size from the default, 10 MB, hoping that might be the problem, though I can't see how, since I doubt the package size differs much between versions (though honestly, I haven't checked). Whenever I try to install 1.6.1, it runs into a problem, I guess after the 'make' step, and lists the modules--BioPerl-1.6.0/t/Variation/SeqDiff.t BioPerl-1.6.0/t/Variation/SNP.t BioPerl-1.6.0/t/Variation/Variation_IO.t --and typically gets killed here '> Killed' Next, I tried 1.6.0, and I get this: "(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok (v2.12) Going to read '/home/$username/.cpan/Metadata' Killed" (everything prior works, and it seems to get further along than when I try to install 1.6.1) Any insight into why this may be happening would be appreciated. Something EQUALLY appreciated would be a recommendation of a decent hosting service where someone has had success installing BioPerl. I'd try to set up my Mac web sharing feature and then set up the stuff locally, but I haven't yet been able to get port forwarding working properly on the Apple AirPort Extreme, which is perplexing. Next, I might just try to install via the Build.PL script. Hmm, checking the wiki, it seems I'll still be able to run remote blast and use the basic seq modules, although some discrepancies and idiosyncrasies may be expected? Any heads-ups about false assumptions on my part would be greatly appreciated.
Thanks in advance, Roy On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: > > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > >> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>> >>> Does anyone use dreamhost as a web hosting service? I'm just curious >>> if anyone has had any luck installing the module as their daemon seems >>> to kill my process whenever I try to install it. Dreamhost tech >>> support attributes it to either exceeding the allocated memory cache >>> or exceeding the processing time. I tried to nice the process, but >>> that didn't help for me. Any luck or experience in resolving this >>> would be much appreciated. I suppose my next attempt would be to try >>> installing it directly and hope I don't need root... >> >> Dear Roy, >> >> DreamHost uses Debian, so you can suggest them to install the Debian package. >> If you are in contact with the tech service, do not hesitate to tell them to >> contact me if they are interested by a backport of the 1.6.0 package. For >> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. > > Alex mentioned the reliance on the specific Extutils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. > > A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post.
> >> PS: if you propse to install BioPerl as a feature in the Dreamhost panel, I >> will vote for it :) >> >> Have a nice day, >> >> -- >> Charles Plessy >> Debian Med packaging team, >> http://www.debian.org/devel/debian-med >> Tsurumi, Kanagawa, Japan > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Nov 20 15:40:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 14:40:24 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> Message-ID: <1D1B0987-3309-4281-BCE0-2737E4F0D0B1@illinois.edu> BioPerl is pure perl. If you believe all dependencies are installed, just unpack the dist to a specific directory and point PERL5LIB at it (for bash): export PERL5LIB=/home/USER/bioperl/bioperl-live Note that if you plan on doing the same for other bioperl-related modules (ex: bioperl-db) you'll need to add 'lib' to it, as they use a generic Module::Build now. export PERL5LIB=/home/USER/bioperl/bioperl-db/lib You can also add a 'use lib' directive in your scripts as well. More at the following link: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#USING_MODULES_NOT_INSTALLED_IN_THE_STANDARD_LOCATION chris On Nov 20, 2009, at 2:23 PM, Chu, Roy wrote: > "sounds very much like you process was killed for prolonged execution > time, or memory usage. We have a daemon in place that monitors for > processes that take up too much of a shared web server's resources, and > this may have kicked in (and often does when trying to install packages > on a shared server)." > > This was the explanation they had. 
Regarding asking their admins to > install, it seems to be a "they'll try to get to it but don't hold your > breath situation." > > Hmmm, I tried some other attempts; installing 1.4.0 posed no problems. > I'm not a perl guru, so I tried to increase the build cache size from > the default, 10 MB, hoping that that may be the problem--can't imagine > how though, since I can't imagine how much the package size can > differ between versions (though honestly, I haven't checked). > Whenever I try to install 1.6.1, it runs into a problem I guess after > the 'make' step and lists the > modules--BioPerl-1.6.0/t/Variation/SeqDiff.t > BioPerl-1.6.0/t/Variation/SNP.t > BioPerl-1.6.0/t/Variation/Variation_IO.t > --and typically gets killed here '> Killed' > > Next, I tried 1.6.0, then I get this: > "(I think you ran Build.PL directly, so will use CPAN to install > prerequisites on demand) > CPAN: Storable loaded ok (v2.12) > Going to read '/home/$username/.cpan/Metadata' > Killed" (everything prior works and it seems to get further along than > when I try to install 1.6.1) > > Any insight into why this may be happening would be appreciated. > Something EQUALLY appreciated would be a recommendation of a decent > enough hosting service where someone has had success installing > Bio-Perl. I'd try to set up my Mac web sharing feature and then try > to set up the stuff locally, but I haven't yet been able to > successfully get the port forwarding feature working properly on the > Apple AirPort Extreme--perplexing. Next, I might just try to install
> > Thanks in advance, > Roy > > On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: >> >> On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: >> >>> Le Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy a ?crit : >>>> >>>> Does anyone use dreamhost as a web hosting service? I'm just curious >>>> if anyone has had any luck installing the module as their daemon seems >>>> to kill my process whenever I try to install it. Dreamhost tech >>>> support attributes it to either exceeding the allocated memory cache >>>> or exceeding the processing time. I tried to nice the process, but >>>> that didn't help for me. Any luck or experience in resolving this >>>> would be much appreciated. I suppose my next attempt would be to try >>>> installing it directly and hope I don't need root... >>> >>> Dear Roy, >>> >>> DreamHost uses Debian, so you can suggest them to install the Debian package. >>> If you are in contact with the tech service, do not hesitate to tell them to >>> contact me if they are interested by a backport of the 1.6.0 package. For >>> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. >> >> Any reason why this is so? We specify compatibility back to 5.6.1. >> >> Alex mentioned the reliance on the specific Extutils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. >> >> A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post. 
>> >>> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >>> will vote for it :) >>> >>> Have a nice day, >>> >>> -- >>> Charles Plessy >>> Debian Med packaging team, >>> http://www.debian.org/devel/debian-med >>> Tsurumi, Kanagawa, Japan >> >> chris From charles-listes+bioperl at plessy.org Fri Nov 20 20:07:23 2009 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Sat, 21 Nov 2009 10:07:23 +0900 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: <20091121010723.GA7786@kunpuu.plessy.org> On Fri, Nov 20, 2009 at 07:00:45AM -0600, Chris Fields wrote: > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > > > > DreamHost uses Debian, so you can suggest that they install the Debian > > package. If you are in contact with the tech service, do not hesitate to > > tell them to contact me if they are interested in a backport of the 1.6.0 > > package. For version 1.6.1, it may be more difficult as it depends on perl > > 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. Dear Chris, you make a good point: although for building we need to either depend on perl 5.10.1 or package ExtUtils::Manifest separately, the resulting bioperl package does not depend on such a high version. Therefore, there is no need for a backport, and the latest Debian package can be installed on a Debian stable (5.0/Lenny) system.
I just checked the Dreamhost machine to which I happen to have access, "waratahs", and it seems to be older, but it may nevertheless be worth asking the admins (with the big drawback that they would have to be asked for each update). Have a nice weekend, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From robert.bradbury at gmail.com Fri Nov 20 20:40:14 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 20 Nov 2009 20:40:14 -0500 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites Message-ID: I run a Linux system which is in a gradual process of evolution from the default Linux browsers (Galeon, Epiphany, etc.) through Firefox (better) to Google's Chromium (IMO, perhaps the best so far). Chromium allows one to create a process per tab/URL so one can effectively track what it is doing. It also allows one to track the machine usage of these processes (through the Developer > Task manager [shift-escape keyboard] option), which, though expensive in terms of overhead, allows one to track offending windows (in terms of memory or CPU use). My processor recently jumped from a typical 700 MHz to 1.4 GHz speed (using the Linux Ondemand scheduler - which saves ~20 W at the wall outlet -- I've measured it) to the full tilt 2.8 GHz the CPU is capable of. Looking at the Chrome task manager I was not surprised to find the NY Times high on the list (they are pushing content, esp. using Javascript) but much to my dismay the Jalview and Howto:Trees:Bioperl pages appeared to be high on the list. Now I am forced to ask myself *why* sites which are simply distributing static information are eating up CPU on my machine! This is a fundamental flaw in the architecture of the sites -- wherein there should be conscious efforts to minimize user-CPU use (or avoid Javascript entirely).
This would not be a problem if I were using Firefox as I can easily use NoScript to block Javascript from non-approved sites. But it raises the question of when one should allow Javascript to run (one would "normally" approve academic sites by default) when even the academic sites are abusing my CPU. There needs to be much greater awareness both on the part of software distributors and software consumers that it is *MY* CPU and *MY* Electricity and *MY* contribution to global warming. And the developers/distributors should not be sucking down those resources without first saying "May I?" and I have the option of saying "No you may not." There is enough we can do productively (running low homology blast searches) without engaging in endless wheel spinning of Javascripts or looped GIFs. Robert From maj at fortinbras.us Fri Nov 20 23:17:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 23:17:12 -0500 Subject: [Bioperl-l] ohlohers Message-ID: You can now add your Ohloh widgets and increase your carbon footprint with the less crufty: {{#ohloh|acct_id|TYPE}} where TYPE is [Detailed|Rank|Tiny]. Taint checks aplenty. MAJ From maj at fortinbras.us Fri Nov 20 23:33:02 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 23:33:02 -0500 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com><20091120104445.GG31318@kunpuu.plessy.org> <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> Message-ID: <9ECC66C2F23F47469AF0F07E3F9307FC@NewLife> Maybe 'nightmarehost' is more appropriate. I've had no problems on AWS, but this may not be exactly what you need.
MAJ ----- Original Message ----- From: "Chu, Roy" To: Sent: Friday, November 20, 2009 3:23 PM Subject: Re: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN "sounds very much like your process was killed for prolonged execution time, or memory usage. We have a daemon in place that monitors for processes that take up too much of a shared web server's resources, and this may have kicked in (and often does when trying to install packages on a shared server)." This was the explanation they had. Regarding asking their admins to install, it seems to be a "they'll try to get to it but don't hold your breath situation." Hmmm, I tried some other attempts; installing 1.4.0 posed no problems. I'm not a perl guru, so I tried to increase the build cache size from the default, 10 MB, hoping that that may be the problem--can't imagine how though, since I can't imagine how much the package size can differ between versions (though honestly, I haven't checked). Whenever I try to install 1.6.1, it runs into a problem I guess after the 'make' step and lists the modules--BioPerl-1.6.0/t/Variation/SeqDiff.t BioPerl-1.6.0/t/Variation/SNP.t BioPerl-1.6.0/t/Variation/Variation_IO.t --and typically gets killed here '> Killed' Next, I tried 1.6.0, then I get this: "(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok (v2.12) Going to read '/home/$username/.cpan/Metadata' Killed" (everything prior works and it seems to get further along than when I try to install 1.6.1) Any insight into why this may be happening would be appreciated. Something EQUALLY appreciated would be a recommendation of a decent enough hosting service where someone has had success installing Bio-Perl. I'd try to set up my Mac web sharing feature and then try to set up the stuff locally, but I haven't yet been able to successfully get the port forwarding feature working properly on the Apple AirPort Extreme--perplexing.
Next, I might just try to install via the Build.PL script. Hmm, checking the wiki, it seems I'll still be able to run remote blast and use the basic seq modules, although some discrepancies and idiosyncrasies may be expected? Any heads-up about any false assumptions by me would be greatly appreciated. Thanks in advance, Roy On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: > > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > >> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>> >>> Does anyone use dreamhost as a web hosting service? I'm just curious >>> if anyone has had any luck installing the module as their daemon seems >>> to kill my process whenever I try to install it. Dreamhost tech >>> support attributes it to either exceeding the allocated memory cache >>> or exceeding the processing time. I tried to nice the process, but >>> that didn't help for me. Any luck or experience in resolving this >>> would be much appreciated. I suppose my next attempt would be to try >>> installing it directly and hope I don't need root... >> >> Dear Roy, >> >> DreamHost uses Debian, so you can suggest that they install the Debian package. >> If you are in contact with the tech service, do not hesitate to tell them to >> contact me if they are interested in a backport of the 1.6.0 package. For >> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. > > Alex mentioned the reliance on the specific ExtUtils::Manifest version. The > version requested has an important bug fix, is present on CPAN, and is > backwards-compatible to 5.6.1. It should be fairly easy to request that as a > separate package. > > A strict requirement for perl 5.10.1 doesn't make sense in that light, unless > said perl maintainer can enlighten us as to why this is an issue? This one may > require a ranty blog post.
> >> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >> will vote for it :) >> >> Have a nice day, >> >> -- >> Charles Plessy >> Debian Med packaging team, >> http://www.debian.org/devel/debian-med >> Tsurumi, Kanagawa, Japan > > chris From cjfields at illinois.edu Fri Nov 20 23:38:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 22:38:23 -0600 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: References: Message-ID: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> Robert, Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in general) do not use JS, unless there is a specific addition I'm unaware of. Now, the site wiki was recently 'parasited' for redirects, which may be the culprit, but this is now fixed. Can you at least retest to see if this persists? Anyone else know about this? chris On Nov 20, 2009, at 7:40 PM, Robert Bradbury wrote: > I run a Linux system which is in a gradual process of evolution from the > default Linux browsers (Galeon, Epiphany, etc.) through Firefox (better) to > Google's Chromium (IMO, perhaps the best so far). Chromium allows one to > create a process per tab/URL so one can effectively track what it is doing. > It also allows one to track the machine usage of these processes (through > the Developer > Task manager [shift-escape keyboard] option) which though > expensive in terms of overhead allows one to track offending windows (in > terms of memory or CPU use).
My processor recently jumped from a typical > 700 MHz to 1.4 GHz speed (using the Linux Ondemand scheduler - which saves > ~20 W at the wall outlet -- I've measured it) to the full tilt 2.8 GHz the > CPU is capable of. Looking at the Chrome task manager I was not surprised > to find the NY Times high on the list (they are pushing content, esp. using > Javascript) but much to my dismay the Jalview and Howto:Trees:Bioperl > appeared to be high on the list. Now I am forced to ask myself *why* sites > which are simply distributing static information are eating up CPU on my > machine! This is a fundamental flaw in the architecture of the sites -- > wherein there should be conscious efforts to minimize user-CPU use (or avoid > Javascript entirely). This would not be a problem if I were using Firefox > as I can easily use NoScript to block Javascript from non-approved sites. > But it raises the question of when one should allow Javascript to run (one > would "normally" approve academic sites by default) when even the academic > sites are abusing my CPU. There needs to be much greater awareness both on > the part of software distributors and software consumers that it is *MY* CPU > and *MY* Electricity and *MY* contribution to global warming. And the > developers/distributors should not be sucking down those resources without > first saying "May I?" and I have the option of saying "No you may not." > There is enough we can do productively (running low homology blast > searches) without engaging in endless wheel spinning of Javascripts or > looped GIFs.
> > Robert From sdavis2 at mail.nih.gov Sat Nov 21 00:11:34 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 20 Nov 2009 21:11:34 -0800 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> References: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> Message-ID: <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> On Fri, Nov 20, 2009 at 8:38 PM, Chris Fields wrote: > Robert, > > Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in > general) do not use JS, unless there is a specific addition I'm unaware of. > Now, the site wiki was recently 'parasited' for redirects, which may be the > culprit, but this is now fixed. Can you at least retest to see if this
Can you at least retest to see if this >> persists? >> >> Anyone else know about this? >> >> > The page in question does include javascript, it appears from the source. > This is a function of using mediawiki, though, I believe and not something > specific to that page. > > Sean Sean, thanks for pointing that out. chris From robert.bradbury at gmail.com Sat Nov 21 13:26:05 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 21 Nov 2009 13:26:05 -0500 Subject: [Bioperl-l] Bio::DB::EUtilities question In-Reply-To: References: Message-ID: It sounds like NCBI may be counting frequency of requests, how much data they send or something similar. Are you delaying the time between fetches? The code I've seen typically sleeps for a few seconds each time around a loop. You might try longer delays between fetches and see if that gets you any more data. Alternatively perhaps the libraries aren't reusing the TCP/IP connection properly. Is there a difference between the amount of memory on the machines? Have you watched the size of the process to see if it grows over time? I think the bug which prevented me from fetching a not-so-large genome from a few months ago (eating up 3GB of memory in the process) has not been resolved. If so that could be your problem. Robert On Fri, Nov 20, 2009 at 12:44 PM, Alessandra wrote: > > > I'm testing Bio::DB::EUtilities - webagent which interacts with and > retrieves data from NCBI's eUtils. My perl script works but it works > only if I request less than ~450 times get_Response function.. 
else I > have got this error message: > > ------------- EXCEPTION ------------- > MSG: Response Error > Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) > STACK Bio::DB::GenericWebAgent::get_Response > /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 > STACK toplevel ./wget4gbk.pl:77 > From cjfields at illinois.edu Sat Nov 21 14:19:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Nov 2009 13:19:24 -0600 Subject: [Bioperl-l] Bio::DB::EUtilities question In-Reply-To: References: Message-ID: <837CE7E7-E625-4285-AD54-06FD168C0DF3@illinois.edu> NCBI has specific rules about the repeated queries to its servers: http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements Acc. to that, if you are making over 100 requests at peak times you will run into problems (they'll probably temp-block your IP), even if the timeout is much shorter now (it's 3 requests/second, whereas a year or two ago it was once every 3 sec). In general it's best to run something like this during off-hours. The actual limit on number of server requests is one specific part of Bio::DB::EUtilities that hasn't been added yet, but is tentatively planned. chris On Nov 21, 2009, at 12:26 PM, Robert Bradbury wrote: > It sounds like NCBI may be counting frequency of requests, how much data > they send or something similar. Are you delaying the time between fetches? > The code I've seen typically sleeps for a few seconds each time around a > loop. You might try longer delays between fetches and see if that gets you > any more data. > > Alternatively perhaps the libraries aren't reusing the TCP/IP connection > properly. Is there a difference between the amount of memory on the > machines? Have you watched the size of the process to see if it grows over > time? I think the bug which prevented me from fetching a not-so-large > genome from a few months ago (eating up 3GB of memory in the process) has > not been resolved. If so that could be your problem. 
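The request spacing Chris describes (no more than 3 requests per second, per NCBI's rules as he summarizes them) and Robert's suggestion of pausing between fetches can both be enforced on the client side. The sketch below uses only core Perl (Time::HiRes). make_throttle is a hypothetical helper, not part of Bio::DB::EUtilities, and the printf stands in for whatever request your script actually makes, such as its get_Response call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: returns a closure that blocks just long enough to keep
# successive calls at least 1/$per_second seconds apart.
sub make_throttle {
    my ($per_second) = @_;
    die "rate must be positive\n" unless $per_second && $per_second > 0;
    my $min_interval = 1 / $per_second;
    my $last_call;    # undef until the first call, which never waits
    return sub {
        if (defined $last_call) {
            my $wait = $last_call + $min_interval - time;
            sleep($wait) if $wait > 0;    # Time::HiRes sleep accepts fractions
        }
        $last_call = time;
    };
}

# Placeholder IDs; in a real script these would be the records to fetch.
my @ids      = (1 .. 5);
my $throttle = make_throttle(3);    # at most 3 requests per second
my $start    = time;
for my $id (@ids) {
    $throttle->();
    # Replace this line with the actual request (e.g. your get_Response call).
    printf "request for id %s at +%.2fs\n", $id, time - $start;
}
```

Running long jobs during NCBI's off-peak hours, as Chris recommends, is still advisable; the throttle only controls spacing, not the total number of requests.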
> > Robert > > On Fri, Nov 20, 2009 at 12:44 PM, Alessandra > wrote: >> >> >> I'm testing Bio::DB::EUtilities - webagent which interacts with and >> retrieves data from NCBI's eUtils. My perl script works but it works >> only if I request less than ~450 times get_Response function.. else I >> have got this error message: >> >> ------------- EXCEPTION ------------- >> MSG: Response Error >> Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) >> STACK Bio::DB::GenericWebAgent::get_Response >> /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 >> STACK toplevel ./wget4gbk.pl:77 >> From cjfields at illinois.edu Sat Nov 21 21:58:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Nov 2009 20:58:37 -0600 Subject: [Bioperl-l] BioPerl on FLOSS Weekly Message-ID: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> Jason and I were recently interviewed (Wednesday!) about BioPerl for FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and Kirsten Sanford. The interview is now available online, so get your favorite flavor (MP3, podcast) here: http://twit.tv/floss96 Enjoy! chris and jason From adsj at novozymes.com Sun Nov 22 07:37:40 2009 From: adsj at novozymes.com (Adam Sjøgren) Date: Sun, 22 Nov 2009 13:37:40 +0100 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> (Chris Fields's message of "Sat, 21 Nov 2009 20:58:37 -0600") References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> Message-ID: <87aaye91m3.fsf@topper.koldfront.dk> On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > Jason and I were recently interviewed (Wednesday!) about BioPerl for > FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and > Kirsten Sanford. Great!
How about linking to it on bioperl.org? :-), Adam -- Adam Sjøgren adsj at novozymes.com From cjfields at illinois.edu Sun Nov 22 15:30:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Nov 2009 14:30:01 -0600 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <87aaye91m3.fsf@topper.koldfront.dk> References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> <87aaye91m3.fsf@topper.koldfront.dk> Message-ID: <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> On Nov 22, 2009, at 6:37 AM, Adam Sjøgren wrote: > On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > >> Jason and I were recently interviewed (Wednesday!) about BioPerl for >> FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and >> Kirsten Sanford. > > Great! > > How about linking to it on bioperl.org? > > > :-), > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Now posted via O|B|F News; I'll try to make that feed more prominent on the main page. Since this is the second such interview (Jason did one a few years back for PerlCast), I'm thinking we need a media page of some sort. chris From maj at fortinbras.us Sun Nov 22 15:48:39 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 22 Nov 2009 15:48:39 -0500 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu><87aaye91m3.fsf@topper.koldfront.dk> <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> Message-ID: <247658CC6D9A4529B281F4482BD3E4BD@NewLife> We do have http://www.bioperl.org/wiki/Category:BioPerl_Media -- ----- Original Message ----- From: "Chris Fields" To: "Adam Sjøgren" Cc: Sent: Sunday, November 22, 2009 3:30 PM Subject: Re: [Bioperl-l] BioPerl on FLOSS Weekly On Nov 22, 2009, at 6:37 AM, Adam Sjøgren wrote: > On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > >> Jason and I were recently interviewed (Wednesday!)
about BioPerl for >> FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and >> Kirsten Sanford. > > Great! > > How about linking to it on bioperl.org? > > > :-), > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Now posted via O|B|F News; I'll try to make that feed more prominent on the main page. Since this is the second such interview (Jason did one a few years back for PerlCast), I'm thinking we need a media page of some sort. chris From jardim.rodrigo at gmail.com Sun Nov 22 11:06:40 2009 From: jardim.rodrigo at gmail.com (Rodrigo Jardim) Date: Sun, 22 Nov 2009 14:06:40 -0200 Subject: [Bioperl-l] Problems with Genbank Proteins File Message-ID: I have been having problems parsing a GenBank protein file. I think it is because this file has a different field order. For example:

In most GenBank files:
========================
LOCUS AA399704 183 bp mRNA linear EST 03-MAR-2000
ACCESSION AA399704
VERSION AA399704.1 GI:2053305
DEFINITION TEUF0001 T.cruzi epimastigote non-normalized cDNA Library Trypanosoma cruzi cDNA clone 1 5' similar to T. cruzi gene for histone H2b (X60982), mRNA sequence.
KEYWORDS EST.
SOURCE Trypanosoma cruzi

In GenBank protein files:
===================
LOCUS XP_628849 510 aa linear INV 31-OCT-2008
DEFINITION hypothetical protein [Dictyostelium discoideum AX4].
ACCESSION XP_628849
VERSION XP_628849.1 GI:66799847
DBSOURCE REFSEQ: accession XM_628847.1
KEYWORDS .
SOURCE Dictyostelium discoideum AX4.

When I try to parse it, BioPerl aborts with an error message. Any ideas?
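The two details that usually matter for a parsing failure like this, the exact error text and the BioPerl version, can be collected with a short script along these lines. It is a sketch, not a fix: proteins.gb is a placeholder file name, it assumes a BioPerl recent enough to ship Bio::Root::Version (the 1.6 series does), and it uses the standard Bio::SeqIO 'genbank' reader. The eval around next_seq captures the exception message instead of letting it abort the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder file name; pass the real GenBank protein file on the command line.
my $file = @ARGV ? $ARGV[0] : 'proteins.gb';

# Report the installed BioPerl version, if Bio::Root::Version can be loaded.
my $version = eval { require Bio::Root::Version; Bio::Root::Version->VERSION };
print defined $version ? "BioPerl version: $version\n"
                       : "BioPerl does not appear to be installed\n";

if (defined $version && -e $file) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    # Wrap next_seq in eval so a thrown exception is printed, not fatal.
    while (my $seq = eval { $in->next_seq }) {
        printf "%s\t%d residues\n", $seq->display_id, $seq->length;
    }
    print "Parse error: $@" if $@;
}
elsif (!-e $file) {
    print "No input file '$file' found\n";
}
```

Posting the "Parse error" line together with the version line gives the list what it needs to diagnose the problem.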
Thanks all, -- Regards, Rodrigo Jardim jardim.rodrigo at gmail.com From biopython at maubp.freeserve.co.uk Mon Nov 23 12:36:36 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Nov 2009 17:36:36 +0000 Subject: [Bioperl-l] Problems with Genbank Proteins File In-Reply-To: References: Message-ID: <320fb6e00911230936ofb9d897rbd45abb73a361250@mail.gmail.com> On Sun, Nov 22, 2009 at 4:06 PM, Rodrigo Jardim wrote: > I have been having problems parsing a GenBank protein file. I think it is because > this file has a different field order. For example: > > ... > > When I try to parse it, BioPerl aborts with an error message. > > Any ideas? There are some important bits of information missing - what is the error message, and what version of BioPerl are you using? Peter From maj at fortinbras.us Mon Nov 23 12:58:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Nov 2009 12:58:46 -0500 Subject: [Bioperl-l] building samtools/Bio::DB::Sam on cygwin Message-ID: Hi All-- I've had some hard-won success installing samtools and Lincoln's Bio::DB::Sam under cygwin; thought some on the list would be able to use my notes. (Yes, Jason, I'm working on Bio::Tools::Run::BWA...) (To get the current samtools, ping http://sourceforge.net/projects/samtools/files/samtools/0.1.7/samtools-0.1.7a.tar.bz2/download )

* Getting samtools to make from scratch in cygwin

The following diff details the changes I made by hand to the samtools Makefile. The key points are -D_WIN32 and the additional variable LFLAGS and its interpolations. To get the linker to see libgcc and libstdc++, I needed to add symlinks from /lib to the correct files in /lib/gcc/i386-pc-cygwin/4.3.2/. Your gcc version may differ.
--- ../old/samtools-0.1.7a/Makefile 2009-11-16 10:13:43.000000000 -0500
+++ Makefile 2009-11-23 12:14:18.529000000 -0500
@@ -1,16 +1,18 @@
 CC= gcc
 CFLAGS= -g -Wall -O2 #-m64 #-arch ppc
-DFLAGS= -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -D_CURSES_LIB=1
+LFLAGS= -lws2_32 -lgcc -lcygwin -lbz2 -lz -lstdc++
+DFLAGS= -D_WIN32 -D_FILE_OFFSET_BITS=64 -D_CURSES_LIB=1
 LOBJS= bgzf.o kstring.o bam_aux.o bam.o bam_import.o sam.o bam_index.o \
 	bam_pileup.o bam_lpileup.o bam_md.o glf.o razf.o faidx.o knetfile.o \
 	bam_sort.o sam_header.o
 AOBJS= bam_tview.o bam_maqcns.o bam_plcmd.o sam_view.o \
 	bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o \
 	bamtk.o kaln.o
@@ -36,13 +38,13 @@
 	$(AR) -cru $@ $(LOBJS)
 samtools:lib $(AOBJS)
-	$(CC) $(CFLAGS) -o $@ $(AOBJS) -lm $(LIBPATH) $(LIBCURSES) -lz -L. -lbam
+	$(CC) $(CFLAGS) -o $@ $(AOBJS) -Xlinker --enable-auto-import -lm $(LIBPATH) $(LIBCURSES) -lz -L. -lbam $(LFLAGS)
 razip:razip.o razf.o knetfile.o
-	$(CC) $(CFLAGS) -o $@ razf.o razip.o knetfile.o -lz
+	$(CC) $(CFLAGS) -o $@ razf.o razip.o knetfile.o -lz -lm -lws2_32
 bgzip:bgzip.o bgzf.o
-	$(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz
+	$(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz -lm -lws2_32
 razip.o:razf.h
 bam.o:bam.h razf.h bam_endian.h kstring.h sam_header.h

* Getting Bio::DB::Sam to compile and install

Bio::DB::Sam requires not samtools.exe, but the bam library created during the samtools build, as well as all the samtools header files. Create a symlink in /lib to libbam.a in the build directory (or copy libbam.a up to /lib), and create symlinks to (or copies of) the *.h files in /usr/include. Then, in a cygwin bash shell,

$ cpan
cpan> install Bio::DB::Sam

should fly.

Hope someone finds this useful. These mods led me to a successful Bio::DB::Sam install; I have not yet checked my original code based on Bio::DB::Sam. If they don't work for you, reply to the list.
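The two linking steps in these notes (libbam.a into /lib, the samtools headers into /usr/include) can be scripted. A minimal sketch in core Perl follows. link_samtools is a hypothetical helper name, the build path in the usage comment is a placeholder, and it assumes your perl supports symlink(), as cygwin's does. Existing files and links are left untouched, so it is safe to rerun; the libgcc/libstdc++ links mentioned earlier are not covered.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: symlink libbam.a into $lib_dir and every *.h header
# from the samtools build directory into $inc_dir. Returns the links created.
sub link_samtools {
    my ($build_dir, $lib_dir, $inc_dir) = @_;

    opendir(my $dh, $build_dir) or die "cannot read $build_dir: $!\n";
    my @headers = sort grep { /\.h\z/ } readdir $dh;
    closedir $dh;

    # Pair each source file name with the directory it should be linked into.
    my @plan = (['libbam.a', $lib_dir], map { [$_, $inc_dir] } @headers);

    my @made;
    for my $item (@plan) {
        my ($name, $dest_dir) = @$item;
        # Absolute target, so the link does not depend on the current directory.
        my $src = File::Spec->rel2abs(File::Spec->catfile($build_dir, $name));
        my $dst = File::Spec->catfile($dest_dir, $name);
        die "missing $src (has samtools been built?)\n" unless -e $src;
        next if -e $dst || -l $dst;    # never overwrite an existing file or link
        symlink($src, $dst) or die "symlink $dst -> $src failed: $!\n";
        push @made, $dst;
    }
    return @made;
}

# Example (placeholder build path; /lib and /usr/include are from the notes):
# print "$_\n" for link_samtools('/home/USER/samtools-0.1.7a', '/lib', '/usr/include');
```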
cheers, MAJ From jcline at ieee.org Mon Nov 23 14:13:26 2009 From: jcline at ieee.org (Jonathan Cline) Date: Mon, 23 Nov 2009 13:13:26 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: Message-ID: <4B0ADED6.8040901@ieee.org> Dreamhost has terrible reliability. I have stats going back years on a standard dreamhost hosting account (non-dedicated server), and on some days the web server doesn't respond. Dreamhost service is OK for a hobby blog, but it is definitely *not* suitable for anything real. Add in latency, arbitrary account limits/restrictions, etc., and it is a bad idea to host a project there. Although some users apparently get lucky with server allocation and end up on a "good server", the provider can change this at any time. I think, more typically, account holders simply don't notice, since most are simple bloggers. Here's a data snip that illustrates the problem with a typical dreamhost account:
----------------------------------------------------------------------
date        uptime    dns  connect  request   ttfb   ttlb
2008-08-05   91.40  0.000    0.528    0.528  2.257  1.619
2008-08-04   89.13  0.002    0.301    0.301  1.302  0.971
2008-08-03   94.62  0.000    0.567    0.567  1.506  0.913
2008-08-02  100.00  0.000    0.335    0.335  1.475  1.079
2008-08-01  100.00  0.000    0.310    0.310  1.587  0.825
2008-07-31   93.55  0.023    0.386    0.386  1.280  0.759
2008-07-30  100.00  0.000    0.345    0.345  1.373  0.860
2008-07-29  100.00  0.000    0.358    0.358  1.335  0.757
2008-07-28  100.00  0.000    0.327    0.327  1.462  0.896
2008-07-27  100.00  0.000    0.292    0.292  1.410  0.966
2008-07-26  100.00  0.000    0.283    0.283  1.280  0.815
2008-07-25  100.00  0.000    0.297    0.297  1.231  0.853
2008-07-24  100.00  0.000    0.362    0.362  1.258  0.699
2008-07-23  100.00  0.000    0.339    0.339  1.270  0.785
----------------------------------------------------------------------
minimum      89.13  0.000    0.283    0.283  1.231  0.699
maximum     100.00  0.023    0.567    0.567  2.257  1.619
average      97.76  0.002    0.359    0.359  1.430  0.914
----------------------------------------------------------------------
Or this month:
----------------------------------------------------------------------
date        uptime    dns  connect  request   ttfb   ttlb
2009-11-11  100.00  0.011    0.097    0.097  1.260  1.638
2009-11-10  100.00  0.008    0.094    0.094  1.285  1.647
2009-11-09  100.00  0.008    0.094    0.094  1.494  1.872
2009-11-08  100.00  0.015    0.101    0.101  1.509  1.894
2009-11-07  100.00  0.006    0.092    0.092  1.453  1.831
2009-11-06  100.00  0.011    0.097    0.097  1.500  1.882
2009-11-05   97.80  0.012    0.097    0.097  1.445  1.806
2009-11-04  100.00  0.010    0.096    0.096  1.235  1.605
2009-11-03   95.65  0.007    0.093    0.093  1.266  1.612
2009-11-02  100.00  0.010    0.096    0.096  1.267  1.637
2009-11-01  100.00  0.007    0.093    0.093  1.311  1.692
2009-10-31  100.00  0.009    0.095    0.095  1.225  1.594
2009-10-30  100.00  0.009    0.095    0.095  1.364  1.739
2009-10-29  100.00  0.017    0.103    0.103  1.121  1.505
----------------------------------------------------------------------
minimum      95.65  0.006    0.092    0.092  1.121  1.505
maximum     100.00  0.017    0.103    0.103  1.509  1.894
average      99.53  0.010    0.096    0.096  1.338  1.711
----------------------------------------------------------------------
## Jonathan Cline
## jcline at ieee.org
## Mobile: +1-805-617-0223
########################
From cjfields at illinois.edu Mon Nov 23 22:19:02 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 23 Nov 2009 21:19:02 -0600 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> Message-ID: <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> Okay, so I think it's feasible to add this into trunk. I like the idea of optionally having a log class; if someone comes up with a nice way of adding it in, I would be for it.
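The verbose/strict separation discussed in this thread can be illustrated in plain Perl. In BioPerl 1.6, a single verbose() level covers both concerns (a level of 2 turns warnings into exceptions). The sketch below keeps them apart: verbosity decides only what gets printed, and strictness decides only whether a recoverable problem is fatal. The class My::Reporter and its -verbose/-strict arguments are hypothetical and are not BioPerl's or Biome's API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Reporter;    # hypothetical class, not part of BioPerl
use Carp qw(croak carp);

sub new {
    my ($class, %args) = @_;
    my $self = {
        # -1 = silent, 0 = warnings only, 1+ = also debug messages
        verbose => defined $args{-verbose} ? $args{-verbose} : 0,
        # true = recoverable problems become exceptions
        strict  => $args{-strict} ? 1 : 0,
    };
    return bless $self, $class;
}

# Debug output depends only on verbosity.
sub debug {
    my ($self, $msg) = @_;
    print STDERR "--- DEBUG: $msg\n" if $self->{verbose} > 0;
    return;
}

# Strictness alone decides fatal vs. non-fatal;
# verbosity only decides whether a non-fatal warning is shown.
sub problem {
    my ($self, $msg) = @_;
    croak "EXCEPTION: $msg" if $self->{strict};
    carp "WARNING: $msg" if $self->{verbose} >= 0;
    return 0;
}

package main;

# Quiet but strict: nothing is printed, yet a problem is still fatal.
my $quiet_strict = My::Reporter->new(-verbose => -1, -strict => 1);
eval { $quiet_strict->problem('unparseable feature line'); 1 }
    or print "caught: $@";

# Chatty but lenient: debug output and a warning, and processing continues.
my $chatty = My::Reporter->new(-verbose => 1);
$chatty->debug('parsing record 1');
$chatty->problem('unparseable feature line');
```

A logging module such as Log::Log4perl could replace the print/carp calls. The two independent switches would stay the same.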
chris On Nov 20, 2009, at 4:15 AM, Dave Messina wrote: > Chris, I took a look at how you implemented this in Biome -- very nice! > > >> I like this verbose/strict separability a lot. Should we go for it? > > Me too. So yes, I think so. > > >> We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. > > > Perhaps this is a job for Log::Log4perl or Log::Dispatch? > http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl.pm > http://search.cpan.org/~drolsky/Log-Dispatch-2.26/lib/Log/Dispatch.pm > > > That might be overkill, though. > > Dave > From David.Messina at sbc.su.se Tue Nov 24 11:18:22 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 24 Nov 2009 17:18:22 +0100 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu> Message-ID: <3FD2086D-062F-4706-9DC8-2A53224C4913@sbc.su.se> > I like the idea of optionally having a log class; if someone comes up with a nice way of adding it in, I would be for it. My suggestion of the logging modules was actually to handle the various levels of verbose output -- I think both of the ones I mentioned "log" to STDERR by default. But of course a nice side effect of using such a logging module is that it would allow optional logging to a file, too. Dave From paolo.pavan at gmail.com Tue Nov 24 14:28:09 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Tue, 24 Nov 2009 20:28:09 +0100 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 usage question Message-ID: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> Dear all, I'm confused about the proper usage of the module Bio::Tools::Run::Cap3.
As documented in the POD, the run(@seqs) method returns the cap3 report file, whereas I expected it to return a Bio::Assembly object, consistent with other Bio::Tools::Run classes. I worked around this by getting the location and names of the temporary output files from the factory object (although this meant accessing a private property) and reading them via the Bio::Assembly::IO system. I was just wondering what the intended way to do this is. Thank you for enlightening me! Paolo From Russell.Smithies at agresearch.co.nz Tue Nov 24 17:04:31 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 25 Nov 2009 11:04:31 +1300 Subject: [Bioperl-l] Bio::DB::Fasta Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> Is there any way to pass a filename to Bio::DB::Fasta for the location where it should write the directory.index? It's writing in the same dir as the fasta, but I'd rather have it write in /tmp as it's part of a web app. Thanx, Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
======================================================================= From Russell.Smithies at agresearch.co.nz Tue Nov 24 17:21:52 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 25 Nov 2009 11:21:52 +1300 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <4296CD1039FC44B89034A1FD3E6721F3@NewLife> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> <4296CD1039FC44B89034A1FD3E6721F3@NewLife> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> That's what I ended up doing. Also, there's no "obvious" way to index a single file, so I ended up putting the filename in the glob parameter:
my $db = Bio::DB::Fasta->new( "$tmp", -glob => "test.faa", -reindex => 1 );
--Russell > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > Sent: Wednesday, 25 November 2009 11:19 a.m. > To: Smithies, Russell; 'bioperl-l' > Subject: Re: [Bioperl-l] Bio::DB::Fasta > > The code (method index_dir() ) seems to expect all the fasta files to be > contained in that directory. Looks hairy; what about creating symlinks to > your > fasta files in a /tmp subdir and calling new() with that subdir? > ----- Original Message ----- > From: "Smithies, Russell" > To: "'bioperl-l'" > Sent: Tuesday, November 24, 2009 5:04 PM > Subject: [Bioperl-l] Bio::DB::Fasta > > > > Is there any way to pass a filename to Bio::DB::Fasta for the location > of > > where to write the directory.index? > > It's writing in the same dir as the fasta but I'd rather have it write > in /tmp > > as it's part of a web app. > > > > Thanx, > > > > Russell > > > > > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material.
Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From maj at fortinbras.us Tue Nov 24 17:18:51 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 24 Nov 2009 17:18:51 -0500 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> Message-ID: <4296CD1039FC44B89034A1FD3E6721F3@NewLife> The code (method index_dir() ) seems to expect all the fasta files to be contained in that directory. Looks hairy; what about creating symlinks to your fasta files in a /tmp subdir and calling new() with that subdir? ----- Original Message ----- From: "Smithies, Russell" To: "'bioperl-l'" Sent: Tuesday, November 24, 2009 5:04 PM Subject: [Bioperl-l] Bio::DB::Fasta > Is there any way to pass a filename to Bio::DB::Fasta for the location of > where to write the directory.index? > It's writing in the same dir as the fasta but I'd rather have it write in /tmp > as it's part of a web app. > > Thanx, > > Russell > > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. 
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From florent.angly at gmail.com Tue Nov 24 17:54:48 2009 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 24 Nov 2009 14:54:48 -0800 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 usage question In-Reply-To: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> References: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com> Message-ID: <4B0C6438.8070405@gmail.com> Hi Paolo, It turns out that there is no standard for what is passed to the Bio::Tools::Run wrappers and returned by them. I noticed the inconsistency between the assembly wrappers recently while implementing support for new wrappers. A couple of weeks ago I implemented initial support for additional de novo assembly programs in BioPerl (454 Newbler and Minimo), and Mark Jensen added support for Maq, a program that assembles reads against a reference. In the process, all the assembly wrappers were changed to take the same type of input data (a FASTA file or an array reference of sequence objects) and to return one of the following:
* a Bio::Assembly::Scaffold object (the default), or
* a Bio::Assembly::IO object, or
* the name of a file containing the assembler output.
Use the out_type method to select which output you want, e.g.:
$factory->out_type('Bio::Assembly::IO');
or
$factory->out_type('cap3_results.ace');
You'll have to use the code in the bioperl-run Subversion repository if you want to use these new features.
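What follows is a minimal sketch of the workflow described above, assuming the bioperl-run code from Subversion (where out_type() was added) and a cap3 executable on the PATH. The helper name assemble_with_cap3 is hypothetical. The wrapper is loaded at run time so the file still compiles on a machine without bioperl-run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run Cap3 through the bioperl-run wrapper and pick
# the return type with out_type(), as described in the message above.
sub assemble_with_cap3 {
    my ($fasta_file, $out_type) = @_;
    require Bio::Tools::Run::Cap3;    # loaded lazily; needs bioperl-run

    my $factory = Bio::Tools::Run::Cap3->new();

    # 'Bio::Assembly::Scaffold' (the default), 'Bio::Assembly::IO',
    # or an output file name such as 'cap3_results.ace'
    $factory->out_type($out_type) if defined $out_type;

    # Input: a FASTA file (an array reference of sequence objects
    # is also described as accepted).
    return $factory->run($fasta_file);
}

# Usage: perl cap3_sketch.pl reads.fasta
if (@ARGV) {
    my $scaffold = assemble_with_cap3($ARGV[0]);    # default return type
    printf "%d contigs\n", $scaffold->get_nof_contigs;
}
```

Passing 'Bio::Assembly::IO' instead returns a reader, which avoids reaching into the factory's private temporary-file properties.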
Cheers, Florent Paolo Pavan wrote: > Dear, > I'm confused about the proper usage of the module Bio::Tools::Run::Cap3. > As documented in the pod, the run(@seqs) method returns the cap3 report file > while I expect to return a Bio::Assembly object, consistently with other > Bio::Tools::Run classes. > However, I went around this by getting from the factory object the location > and the names of the temp output files (actually accessing a private > property, although) and reading them via the Assembly::IO system. > I was just wandering what is the proper designed way to do this job. > > Thank you for enlighten the way! > Paolo > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From roychu at gmail.com Tue Nov 24 18:00:58 2009 From: roychu at gmail.com (Roy) Date: Tue, 24 Nov 2009 15:00:58 -0800 Subject: [Bioperl-l] Remote Blast - same script but different results Message-ID: <4d7f3e450911241500y7df305acq1d03819ea1ec7d3e@mail.gmail.com> Hi bioperl community, I've tried searching the old lists to see if this topic has been covered, and perhaps this question arises from my own lack of familiarity with BLAST, but with my Perl script (listed below) I get different results from remote BLAST each time I call it: I either get hits or no hits at all. I'll call the script once and get no hits, then call it again with the same parameters and get the same several hits I got on earlier runs. I use a subroutine to parse the BLAST report, and a boolean to indicate whether any results were returned. Any insight into what I may have missed would be appreciated. Short question: is this behavior typical? My understanding of how BLAST works is that it shouldn't be...
Thanks in advance, Roy

#!/usr/bin/perl -w
use strict;
use warnings;
use Carp;
use Bio::Perl;
use CGI;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::SeqFeature::Generic;
use Bio::Restriction::Analysis;
use Bio::Tools::Run::RemoteBlast;
use Bio::SimpleAlign;
use Bio::AlignIO;
use Bio::LocatableSeq;

my $five_seqobj = Bio::Seq->new(
    -seq        => 'ATTCCCACCGGGACCTGCGGGGCTGAGTGCCCTTCTCGGTTGCTGCCGCTGAGGAGCCCGCCCAGCCAGCCAGGGCCGCGAGGCCGAGGCCAGGCCGCAGCCCAGGAGCCGCCCCACCGCAGCTGGCGATGGACCCGCCGAGGCCCGCGCTGCTGGCGCTGCTGGCGCTGCCTGCGCTGCTGCTGCTGCTGCTGGCGGGCGCCAGGGCCG',
    -display_id => 'genomic_a',
    -alphabet   => 'dna',
);
my $three_seqobj = Bio::Seq->new(
    -seq        => 'GTGAGTGCGCGGCCGCTCTGCGGGCGCAGAGGGAGCGGGAGGGAGCCGGCGGCACGAGGTTGGCCGGGGCAGCCTGGGCCTAGGCCAGAGGGAGGGCAGCCACAGGGTCCAGGGCGAGTGGGGGGATTGGACCAGCTGGCGGCCCCTGCAGGCTCAGGATGGGGGGCGCGGGATGGAGGGGCTGAGGAGGGGGTCTCCGGAGCCTGCCTC',
    -display_id => 'genomic_b',
    -alphabet   => 'dna',
);

my @params = (
    '-program'    => 'blastn',
    '-database'   => 'refseq_genomic',
    '-expect'     => '10',
    '-readmethod' => 'blastxml'
);
$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'}    = 'YES';
$Bio::Tools::Run::RemoteBlast::HEADER{'PERC_IDENT'}   = 75;
$Bio::Tools::Run::RemoteBlast::HEADER{'FORMAT_TYPE'}  = 'XML';
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
$Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} = 100;    # Put: limit number of hits

my $factory_a = Bio::Tools::Run::RemoteBlast->new(@params);
$factory_a->retrieve_parameter('FORMAT_TYPE', 'XML');

my $hits_a;
my $hits_b;
my $r;
my $bool_hit;

print "Submitting BLAST query - 5' end (MEGABLAST = YES)\n";
$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES';
$r = $factory_a->submit_blast($five_seqobj);
# fetch_blast_report returns a list, so assign it in list context
($bool_hit, $hits_a) = fetch_blast_report($factory_a);
unless ($bool_hit) {
    print "\nNo hits\n";
    print "Re-submitting BLAST query - 5' end (MEGABLAST = NO)\n";
    sleep 5;
    $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'NO';
    $r = $factory_a->submit_blast($five_seqobj);
    ($bool_hit, $hits_a) = fetch_blast_report($factory_a);
    if ($bool_hit == 0) { print "No hits\n"; }
    sleep 5;
}

my $factory_b = Bio::Tools::Run::RemoteBlast->new(@params);
print "\n--------------------------------------------------\n\n";
print "Submitting BLAST query - 3' end (MEGABLAST = YES)\n";
$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES';
$r = $factory_b->submit_blast($three_seqobj);
($bool_hit, $hits_b) = fetch_blast_report($factory_b);
unless ($bool_hit) {
    print " No hits\n";
    print "Re-submitting BLAST query - 3' end (MEGABLAST = NO)\n";
    sleep 5;
    $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'NO';
    $r = $factory_b->submit_blast($three_seqobj);
    ($bool_hit, $hits_b) = fetch_blast_report($factory_b);
    if ($bool_hit == 0) { print " No hits\n"; }
    sleep 5;
}
print "\nbye\n\n";
print "$hits_a\n$hits_b\n";
exit;

sub fetch_blast_report {
    my ($factory) = @_;
    my $v        = 1;
    my $bool_hit = 0;
    my $hits     = '';
    print STDERR "waiting...";
    while (my @rids = $factory->each_rid) {
        foreach my $rid (@rids) {
            print STDERR ".";
            my $rc = $factory->retrieve_blast($rid);
            # retrieves blast report from remote blast queue,
            # returns -1 on error, 0 on 'job not finished', Bio::SearchIO object
            # args, remote blast id (rid)
            if (!ref($rc)) {
                # if not empty string, ref EXPR returns a non-empty string if EXPR is a reference
                if ($rc < 0) { $factory->remove_rid($rid); }
                print STDERR "." if ($v > 0);    ##### is this printing out as multiple dots? when and why?
                sleep 5;
            }
            else {
                $bool_hit = 1;
                my $result = $rc->next_result();    # returns: Bio::Search::Result::ResultI object
                unless ($result->num_hits > 0) { $bool_hit = 0; }
                $factory->remove_rid($rid);
                print "\ndatabase:\t",  $result->database_name, "\n";
                print "query name:\t",  $result->query_name,    "\n";
                print "query length\t", $result->query_length,  "\n";
                print "num hits\t",     $result->num_hits,      "\n";
                if ($result->num_hits) {
                    # $result->hits returns an array of hits
                    # $results->no_hits_found, boolean vs $#{@hits} ie. filtering
                    while (my $hit = $result->next_hit) {
                        print "\nhit name:\t", $hit->name, "\n";
                        print "description:\t", $hit->description, "\n";
                        print "locus:\t", $hit->locus, "\n";
                        print "algorithm: ", $hit->algorithm, "\thit length: ", $hit->length, "\thit ranking: ", $hit->rank, "\n";
                        while (my $hsp = $hit->next_hsp) {
                            print "evalue: ", $hsp->evalue, "\tscore: ", $hsp->score, "\tpercent_id: ", $hsp->percent_identity, "\n";
                            print "query_start: ", $hsp->query->start, "\tquery_end: ", $hsp->query->end;
                            print "\tquery_length: ", $hsp->query->length, "\tquery_strand: ", $hsp->strand('query'), "\n";
                            print "subject_start: ", $hsp->subject->start, "\tsubject_end: ", $hsp->subject->end;
                            print "\tsubject_length: ", $hsp->subject->length, "\tsubject_strand: ", $hsp->strand('subject'), "\n\n";
                            my $aln = $hsp->get_aln;
                            if ($aln->is_flush) {
                                foreach my $seq ($aln->each_seq) { print $seq->seq, "\n"; }
                                print $aln->gap_line, "\n";
                                print $aln->consensus_string(95), "\n\n";
                            }
                            $hits .= $hit->name . "\t" . $hsp->subject->start . "\t" . $hsp->subject->end . "\t" . $hsp->strand('subject') . "\n";
                        }
                    }
                }
            }
        }
        return ($bool_hit, $hits);
    }
}

From maj at fortinbras.us Tue Nov 24 23:12:13 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Tue, 24 Nov 2009 23:12:13 -0500 Subject: [Bioperl-l] Bio::DB::Fasta In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> <4296CD1039FC44B89034A1FD3E6721F3@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz> Message-ID: <3ECFA0236D1B467181EE63C8C6BE7E1F@NewLife> I seem to be able to do $db = Bio::DB::Fasta->new("$tmp/test.faa"); without a problem- something in the mixing of named and unnamed parameters? ----- Original Message ----- From: "Smithies, Russell" To: "'Mark A. Jensen'" ; "'bioperl-l'" Sent: Tuesday, November 24, 2009 5:21 PM Subject: RE: [Bioperl-l] Bio::DB::Fasta That's what I ended up doing. Also, there's no "obvious" way to index a single file so I ended putting the filename in the glob parameter. my $db = Bio::DB::Fasta->new( "$tmp", -glob => "test.faa", -reindex => 1 ); --Russell > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > Sent: Wednesday, 25 November 2009 11:19 a.m. > To: Smithies, Russell; 'bioperl-l' > Subject: Re: [Bioperl-l] Bio::DB::Fasta > > The code (method index_dir() ) seems to expect all the fasta files to be > contained in that directory. Looks hairy; what about creating symlinks to > your > fasta files in a /tmp subdir and calling new() with that subdir? > ----- Original Message ----- > From: "Smithies, Russell" > To: "'bioperl-l'" > Sent: Tuesday, November 24, 2009 5:04 PM > Subject: [Bioperl-l] Bio::DB::Fasta > > > > Is there any way to pass a filename to Bio::DB::Fasta for the location > of > > where to write the directory.index? > > It's writing in the same dir as the fasta but I'd rather have it write > in /tmp > > as it's part of a web app. 
> > > > Thanx, > > > > Russell > > > > > > ======================================================================= > > Attention: The information contained in this message and/or attachments > > from AgResearch Limited is intended only for the persons or entities > > to which it is addressed and may contain confidential and/or privileged > > material. Any review, retransmission, dissemination or other use of, or > > taking of any action in reliance upon, this information by persons or > > entities other than the intended recipients is prohibited by AgResearch > > Limited. If you have received this message in error, please notify the > > sender immediately. > > ======================================================================= > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From maj at fortinbras.us Wed Nov 25 12:25:30 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 25 Nov 2009 12:25:30 -0500 Subject: [Bioperl-l] question for all regarding a sam-based Bio::Assembly::IO Message-ID: <1E72D5B0A190448FA27545DB5B68638D@NewLife> Short-readers, I'm working on an Assembly::IO class for sam alignments. I'm currently making a decision about handling multiple reference sequences: would you prefer that next_assembly() return an assembly that covers all reference sequences, or that next_assembly iterates over each reference sequence? (Or both?) thanks for your input- MAJ From timbourine81 at gmail.com Wed Nov 25 12:40:52 2009 From: timbourine81 at gmail.com (Tim) Date: Wed, 25 Nov 2009 18:40:52 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in new file Message-ID: <4B0D6C24.2080308@gmail.com> Dear bioperl users, I am a real newbie and have - maybe a very trivial - question. I searched the mailing list archive and many howtos but I have not found a concrete answer to my problem. 
So hopefully you can help me :) Background: I use the latest Bioperl version (installed it two weeks before). When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file including different sequences, I get a BLAST output with many queries each having several hits / sbjcts. My problem is how to parse *all* hits of *one* query into a single new file. And this for all the queries I have in my BLAST output file. Or is it better the other way round; first to make fasta files with only single sequences inside and BLAST each file? But how can I automize that using Bioperl? I tried Bio::SearchIO but can only parse all queries and their respective hits in only one file... I think iteration is also necessary here, but I do not really know how to include that into Bio::SearchIO. Or do I have to use Module:Bio::Index::Blast? I can index a file (see below), but I have no idea what comes next... ###How I index a file... #!/usr/bin/perl -w $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; use Bio::Index::Fasta; $file_name = "8_to_BLAST_two_seq_index.fasta"; $id = "48882"; $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", -write_flag => 1); $inx->make_index($file_name); Hopefully, you can give me at least hints what to look for. A big THANKS in advance! Cheers, Tim From timbourine81 at gmail.com Wed Nov 25 12:53:34 2009 From: timbourine81 at gmail.com (Tim) Date: Wed, 25 Nov 2009 18:53:34 +0100 Subject: [Bioperl-l] How to parse different (fasta) files Message-ID: <4B0D6F1E.8@gmail.com> Hey everybody, another question from me...if you do not mind :) My situation is like this: I have parsed a standalone BLAST output using SearchIO with only the hit names. Now I have a second fasta file with the same sequences like in the BLAST database but including an alignment (meaning "." and "-"). (There is no chance to make a BLAST database with fasta files including the alignment, unfortunately...). 
My intention now is to take the names of the hit sequences (BLAST output), get the corresponding aligned sequences (FASTA file including the alignment), and put them in a new file. Has anybody out there tried that before? Again, I am an absolute greenhorn in using (Bio)perl. Maybe it is very simple :D Looking forward to getting an answer from you. All the best, Tim -- Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999
From maj at fortinbras.us Wed Nov 25 13:20:03 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 25 Nov 2009 13:20:03 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew file In-Reply-To: <4B0D6C24.2080308@gmail.com> References: <4B0D6C24.2080308@gmail.com> Message-ID: <53DE480F205E42CE8D2B9421592AAF0E@NewLife> hey Tim-- Sounds like you need to go about collecting your queries inside out:

my %hits_by_query;
# $in: your Bio::SearchIO reader on the BLAST report
while ( my $result = $in->next_result ) {
    for my $hit ( $result->hits ) {
        push @{ $hits_by_query{ $result->query_name } }, $hit;
    }
}

I believe now each hash element, keyed by the query name, will contain an arrayref to the set of hits associated with that query. From here, I believe

use Bio::Search::Result::BlastResult;
use Bio::SearchIO;
use Bio::SearchIO::Writer::TextResultWriter;
foreach my $qid ( keys %hits_by_query ) {
    my $result = Bio::Search::Result::BlastResult->new( -query_name => $qid );
    $result->add_hit($_) for ( @{ $hits_by_query{$qid} } );
    # open the file for writing (">"); SearchIO needs a writer object to output results
    my $blio = Bio::SearchIO->new(
        -writer => Bio::SearchIO::Writer::TextResultWriter->new(),
        -file   => ">$qid.bls"
    );
    $blio->write_result($result);
}

will do what you want. hope this helps - Mark ----- Original Message ----- From: "Tim" To: Sent: Wednesday, November 25, 2009 12:40 PM Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew file > Dear bioperl users, > > I am a real newbie and have - maybe a very trivial - question. > > I searched the mailing list archive and many howtos but I have not found > a concrete answer to my problem.
So hopefully you can help me :) > > Background: I use the latest Bioperl version (installed it two weeks > before). > When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > including different sequences, I get a BLAST output with many queries > each having several hits / sbjcts. > > My problem is how to parse *all* hits of *one* query into a single new > file. And this for all the queries I have in my BLAST output file. > > Or is it better the other way round; first to make fasta files with only > single sequences inside and BLAST each file? But how can I automize that > using Bioperl? > > I tried Bio::SearchIO but can only parse all queries and their > respective hits in only one file... > I think iteration is also necessary here, but I do not really know how > to include that into Bio::SearchIO. > Or do I have to use Module:Bio::Index::Blast? > > I can index a file (see below), but I have no idea what comes next... > > ###How I index a file... > > #!/usr/bin/perl -w > > $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > > use Bio::Index::Fasta; > > > $file_name = "8_to_BLAST_two_seq_index.fasta"; > $id = "48882"; > $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > -write_flag => 1); > $inx->make_index($file_name); > > > Hopefully, you can give me at least hints what to look for. > > A big THANKS in advance! 
> > Cheers, > > Tim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Wed Nov 25 14:07:26 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 26 Nov 2009 08:07:26 +1300 Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in new file In-Reply-To: <4B0D6C24.2080308@gmail.com> References: <4B0D6C24.2080308@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B63085701@exchsth.agresearch.co.nz> Hi Tim, Here's some code for a job I'm working on at the moment that contains all the bits you'll probably need. It's extracting 2 species-specific databases from nr (based on tax ids), doing a blast, then parsing the results and creating a substitution matrix. I was initially using Bio::DB::Eutilities to query and retrieve sequences but I kept getting errors and time-outs from NCBI when pulling back large numbers of sequences. It should give you a rough idea of how to run Bio::Tools::Run::StandAloneBlast, Bio::DB::Fasta and Bio::SearchIO. 
Email me directly if you want further explanation, as it's not well commented ;-) Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz =======================================

#!/usr/local/bin/perl
use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::Fasta;
use Storable;

# Parameters:
# Percentage can be specified as either 20p, 20P or 20%
# So for 20% of rice sequences blasted against oil palm:
# 4530 51953 20p (4530=rice,51953=oil_palm, 20p=20%)
# Or for 20 searches:
# 4530 51953 20
#
my ( $q, $s, $c ) = @ARGV;

my $nr       = "/data/databases/flatfile/illuminati_blastdata/nr";
my $tax_file = "/data/anonftp/pub/mirror/taxonomy/gi_taxid_prot.dmp.gz";
my $tmp      = "/tmp/tax";

my %stats            = ();
my $total_subs       = 0;
my $min_hsp_len      = 0;
my $min_hsp_identity = 0;
my $num_searches     = $c || 10;
my $blast_e          = '1e-6';
my $count            = 0;

# check if all the fasta and blast files exist
# if not, extract new fasta and re-formatdb the database
foreach my $t ( $q, $s ) {
    foreach ( map { "$tmp/$t.$_" } qw(faa list phr pin psq) ) {
        unless ( -e $_ ) {
            print "Creating database for $t\n";
            &create_database($t);
            last;
        }
    }
}

my @params = (
    -database => "$tmp/$q",
    -program  => 'blastp',
    -e        => $blast_e,
    -outfile  => "$tmp/blast.out",
    -v        => '1',
    -b        => '1'
);
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params) or die $!;

# load the query sequences into a db
# makes it easier to randomly access them
my $db = Bio::DB::Fasta->new( "$tmp", -glob => "$s.faa", -reindex => 1 );
my @ids      = $db->ids;
my $id_count = scalar @ids;    # number of ids ($#ids would be one less)
die "No sequences\n" unless $id_count;

# if a percentage is requested, calculate
# the required number of searches
if ( $num_searches =~ m/(\d+)[pP%]/ ) {
    $num_searches = int( ( $1 / 100 ) * $id_count );
    warn "Searching random $1 percent ($num_searches) of $id_count sequences from taxid $q\n";
}

my $summary_file = "$tmp/" . $$ . "_summary.txt";
open( OUT, ">", $summary_file ) or die $!;
print OUT "#Summary of $num_searches random blast searches from taxid $q against taxid $s.\n";
print OUT "#Parameters used were:\n";
print OUT "#blast_e: $blast_e\n";
print OUT "#min_hsp_len: $min_hsp_len\n";
print OUT "#min_hsp_identity: $min_hsp_identity\n";
print OUT "\n";

# rand(@ids) can pick any index, including the last one
while ( my $seq = $db->get_Seq_by_id( $ids[ rand @ids ] ) ) {
    next unless $seq;
    warn "Processing ", $seq->id, "\n";
    eval {
        my $blast_report = $factory->blastall($seq);
        sleep 5;
    };
    my $blast_in = Bio::SearchIO->new( -format => "blast", -file => "$tmp/blast.out" );
    while ( my $result = $blast_in->next_result ) {
        if ( $result->num_hits <= 0 ) {
            warn "No hits for ", $result->query_accession, "\n";
            print OUT "No hits for ", $result->query_accession, "\n";
            next;
        }
        $count++;
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                warn sprintf( "%s had %s hsp%s\n", $result->query_accession, $hit->num_hsps, $hit->num_hsps > 1 ? "s" : "" );
                print OUT sprintf( "%s had %s hsp%s\n", $result->query_accession, $hit->num_hsps, $hit->num_hsps > 1 ? "s" : "" );
                # http://www.bioperl.org/wiki/HOWTO:SearchIO#Table_of_Methods
                if ( $hsp->length('total') > $min_hsp_len ) {
                    if ( $hsp->percent_identity >= $min_hsp_identity ) {
                        my @query_string = split '', $hsp->query_string;
                        my @homol_string = split '', $hsp->homology_string;
                        my @hit_string   = split '', $hsp->hit_string;
                        # count conservative substitutions ('+' in the homology line), including the last column
                        for ( my $i = 0; $i <= $#query_string; $i++ ) {
                            next unless $homol_string[$i] =~ /\+/;
                            $stats{ $query_string[$i] }{ $hit_string[$i] }++;
                            $total_subs++;
                        }
                    }
                }
            }
        }
    }
    # double quotes so that $tmp is interpolated
    unlink "$tmp/blast.out" if -e "$tmp/blast.out";
    last if $count >= $num_searches;
}

# create summary frequency list
my %summary = ();
for my $query ( keys %stats ) {
    for my $hit ( keys %{ $stats{$query} } ) {
        $summary{"$query->$hit"} = sprintf( "%6f", $stats{$query}{$hit} / $total_subs );
    }
}
print OUT "\n";

# sort by descending frequencies and print to summary file
foreach my $k ( sort { $summary{$b} <=> $summary{$a} } keys %summary ) {
    print OUT "$k\t", $summary{$k}, "\n" unless $k =~ /TOTAL/;
}
print OUT "\n\n";

# print substitution matrix
my $i     = 0;
my @prots = qw(A R N D C Q E G H I L K M F P S T W Y V);
my $sep   = "\t";
print OUT sprintf( "%7s %s", $_, $sep ) foreach ( " ", @prots );
print OUT "\n";
foreach my $x (@prots) {
    print OUT sprintf( "%7s|%s", $prots[ $i++ ], $sep );
    foreach my $y (@prots) {
        my $val = defined( $stats{$x}{$y} ) ? sprintf( "%0.6f", $stats{$x}{$y} / $total_subs ) : "--------";
        print OUT sprintf( "%s%s", $val, $sep );
    }
    print OUT "\n";
}
close OUT;

open( IN, $summary_file ) or die $!;
print $_ while (<IN>);
close IN;

# extract sequences from nr database based on taxid.
sub create_database {
    my $txid      = shift;
    my %hash      = ();
    my $gi_stored = "/tmp/gi.dat";
    if ( -e $gi_stored ) {
        %hash = %{ retrieve($gi_stored) };
    }
    else {
        open( TXID, "zcat $tax_file | " ) or die $!;
        while (<TXID>) {
            chomp;
            my ( $gi, $tx ) = split( "\t", $_ );
            push( @{ $hash{$tx} }, $gi );
        }
        close TXID;
        store( \%hash, $gi_stored );
    }
    my $txlist = "$tmp/$txid.list";
    my $txseq  = "$tmp/$txid.faa";
    die "No sequences found for taxid $txid\n" unless exists $hash{$txid};
    my $num_seqs = scalar( @{ $hash{$txid} } );
    warn "Found $num_seqs sequences for taxid $txid in $tax_file\n";
    open OUT, ">", $txlist or die $!;
    print OUT "$_\n" foreach ( @{ $hash{$txid} } );
    close OUT;
    my $cmd = "fastacmd -d $nr -i $txlist -t T -o $txseq 2>/dev/null";
    system $cmd;
    my $count = `grep -c '>' $txseq`;
    $count =~ s/\n//;
    warn "Could only extract $count of $num_seqs sequences from $nr\n" if $count < $num_seqs;
    $cmd = "formatdb -p T -i $tmp/$txid.faa -n $tmp/$txid -l $tmp/formatdb.log";
    system $cmd;
    $cmd = "fastacmd -d $tmp/$txid -I";
    system $cmd;
    warn "Check the formatdb.log for any errors\n";
}

======================================= > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Tim > Sent: Thursday, 26 November 2009 6:41 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in > new file > > Dear bioperl users, > > I am a real newbie and have - maybe a very trivial - question. > > I searched the mailing list archive and many howtos but I have not found > a concrete answer to my problem. So hopefully you can help me :) > > Background: I use the latest Bioperl version (installed it two weeks > before). > When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > including different sequences, I get a BLAST output with many queries > each having several hits / sbjcts. > > My problem is how to parse *all* hits of *one* query into a single new > file.
And this for all the queries I have in my BLAST output file. > > Or is it better the other way round; first to make fasta files with only > single sequences inside and BLAST each file? But how can I automize that > using Bioperl? > > I tried Bio::SearchIO but can only parse all queries and their > respective hits in only one file... > I think iteration is also necessary here, but I do not really know how > to include that into Bio::SearchIO. > Or do I have to use Module:Bio::Index::Blast? > > I can index a file (see below), but I have no idea what comes next... > > ###How I index a file... > > #!/usr/bin/perl -w > > $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > > use Bio::Index::Fasta; > > > $file_name = "8_to_BLAST_two_seq_index.fasta"; > $id = "48882"; > $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > -write_flag => 1); > $inx->make_index($file_name); > > > Hopefully, you can give me at least hints what to look for. > > A big THANKS in advance! > > Cheers, > > Tim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From maj at fortinbras.us Wed Nov 25 14:21:27 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 25 Nov 2009 14:21:27 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <53DE480F205E42CE8D2B9421592AAF0E@NewLife> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> Message-ID: <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> whoops: change the following line: my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); to my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); (I always forget that...) MAJ ----- Original Message ----- From: "Mark A. Jensen" To: "Tim" ; Sent: Wednesday, November 25, 2009 1:20 PM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file > hey Tim-- > > Sound like you need to go about collecting your queries inside out: > > my %hits_by_query; > for ($result->hits) { > push @{$hits_by_query{$hit->name}} $hit; > } > > I believe now each hash element, keyed by the query name, will contain > an arrayref to the set of hits assoc with that query. >>From here, I believe > > use Bio::Search::Result::BlastResult; > use Bio::SearchIO; > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > $blio->write_result($result); > } > > will do what you want. > > hope this helps - > Mark > > ----- Original Message ----- > From: "Tim" > To: > Sent: Wednesday, November 25, 2009 12:40 PM > Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew > file > > >> Dear bioperl users, >> >> I am a real newbie and have - maybe a very trivial - question. >> >> I searched the mailing list archive and many howtos but I have not found >> a concrete answer to my problem. So hopefully you can help me :) >> >> Background: I use the latest Bioperl version (installed it two weeks >> before). 
>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file >> including different sequences, I get a BLAST output with many queries >> each having several hits / sbjcts. >> >> My problem is how to parse *all* hits of *one* query into a single new >> file. And this for all the queries I have in my BLAST output file. >> >> Or is it better the other way round; first to make fasta files with only >> single sequences inside and BLAST each file? But how can I automize that >> using Bioperl? >> >> I tried Bio::SearchIO but can only parse all queries and their >> respective hits in only one file... >> I think iteration is also necessary here, but I do not really know how >> to include that into Bio::SearchIO. >> Or do I have to use Module:Bio::Index::Blast? >> >> I can index a file (see below), but I have no idea what comes next... >> >> ###How I index a file... >> >> #!/usr/bin/perl -w >> >> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >> >> use Bio::Index::Fasta; >> >> >> $file_name = "8_to_BLAST_two_seq_index.fasta"; >> $id = "48882"; >> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >> -write_flag => 1); >> $inx->make_index($file_name); >> >> >> Hopefully, you can give me at least hints what to look for. >> >> A big THANKS in advance! 
>> >> Cheers, >> >> Tim >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From alden.huang at gmail.com Thu Nov 26 05:54:30 2009 From: alden.huang at gmail.com (Alden Huang) Date: Thu, 26 Nov 2009 02:54:30 -0800 Subject: [Bioperl-l] Function that determines serious mutations In-Reply-To: References: Message-ID: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> Hey rob, Sorting Intolerant from Tolerant http://sift.jcvi.org/ ~alden ...a bit late, i kno; I just read your post now while cleaning the inbox On Fri, Nov 6, 2009 at 9:35 AM, Robert Bradbury wrote: > Is there a function in the library (or has someone written one) that can > take a genbank entry and determine which mutations are harmful? > > It would be used to produce a table summary of: > GENE          # SNP      # BadSNP > > One kind of gets this from NCBI if you lookup in the "GENE" db a gene name > and then go to the "GeneView" om dbSNP page it has the information I want > but largely in a graphical format while I simply want numbers I can dump > into a spreadsheet. > > I don't think it would be hard, fetch the gene, run through the features for > the SNP database, figure out whether they are good or bad SNPs, accumulate > the statistics and dump it.  I think the functions available are flexible > enough to do it but I can't believe nobody has already done it.  It could be > a bit more complex in that one could do an analysis to see if the mutations > are in a conserved domain or mutations that code for Cysteine or Methionine > (or other potentially "critical" amino acids) but since "critical" is in the > eye of the beholder there would have to be some kind of callback to a > scoring function.
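The workflow described in this quoted request (collect the SNPs for each gene, classify each one with a user-supplied scoring callback, then dump per-gene counts for a spreadsheet) could look roughly like the pure-Perl sketch below. It assumes the SNPs have already been pulled out of the GenBank entries into a hash of hypothetical `ref`/`alt` residue records, and the Cys/Met rule is only a placeholder callback, not a validated measure of harmfulness; dedicated tools such as SIFT, suggested above, are designed for that judgement.

```perl
use strict;
use warnings;

# Hypothetical input: gene name => list of SNP records (hash refs).
# In practice these would be derived from the SNP features of each entry;
# the 'ref'/'alt' residue fields are illustrative only.
my %snps_by_gene = (
    GENE1 => [ { ref => 'C', alt => 'S' }, { ref => 'A', alt => 'V' } ],
    GENE2 => [ { ref => 'M', alt => 'L' } ],
);

# Placeholder scoring callback (an assumption, not a published method):
# call a SNP "bad" if it changes a Cys or Met residue.
my $is_bad = sub {
    my ($snp) = @_;
    return $snp->{ref} =~ /^[CM]$/ && $snp->{alt} ne $snp->{ref};
};

# Count total and "bad" SNPs per gene using whatever callback is passed in.
sub tally_snps {
    my ( $snps, $scorer ) = @_;
    my %table;
    for my $gene ( sort keys %$snps ) {
        my @list = @{ $snps->{$gene} };
        my $bad  = grep { $scorer->($_) } @list;    # grep in scalar context = count
        $table{$gene} = [ scalar @list, $bad ];
    }
    return \%table;
}

# Tab-separated output that pastes straight into a spreadsheet.
my $table = tally_snps( \%snps_by_gene, $is_bad );
print join( "\t", 'GENE', '# SNP', '# BadSNP' ), "\n";
print join( "\t", $_, @{ $table->{$_} } ), "\n" for sort keys %$table;
```

Swapping in a different notion of "critical" only means passing a different code reference to `tally_snps`.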
> > Thanks, > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From robert.bradbury at gmail.com Thu Nov 26 06:27:50 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 26 Nov 2009 06:27:50 -0500 Subject: [Bioperl-l] Function that determines serious mutations In-Reply-To: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> References: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> Message-ID: On Thu, Nov 26, 2009 at 5:54 AM, Alden Huang wrote: > > Sorting Intolerant from Tolerant > http://sift.jcvi.org/ > > Ah yes, thank you very much. This looks very much like a tool that can be adapted for various uses. Robert From jason at bioperl.org Thu Nov 26 12:16:17 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 26 Nov 2009 09:16:17 -0800 Subject: [Bioperl-l] question about a Bio::Tree::Tree method In-Reply-To: <30960443.966281259248778372.JavaMail.defaultUser@defaultHost> References: <30960443.966281259248778372.JavaMail.defaultUser@defaultHost> Message-ID: <14F4B8C9-A1F4-436B-813F-50E139932D3D@bioperl.org> Emilio - please ask your questions on the list - many people there can help answer questions. get_nodes returns all the nodes in the tree, the options specify the order they are returned in. Depending on your question the order probably won't matter so you can just call it without any arguments like in the examples and the HOWTO. The documentation for the method says: Title : get_nodes Usage : my @nodes = $tree->get_nodes() Function: Return list of Bio::Tree::NodeI objects Returns : array of Bio::Tree::NodeI objects Args : (named values) hash with one value order => 'b|breadth' first order or 'd|depth' first order So you can provide no arguments and get the default (breadth-first I believe) or you can specify -order => 'd' or -order => 'depth' to get the nodes in depth-first order.
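To make the breadth-first versus depth-first distinction concrete, here is a minimal pure-Perl sketch on a hypothetical hash-based tree. The `%children` data and both subs are illustrative only and are not part of Bio::Tree::Tree; BioPerl's own ordering of sibling nodes may differ. With BioPerl itself the corresponding calls are `$tree->get_nodes()` and `$tree->get_nodes(-order => 'depth')`.

```perl
use strict;
use warnings;

# Hypothetical toy tree: each node name maps to an array ref of child names.
# This is NOT a Bio::Tree::Tree object; it only illustrates the two orders.
my %children = (
    root => [ 'A', 'B' ],
    A    => [ 'A1', 'A2' ],
    B    => ['B1'],
);

# Breadth-first: take nodes from the front of a queue, append their children.
sub breadth_first {
    my ( $tree, $start ) = @_;
    my @queue = ($start);
    my @order;
    while (@queue) {
        my $node = shift @queue;
        push @order, $node;
        push @queue, @{ $tree->{$node} || [] };
    }
    return @order;
}

# Depth-first (pre-order): visit a node, then each child's whole subtree in turn.
sub depth_first {
    my ( $tree, $node ) = @_;
    return ( $node, map { depth_first( $tree, $_ ) } @{ $tree->{$node} || [] } );
}

print join( ' ', breadth_first( \%children, 'root' ) ), "\n";    # root A B A1 A2 B1
print join( ' ', depth_first( \%children, 'root' ) ), "\n";      # root A A1 A2 B B1
```

Both orders return every node exactly once, which is why the order usually does not matter when you only need the full node list.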
-jason On Nov 26, 2009, at 7:19 AM, miglio83 at libero.it wrote: > Hi Jason, > I'm Emilio Siena, a PhD student of the University of Perugia. > I have > a question about the method "get_nodes" of the "Bio::Tree::Tree" > class. > In > particular I didn't understand which type of arguments it accepts > and in which > format an argument should be given. > > Thank you in advance! > > Emilio -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From maj at fortinbras.us Thu Nov 26 12:40:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 26 Nov 2009 12:40:45 -0500 Subject: [Bioperl-l] Bio::Assembly::IO::sam is alpha Message-ID: <599F8BABCD2848EFA98FB24A4419674E@NewLife> in bioperl-live/trunk with plenty pod; bravehearts can (please!) test on .bam files cheers, MAJ From mauricio at open-bio.org Thu Nov 26 16:45:43 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 26 Nov 2009 15:45:43 -0600 Subject: [Bioperl-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <4B0EF707.6080202@open-bio.org> Hi Jonathan, Any chance it can be webcasted? I'm sure it would attract a lot of remote attendees ;) Regards, Mauricio. Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here > at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010. If > you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last years (1st > day for beginners, 2nd for both beginners and advanced users, 3rd day > for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what > you would like to talk about. > > Thanks > > Jonathan. 
> > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > > > > > > > > From robert.bradbury at gmail.com Thu Nov 26 21:06:40 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 26 Nov 2009 21:06:40 -0500 Subject: [Bioperl-l] BioPerl "guts" question regarding forked processes Message-ID: I'm currently running near my process limit and running sequence fetches from swissprot (I've also had this happen with getting gi's from NCBI) and am running out of processes about halfway through the set I'm trying to fetch [1]. Now, is there someplace in the bioperl documentation that documents where one is supposed to wait() for defunct processes after each sequence fetch. I'm encountering the problem both when the sequence fetches succeed as well as when they fail. Thanks in advance. Robert 1. This is due to a bug in chromium's use of flash that involves it leaving many defunct processes that are uncollected and therefore counting towards ones "process limit". From kanzure at gmail.com Thu Nov 26 21:12:46 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Thu, 26 Nov 2009 20:12:46 -0600 Subject: [Bioperl-l] BioPerl "guts" question regarding forked processes In-Reply-To: References: Message-ID: <55ad6af70911261812q583277d5l71df0d66e756f617@mail.gmail.com> On Thu, Nov 26, 2009 at 8:06 PM, Robert Bradbury wrote: > I'm currently running near my process limit and running sequence fetches > from swissprot (I've also had this happen with getting gi's from NCBI) and > am running out of processes about halfway through the set I'm trying to > fetch [1]. Hey Robert, sorry for the off-topic question, but I was wondering if you're the same Robert Bradbury from the extropy-chat list. Hi? 
- Bryan http://heybryan.org/ 1 512 203 0507 From paolo.pavan at gmail.com Fri Nov 27 06:35:03 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 27 Nov 2009 12:35:03 +0100 Subject: [Bioperl-l] More general Bio::Assembly::Contig question (was Bio::Tools::Run::Cap3 usage question) Message-ID: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> Dear Florent, Thank you for your kind answer and for your efforts spent in this module. Since you are working on these topics I would like to seize the day and put you some questions about some doubts I have in mind, if you agree, of course :-) Some times ago I tried to work with bioperl, loading the data from an ACE file originated by Newbler; my need was to extract part of the contig like an alignment of reads and I tought to do it with a slice() method, since I saw Bio::Assembly::Contig implements Bio::AlignI interface. Unfortunately I realize that this interface is inherited but not implemented. I tried to hack it by adding a slice method which would act on a Bio::Alignment created from the array of LocatableSeqs representing the reads. This is the question: If I'm not wrong (please correct me if yes), Bio::Assembly::Contig class stores reads informations in: Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{ _align_clipping:READ_NAME} _aligned_coord:READ_NAME} _quality_clipping:READ_NAME} Anyone of these 3 features _align_clipping, _aligned_coord, _quality_clipping, contains a Bio::SeqFeature::Generic, which of them is more suitable to the purpose expressed before, the slice method? And more, If you apologize me for being too long, is consequently to the previous: I don't have perfectly clear the purpose of this 3 feature per read, can you explain it? Really thanks you for the time you would spend. Bye bye, Paolo 2009/11/24 Florent Angly > Hi Paolo, > > It turns out that there is no standard for what is to be passed to the > Bio::Tools::Run wrappers and returned by them. 
I noticed the inconsistency > between the assembly wrappers recently while implementing support for new > wrapper. I implemented inital support for additional de novo assembly > programs in BioPerl (454 Newbler and Minimo) a couple of weeks ago and Mark > Jensen added support for Maq, a program that assembler reads against a > reference. In the process, all the assembly wrappers were changed to take > the same type of input data (a FASTA sequence or an array reference of > sequence objects) and return one of the following: > * a Bio::Assembly::Scaffold object (the default), or > * a Bio::Assembly::IO object, or > * the name of a file for the output of the assembler > Use the out_type method to set up which output you want, e.g.: > $factory->out_type('Bio::Assembly::IO'); > or > $factory->out_type('cap3_results.ace'); > You'll have to use the code in the bioperl-run subversion if you want to > use these new features. > > Cheers, > > Florent > > > > > Paolo Pavan wrote: > >> Dear, >> I'm confused about the proper usage of the module Bio::Tools::Run::Cap3. >> As documented in the pod, the run(@seqs) method returns the cap3 report >> file >> while I expect to return a Bio::Assembly object, consistently with other >> Bio::Tools::Run classes. >> However, I went around this by getting from the factory object the >> location >> and the names of the temp output files (actually accessing a private >> property, although) and reading them via the Assembly::IO system. >> I was just wandering what is the proper designed way to do this job. >> >> Thank you for enlighten the way! 
>> Paolo >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > From jw12 at sanger.ac.uk Thu Nov 26 09:57:35 2009 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Thu, 26 Nov 2009 14:57:35 +0000 Subject: [Bioperl-l] DAS workshop 7th-9th April 2010 Message-ID: We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK subject to decent demand. The workshop will be held from Wednesday 7th-Friday 9th April 2010. If you would be interested in attending either to present or just take part then please email me jw12 at sanger.ac.uk The format of the workshop is likely to be similar to last years (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: http://www.dasregistry.org/course.jsp If you would like to present then please send a short summary of what you would like to talk about. Thanks Jonathan. Jonathan Warren Senior Developer and DAS coordinator jw12 at sanger.ac.uk -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From timbourine81 at googlemail.com Thu Nov 26 11:02:30 2009 From: timbourine81 at googlemail.com (Tim Koehler) Date: Thu, 26 Nov 2009 17:02:30 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <4B0EA44D.2050507@gmail.com> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> Message-ID: ups, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially where to put in your code. 
Here ist the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this terror appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5. } use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position? 
errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name. Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > Hey Mark, > > thanks for the answer > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. 
Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sound like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query. > >>> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. 
> >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automize > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... > >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 From rtbio.2009 at gmail.com Sat Nov 28 02:53:43 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Sat, 28 Nov 2009 08:53:43 +0100 Subject: [Bioperl-l] Linking of two cgi scripts Message-ID: hello everyone, I have a small question.
I would like to link two cgi scripts i.e., I have an input sequence being entered in a text area ex:->gi|at442323|... ATGCCCCCTTGGAACCAAAAAAA.... So I would like to compare this with the query sequences.These query sequences would be from a BLAST script in the module blast.pm So once I enter the input sequence and request for BLAST using submit button,my request should go to a program which performs BLAST search.After this, the sequences obtained from BLAST have to be returned to a program Roopa.pm which compares the input sequence and the sequences obtained from blast. But I am unable to provide this link between the cgi scripts.(i.e.,one script to use BLAST,the other script to compare the sequences and send the results to the browser) Could any one help me in this regard? Regards, Roopa. From s.denaxas at gmail.com Sat Nov 28 05:56:15 2009 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Sat, 28 Nov 2009 10:56:15 +0000 Subject: [Bioperl-l] Linking of two cgi scripts In-Reply-To: References: Message-ID: Hello, Why do they both have to be CGi scripts? cant all the processing happen server side, i.e. both BLAST and comparison of returned results? If that is strictly a requirement, you could: a) get input from user on script A, i.e. the input sequence b) do a HTTP request from the CGI to the other script B using LWP::UserAgent c) get results from script B, pass on to comparison module d) return results to user As I said, this will be clunky so either do everything in one go or consider AJAX hope this helps Spiros On Sat, Nov 28, 2009 at 7:53 AM, Roopa Raghuveer wrote: > hello everyone, > > I have a small question. > > I would like to link two cgi scripts i.e., > > I have an input sequence being entered in a text area > > ex:->gi|at442323|... > ATGCCCCCTTGGAACCAAAAAAA.... 
> > So I would like to compare this with the query sequences.These query > sequences would be from a BLAST script in the module blast.pm > So once I enter the input sequence and request for BLAST using submit > button,my request should go to a program which performs BLAST search.After > this, the sequences obtained from BLAST have to be returned to a program > Roopa.pm which compares the input sequence and the sequences obtained from > blast. > > But I am unable to provide this link between the cgi scripts.(i.e.,one > script to use BLAST,the other script to compare the sequences and send the > results to the browser) > > Could any one help me in this regard? > > Regards, > Roopa. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Sat Nov 28 11:23:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 28 Nov 2009 11:23:53 -0500 Subject: [Bioperl-l] Run wrappers for BWA and Samtools Message-ID: <7F56A6EEEB0E4EE291D5340F27DF7D3A@NewLife> Hi All, Run wrappers for the bwa assembler and the samtools suite are now available as beta in the bioperl-run/trunk. The bwa wrapper allows you to run a canned assembly pipeline, or to execute individual bwa components. The assembly pipeline can return a Bio::Assembly::Scaffold object via the new Bio::Assembly::IO::sam module in bioperl-live/trunk (this requires lstein's Bio::DB::Sam, from CPAN). Details at http://www.bioperl.org/wiki/HOWTO:Short-read_assemblies_with_BWA and, of course, in the pod. Cheers, MAJ From maj at fortinbras.us Sat Nov 28 21:55:42 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 28 Nov 2009 21:55:42 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file In-Reply-To: References: <4B0D6C24.2080308@gmail.com><53DE480F205E42CE8D2B9421592AAF0E@NewLife><815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife><4B0EA44D.2050507@gmail.com> Message-ID: <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> Hi Tim-- There's a bug in my code; should be for my $hit ($result->hits) { ... } and you're right about the comma. My bad. But I don't think you need this-- you're already looping over your query sequences and doing blastn on each one. So in the middle of your loop, you can simply write the blast result that you got: my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format=>"blast" ); $blio->write_result($result); and forget about the foreach my $qid loop entirely. The files should show up in the directory from which you're running the script. cheers, MAJ ----- Original Message ----- From: "Tim Koehler" To: Sent: Thursday, November 26, 2009 11:02 AM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file ups, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially where to put in your code. Here ist the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. 
push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this terror appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5. } use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position? errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name. 
Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > Hey Mark, > > thanks for the answer > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sound like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query. 
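The grouping loop quoted here does not compile as written: `for ($result->hits)` puts each hit in `$_`, not `$hit`, and `push` needs a comma after the array reference (both points come up later in the thread). Also note that `$hit->name` is the name of the hit (subject) sequence, so keying on it groups hits by subject; within one BLAST result every hit already belongs to the same query, whose name is `$result->query_name`. A minimal pure-Perl sketch of the corrected hash-of-arrays idiom follows; `StubHit` and `group_hits_by_name` are hypothetical names, and the stub only stands in for a `Bio::Search::Hit::HitI` object so the snippet runs without BioPerl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::Search::Hit::HitI object; only name() is used.
package StubHit;
sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
sub name { return $_[0]{name} }

package main;

# Group hit objects into a hash of array refs keyed by hit (subject) name.
sub group_hits_by_name {
    my @hits = @_;
    my %hits_by_name;
    for my $hit (@hits) {                                # named loop variable, not $_
        push @{ $hits_by_name{ $hit->name } }, $hit;     # comma after the array ref
    }
    return \%hits_by_name;
}

# With BioPerl this would be called as group_hits_by_name($result->hits)
my $groups = group_hits_by_name(map { StubHit->new($_) } qw(gi|1 gi|2 gi|1));
for my $name (sort keys %$groups) {
    printf "%s: %d hit(s)\n", $name, scalar @{ $groups->{$name} };
}
```

Writing each group back out with `Bio::SearchIO` still needs the `">"` prefix on `-file`, as noted in the thread, so the file is opened for writing.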
> >>> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. > >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automize > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... 
> >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sat Nov 28 22:32:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 28 Nov 2009 22:32:42 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki Message-ID: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> The HOWTOs appear to have a more restrictive copyright than FDL-- in particular, the blurb at the bottom of the HOWTO page asks users to use the documents for personal use only. I'm for this; I think we should therefore have some explicit license for these that specifies this kind of restriction, and then express that on each howto and in BioPerl:Copyright. Any thoughts on the right license and whether this is a good plan?
MAJ From florent.angly at gmail.com Sat Nov 28 22:47:45 2009 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 28 Nov 2009 19:47:45 -0800 Subject: [Bioperl-l] More general Bio::Assembly::Contig question (was Bio::Tools::Run::Cap3 usage question) In-Reply-To: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> References: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> Message-ID: <4B11EEE1.8070907@gmail.com> Hi Paolo, The aligned reads of a contig are stored in Bio::Assembly::Contigs->{_elem}{READ_NAME}{_seq}. To implement a slice() method, you could retrieve the reads using get_seq_ids(), get_seq_by_name() or get_seq_by_pos(). To retrieve the position of an aligned read in the contig, use get_seq_coord() which returns a Bio::SeqFeature::Generic object (from Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{_aligned_coord:READ_NAME}) on which you can call the start() and end() methods. I'm not entirely sure what Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{_align_clipping:READ_NAME} and {_quality_clipping:READ_NAME} are. I believe that they represent the clear range of the read/contig. Hope it helps, Florent Paolo Pavan wrote: > Dear Florent, > Thank you for your kind answer and for your efforts spent in this module. > Since you are working on these topics I would like to seize the day > and put you some questions about some doubts I have in mind, if you > agree, of course :-) > Some times ago I tried to work with bioperl, loading the data from an > ACE file originated by Newbler; my need was to extract part of the > contig like an alignment of reads and I tought to do it with a slice() > method, since I saw Bio::Assembly::Contig implements Bio::AlignI > interface. Unfortunately I realize that this interface is inherited > but not implemented. > I tried to hack it by adding a slice method which would act on a > Bio::Alignment created from the array of LocatableSeqs representing > the reads. 
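Florent's recipe can be sketched as a small helper that walks every read of a contig and collects its aligned start and end. This is a sketch assuming `$contig` is a `Bio::Assembly::Contig` whose `get_seq_ids()` (called with no arguments) returns all read IDs, and whose `get_seq_coord()` takes the read's `Bio::LocatableSeq` object (not its name) and returns the `_aligned_coord` feature or undef. The helper name `read_coords` and the `StubFeature`/`StubContig` classes are hypothetical; the stubs exist only so the snippet runs without BioPerl.

```perl
use strict;
use warnings;

# Collect aligned [start, end] coordinates for every read in a contig.
# $contig is expected to be a Bio::Assembly::Contig (or anything with these three methods).
sub read_coords {
    my ($contig) = @_;
    my %coords;
    for my $id ($contig->get_seq_ids) {
        my $read  = $contig->get_seq_by_name($id);    # the aligned read
        my $coord = $contig->get_seq_coord($read);    # Bio::SeqFeature::Generic, or undef
        next unless defined $coord;
        $coords{$id} = [ $coord->start, $coord->end ];
    }
    return \%coords;
}

# Hypothetical stubs so the helper can be exercised without BioPerl.
package StubFeature;
sub new   { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class }
sub start { return $_[0]{start} }
sub end   { return $_[0]{end} }

package StubContig;
sub new             { my ($class, %reads) = @_; return bless { reads => \%reads }, $class }
sub get_seq_ids     { return sort keys %{ $_[0]{reads} } }
sub get_seq_by_name { return $_[1] }   # the stub uses the ID itself as the "read"
sub get_seq_coord {
    my ($self, $id) = @_;
    my $r = $self->{reads}{$id};
    return StubFeature->new(start => $r->[0], end => $r->[1]);
}

package main;

my $coords = read_coords(StubContig->new(read1 => [1, 250], read2 => [180, 420]));
print "$_\t@{ $coords->{$_} }\n" for sort keys %$coords;
```

With real data the contig would come from an assembly reader, for example `Bio::Assembly::IO->new(-file => 'newbler.ace', -format => 'ace')` (file name hypothetical), then `next_assembly` and the scaffold's `all_contigs`; check the `Bio::Assembly::IO` documentation for your BioPerl version, since the reader interface has changed over time. A `slice()` could then keep only reads whose coordinates overlap the requested range.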
> > This is the question: > If I'm not wrong (please correct me if yes), Bio::Assembly::Contig > class stores reads informations in: > Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{ > _align_clipping:READ_NAME} > _aligned_coord:READ_NAME} > _quality_clipping:READ_NAME} > > Anyone of these 3 features _align_clipping, _aligned_coord, > _quality_clipping, contains a Bio::SeqFeature::Generic, which of them > is more suitable to the purpose expressed before, the slice method? > And more, If you apologize me for being too long, is consequently to > the previous: I don't have perfectly clear the purpose of this 3 > feature per read, can you explain it? > > Really thanks you for the time you would spend. > Bye bye, > Paolo From bimber at wisc.edu Sun Nov 29 00:31:25 2009 From: bimber at wisc.edu (Ben Bimber) Date: Sat, 28 Nov 2009 23:31:25 -0600 Subject: [Bioperl-l] using bioperl to compare sequences Message-ID: <9f985cdc0911282131l350bc525gd9ad4717c101ac63@mail.gmail.com> Hello, I have a couple years programming experience, but am reasonably new to perl and extremely new to bioperl. I have been reading through the bioperl documentation and am trying to understand the best way to approach a particular problem. I'm hoping someone could offer some tips and point me in the right direction. If someone has solved this sort of problem before, i'd prefer not to reinvent things. Here's what I'm trying to do: Our lab generates mRNA sequence data, consisting of alleles of a given gene or genes I want to compare each of these sequences against a reference using BLAST or clustalw (will need the ability to choose at run time) Take the result of this alignment, then record positions of difference between the experimental sequence and reference sequence (SNPs) Translate the corresponding AA change(s) associated with each SNP. There can be overlapping ORFs. I see that bioperl has modules for BLAST and clustal. I've also been looking at the modules under variation. 
I havent fully wrapped my head around them, but they look to be what i'd use for SNP detection. has anyone has written code to perform similar things and if so, would you be willing to share specific examples? Anything concrete to see exactly how these modules operate would be extremely helpful. Thanks in advance for any tips or help. From jason at bioperl.org Sun Nov 29 10:54:53 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 29 Nov 2009 07:54:53 -0800 Subject: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file In-Reply-To: <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> References: <4B0D6C24.2080308@gmail.com><53DE480F205E42CE8D2B9421592AAF0E@NewLife><815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife><4B0EA44D.2050507@gmail.com> <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> Message-ID: <897A8DB4-AF29-4601-A1E5-9A04D9D8C151@bioperl.org> or while( my $hit = $result->next_hit ) { } On Nov 28, 2009, at 6:55 PM, Mark A. Jensen wrote: > Hi Tim-- > There's a bug in my code; should be > for my $hit ($result->hits) { > ... > } > and you're right about the comma. My bad. > > But I don't think you need this-- you're already looping over your > query sequences and doing blastn on each one. So in the middle of > your loop, you can simply write the blast result that you got: > > my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", - > format=>"blast" ); > $blio->write_result($result); > > and forget about the foreach my $qid loop entirely. > > The files should show up in the directory from which you're > running the script. > cheers, MAJ > > > > ----- Original Message ----- From: "Tim Koehler" > > To: > Sent: Thursday, November 26, 2009 11:02 AM > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of > eachqueryinnew file > > > ups, sent too early... > > Hey Mark, > > thanks for the answer. But I am still struggling, especially where > to put in > your code. 
> > Here ist the code I have, so far: > > #!/usr/bin/perl -w > > ### should I put your code here as push is a perl command? > my %hits_by_query; > for ($result->hits) { > ### I inserted a comma after name}}; if there is no comma, there was > the > error: Scalar found where operator expected at > 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" > ### (Missing operator before $hit?) > ###Useless use of push with no values at > 12_BLAST_two_sequence_each_query_one_file.PL line 7. > ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line > 7, near > "} $hit" > ###BEGIN not safe after errors--compilation aborted at > 12_BLAST_two_sequence_each_query_one_file.PL line 13. > push @{$hits_by_query{$hit->name}}, $hit; > ###here, every time this terror appears: Name "main::result" used > only once: > possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. > ###error: Can't call method "hits" on an undefined value at > 12_BLAST_two_sequence_each_query_one_file.PL line 5. > } > > > use strict; > use Bio::Tools::Run::StandAloneBlast; > use Bio::SeqIO; > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > > my $Seq_in = Bio::SeqIO->new ( > -file => > "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/ > 1_to_BLAST_two_seq.fasta", > -format => 'fasta' > ); > while (my $query = $Seq_in->next_seq()) { > my $factory = Bio::Tools::Run::StandAloneBlast->new( > 'program' => 'blastn', > 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/ > 3_BLAST_db', > _READMETHOD => "Blast" > ); > > my $blast_report = $factory->blastall($query); > > ### Should I need to use a module? are the commands here at the right > position? 
errors, e.g., Global symbol "$hit" requires explicit > package name > #my %hits_by_query; > #for ($result->hits) { > ### inserted comma after name}} > # push @{$hits_by_query{$hit->name}}, $hit; > #} > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - > format=>'blast' ); > $blio->write_result($result); > } > > ###where are the files stored? what is their name. Sorry, but I > cannot get > behind that :( > > while( my $result = $blast_report->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > if( $hsp->length('total') > 50 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Query= ", $result->query_name, > "Hit= ", $hit->name, > "Length= ", $hsp->length('total'), > "Percent_id= ", $hsp->percent_identity, > "Subject=", $hsp->hit_string,"\n"; > } > } > } > } > } > } > > Again, a big thanks in advance :) > > All the best, > > Tim > > > On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > >> Hey Mark, >> >> thanks for the answer >> >> On 25.11.2009 20:21, Mark A. Jensen wrote: >> > whoops: change the following line: >> > my $blio = Bio::SearchIO->new( -file => $qid.".bls", - >> format=>'blast' ); >> > >> > to >> > >> > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - >> format=>'blast' ); >> > >> > (I always forget that...) >> > MAJ >> > >> > ----- Original Message ----- From: "Mark A. 
Jensen" > > >> > To: "Tim" ; >> > Sent: Wednesday, November 25, 2009 1:20 PM >> > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of >> each >> > queryinnew file >> > >> > >> >> hey Tim-- >> >> >> >> Sound like you need to go about collecting your queries inside >> out: >> >> >> >> my %hits_by_query; >> >> for ($result->hits) { >> >> push @{$hits_by_query{$hit->name}} $hit; >> >> } >> >> >> >> I believe now each hash element, keyed by the query name, will >> contain >> >> an arrayref to the set of hits assoc with that query. >> >>> From here, I believe >> >> >> >> use Bio::Search::Result::BlastResult; >> >> use Bio::SearchIO; >> >> >> >> foreach my $qid ( keys %hits_by_query ) { >> >> my $result = Bio::Search::Result::BlastResult->new(); >> >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); >> >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", - >> format=>'blast' >> ); >> >> $blio->write_result($result); >> >> } >> >> >> >> will do what you want. >> >> >> >> hope this helps - >> >> Mark >> >> >> >> ----- Original Message ----- From: "Tim" >> >> To: >> >> Sent: Wednesday, November 25, 2009 12:40 PM >> >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each >> >> query innew file >> >> >> >> >> >>> Dear bioperl users, >> >>> >> >>> I am a real newbie and have - maybe a very trivial - question. >> >>> >> >>> I searched the mailing list archive and many howtos but I have >> not >> found >> >>> a concrete answer to my problem. So hopefully you can help me :) >> >>> >> >>> Background: I use the latest Bioperl version (installed it two >> weeks >> >>> before). >> >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta >> file >> >>> including different sequences, I get a BLAST output with many >> queries >> >>> each having several hits / sbjcts. >> >>> >> >>> My problem is how to parse *all* hits of *one* query into a >> single new >> >>> file. And this for all the queries I have in my BLAST output >> file. 
>> >>> >> >>> Or is it better the other way round; first to make fasta files >> with >> only >> >>> single sequences inside and BLAST each file? But how can I >> automize >> that >> >>> using Bioperl? >> >>> >> >>> I tried Bio::SearchIO but can only parse all queries and their >> >>> respective hits in only one file... >> >>> I think iteration is also necessary here, but I do not really >> know how >> >>> to include that into Bio::SearchIO. >> >>> Or do I have to use Module:Bio::Index::Blast? >> >>> >> >>> I can index a file (see below), but I have no idea what comes >> next... >> >>> >> >>> ###How I index a file... >> >>> >> >>> #!/usr/bin/perl -w >> >>> >> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >> >>> >> >>> use Bio::Index::Fasta; >> >>> >> >>> >> >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; >> >>> $id = "48882"; >> >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >> >>> -write_flag => 1); >> >>> $inx->make_index($file_name); >> >>> >> >>> >> >>> Hopefully, you can give me at least hints what to look for. >> >>> >> >>> A big THANKS in advance! 
>> >>> >> >>> Cheers, >> >>> >> >>> Tim >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> >> >>> >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> > >> >> Tim Köhler > MPI for Terrestrial Microbiology > Karl-von-Frisch-Straße > D-35043 Marburg / Germany > > Email: koehlerd at mpi-marburg.mpg.de > Phone: +49 6421 178-740 > Fax: +49 6421 178-999 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From suzi at berkeleybop.org Sun Nov 29 23:03:09 2009 From: suzi at berkeleybop.org (Suzanna Lewis) Date: Sun, 29 Nov 2009 20:03:09 -0800 Subject: [Bioperl-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <3AD3C819-4BAA-4D90-B141-9611F48C5CAD@berkeleybop.org> I/we (Gregg) would be interested in attending. We'd present an update on the collaborative, web-based version of Apollo. We will be working with Ian Holmes and Mitch Skinner using JBrowse for basic display. -S On Nov 26, 2009, at 6:57 AM, Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010.
If you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last years (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what you would like to talk about. > > Thanks > > Jonathan. > > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome ResearchLimited, a charity registered in England with number 1021457 and acompany registered in England with number 2742969, whose registeredoffice is 215 Euston Road, London, NW1 2BE._______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > From maj at fortinbras.us Mon Nov 30 09:31:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 30 Nov 2009 09:31:27 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> Message-ID: <513F1C824EF84974993A76F0CC719CDF@NewLife> Well, it has a history, Jason's point. So the question could be: "is this still a valid issue"? A while back, a user on the wiki, with natural and good intentions, removed the authorship and revision info from a couple of the HOWTOs; it is more wiki-like, after all. But Chris had some objections to that, which I seconded, mainly on the basis of the special status that seems implied by the copyright note on the HOWTO page. 
I also think that the nature of the howto is somewhat different from other info on the site -- that developers themselves put a lot of time in to explaining how to use their modules, and that in this world where devs get paid by recognition, it is a reasonable thing to allow this extra horn-tooting. Now, that is a policy that could be completely separable from the issue of copyright. However, devs may also get paid by using their materials in teaching seminars. The dilemma would be that people who like to use the wiki are people who like to share, and so it feels unnatural to withhold from the community the materials they develop, but people who like to share also like to eat and wear shoes... so I'm interested in everyone's thoughts about it. ----- Original Message ----- From: "Brian Osborne" To: "Mark A. Jensen" Cc: "Chris Fields" ; "Jason Stajich" ; "bioperl List" Sent: Monday, November 30, 2009 9:16 AM Subject: Re: [Bioperl-l] HOWTO copyright policy vs FDL on wiki > Mark, > > Let me ask you a question, and don't take this question as an implicit > criticism of your suggestion, it is not. Why would you want this more > restrictive copyright? > > Brian O. > > On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: > >> The HOWTOs appear to have a more restrictive copyright >> than FDL-- in particular, the blurb at the bottom of the >> HOWTO page asks users to use the documents for personal >> use only. I'm for this; I think we should therefore have some >> explicit license for these that specifies this kind of restriction, >> and then express that on each howto and in BioPerl:Copyright. >> Any thoughts on the right license and whether this is a good plan? 
>> MAJ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From bosborne11 at verizon.net Mon Nov 30 10:15:32 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 30 Nov 2009 10:15:32 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <513F1C824EF84974993A76F0CC719CDF@NewLife> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> <513F1C824EF84974993A76F0CC719CDF@NewLife> Message-ID: <54671455-A02C-4139-8C39-AC17B50D5CE6@verizon.net> Mark, I have no objection to a more restrictive copyright, and I also have no objection to using FDL, or things like it. Brian O. On Nov 30, 2009, at 9:31 AM, Mark A. Jensen wrote: > Well, it has a history, Jason's point. So the question could > be: "is this still a valid issue"? A while back, a user on the wiki, > with natural and good intentions, removed the authorship and revision > info from a couple of the HOWTOs; it is more wiki-like, > after all. But Chris had some objections to that, which I > seconded, mainly on the basis of the special status that > seems implied by the copyright note on the HOWTO > page. I also think that the nature of the howto is somewhat > different from other info on the site -- that developers themselves > put a lot of time in to explaining how to use their modules, and > that in this world where devs get paid by recognition, it is a > reasonable > thing to allow this extra horn-tooting. Now, that is a policy > that could be completely separable from the issue of copyright. > However, devs may also get paid by using their materials in teaching > seminars. The dilemma would be that people who like to use the > wiki are people who like to share, and so it feels unnatural to > withhold from the community the materials they develop, but > people who like to share also like to eat and wear shoes... 
> so I'm interested in everyone's thoughts about it. > ----- Original Message ----- From: "Brian Osborne" > > To: "Mark A. Jensen" > Cc: "Chris Fields" ; "Jason Stajich" >; "bioperl List" > Sent: Monday, November 30, 2009 9:16 AM > Subject: Re: [Bioperl-l] HOWTO copyright policy vs FDL on wiki > > >> Mark, >> >> Let me ask you a question, and don't take this question as an >> implicit criticism of your suggestion, it is not. Why would you >> want this more restrictive copyright? >> >> Brian O. >> >> On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: >> >>> The HOWTOs appear to have a more restrictive copyright >>> than FDL-- in particular, the blurb at the bottom of the >>> HOWTO page asks users to use the documents for personal >>> use only. I'm for this; I think we should therefore have some >>> explicit license for these that specifies this kind of restriction, >>> and then express that on each howto and in BioPerl:Copyright. >>> Any thoughts on the right license and whether this is a good plan? >>> MAJ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From bosborne11 at verizon.net Mon Nov 30 09:16:07 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 30 Nov 2009 09:16:07 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> Message-ID: <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> Mark, Let me ask you a question, and don't take this question as an implicit criticism of your suggestion, it is not. Why would you want this more restrictive copyright? Brian O. On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: > The HOWTOs appear to have a more restrictive copyright > than FDL-- in particular, the blurb at the bottom of the > HOWTO page asks users to use the documents for personal > use only. 
I'm for this; I think we should therefore have some > explicit license for these that specifies this kind of restriction, > and then express that on each howto and in BioPerl:Copyright. > Any thoughts on the right license and whether this is a good plan? > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Nov 30 12:41:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 30 Nov 2009 12:41:44 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32B630E6C53@exchsth.agresearch.co.nz> <52D67F20A9CB4953B86FF794ADE0BE96@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> Message-ID: <8C288FEF9CEB4055B0CDD19267FBA26C@NewLife> thanks Tim! corrected (I hope) in r16432... MAJ From timbourine81 at googlemail.com Mon Nov 30 12:23:58 2009 From: timbourine81 at googlemail.com (Tim Koehler) Date: Mon, 30 Nov 2009 18:23:58 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32B630E6C53@exchsth.agresearch.co.nz> <52D67F20A9CB4953B86FF794ADE0BE96@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> Message-ID: Hello everybody, thanks a lot for the overwhelming answers! All these codes are different flavors and worked all. For me the added code works the best. But I think I found a bug in ...Bio/SearchIO/blast.pm. 
There the DEFAULT_BLAST_... variable is set to Bio::Search::Writer::HitTableWriter instead of Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to HTMLResultWriter and others. So again: THANKS for the support!
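An aside for readers following this thread: in a BLAST report each result corresponds to one query, so when the search can emit tabular output (legacy `blastall -m 8`, one hit per line with the query ID in the first column), the per-query split can also be done in core Perl without building result objects. This is a minimal sketch under that assumption; the sub name `split_by_query`, the `.bls.tab` suffix and the sample IDs are hypothetical, and nothing here is a BioPerl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write each query's hit lines to its own file, "<query>.bls.tab" in $out_dir.
# Assumes tab-separated hit lines with the query ID in the first column;
# comment lines (starting with '#') and lines without tabs are skipped.
# Returns a hash ref of query ID => number of hit lines written.
sub split_by_query {
    my ($in_fh, $out_dir) = @_;
    my (%count, %out_fh);
    while (my $line = <$in_fh>) {
        next if $line =~ /^#/ || $line !~ /\t/;
        my ($qid) = split /\t/, $line, 2;
        unless ($out_fh{$qid}) {
            # Replace characters that are awkward in file names (e.g. '|' in gi|...|).
            (my $safe = $qid) =~ s/[^\w.-]/_/g;
            # Open for writing ('>'); as with Bio::SearchIO's -file, a name
            # without the '>' prefix would be opened for reading instead.
            open $out_fh{$qid}, '>', "$out_dir/$safe.bls.tab"
                or die "Cannot write $out_dir/$safe.bls.tab: $!";
        }
        print { $out_fh{$qid} } $line;
        $count{$qid}++;
    }
    close $_ for values %out_fh;    # one handle per query stays open until the end
    return \%count;
}

# Demo on an in-memory report with made-up IDs and scores.
my $report = "q1\thitA\t98.5\n# a comment line\nq2\thitB\t80.0\nq1\thitC\t76.2\n";
open my $report_fh, '<', \$report or die "Cannot open in-memory report: $!";
my $dir    = tempdir(CLEANUP => 1);
my $counts = split_by_query($report_fh, $dir);
print "$_: $counts->{$_}\n" for sort keys %$counts;    # prints "q1: 2" then "q2: 1"
```

Keeping one open handle per query means the input need not be sorted by query; for reports with very many queries, the per-process open-file limit could become the constraint.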
Cheers, Tim #!/usr/bin/perl -w use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; ### add here the writer you want use Bio::SearchIO::Writer::HitTableWriter; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; # just write the result we got for this query into a #new blast-formatted file...named after the id of the query seq... my $result = $blast_report->next_result; my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!; $blio->write_result($result); # below, just looking at the current blast result ###this does not appear in the output files while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > Changed it to a generic result and added a writer and it seems to work: > > > > foreach my $qid ( keys %hits_by_query ) { > > warn "qid = $qid\n"; > > my $res = Bio::Search::Result::GenericResult->new(-algorithm => > "blastn") or die $!; > > # print Dumper $res; > > foreach my $h (
@{ $hits_by_query{$qid} } ){ > > warn "adding hit ", $h->name, "\n"; > > $res->add_hit($h) if defined($h); > > } > > my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new(); > > my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file => > ">$qid\.bls\.html", -format => "blast" ) or die $!; > > $blio->write_result($res); > > } > > > > > > *From:* Mark A. Jensen [mailto:maj at fortinbras.us] > *Sent:* Monday, 30 November 2009 10:19 a.m. > *To:* Smithies, Russell; 'Tim Koehler' > > *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > My thought here was that since Tim's already going one at a time thru > > his queries, my scrap was not really necessary: > > > > use strict; > > use Bio::Tools::Run::StandAloneBlast; > > use Bio::SeqIO; > > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > > > use Data::Dumper; > > > > my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", > > -format => "fasta" ); > > > > while ( my $query = $Seq_in->next_seq() ) { > > warn "Processing ",$query->id, "\n"; > > my $factory = > > Bio::Tools::Run::StandAloneBlast->new( > > program => "blastn", > > database => > "/data/databases/flatfile/illuminati_blastdata/nt", > > _READMETHOD => "Blast" > > ); > > > > my $blast_report = $factory->blastall($query); > > sleep 5; > > > > # just write the result we got for this query into a > > #new blast-formatted file...named after the id of the query seq... 
> > my $result = $blast_report->next_result; > > my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => > "blast" ) or die $!; > > $blio->write_result($result); > > > > # below, just looking at the current blast result > > while ( my $result = $blast_report->next_result ) { > > ## $result is a Bio::Search::Result::ResultI compliant object > > while ( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > while ( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > > if ( $hsp->length('total') > 50 ) { > > if ( $hsp->percent_identity >= 75 ) { > > print "Query= ", $result->query_name, > > "Hit= ", $hit->name, > > "Length= ", $hsp->length('total'), > > "Percent_id= ", $hsp->percent_identity, > > "Subject=", $hsp->hit_string, "\n"; > > } > > } > > } > > } > > } > > } > > ----- Original Message ----- > > *From:* Smithies, Russell > > *To:* 'Tim Koehler' ; 'maj at fortinbras.us' > > *Sent:* Sunday, November 29, 2009 3:58 PM > > *Subject:* RE: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > Hi Tim > > With various people writing the "howtos" and other docs, the examples are > bound to have differing names for the variables used but as long as you're > consistent, it should all fit together. > > > > I think I've almost got your code working, just getting errors from > Bio::Search::Result::BlastResult which I'm not entirely sure how to use. > Perhaps Mark can get this bit going?
> > > > --Russell > > =============================== > > > > use strict; > > use Bio::Tools::Run::StandAloneBlast; > > use Bio::SeqIO; > > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > > > use Data::Dumper; > > > > my $Seq_in = Bio::SeqIO->new( -file => "sequences.fasta", > > -format => "fasta" ); > > > > while ( my $query = $Seq_in->next_seq() ) { > > warn "Processing ",$query->id, "\n"; > > my $factory = > > Bio::Tools::Run::StandAloneBlast->new( > > program => "blastn", > > database => > "/data/databases/flatfile/illuminati_blastdata/nt", > > _READMETHOD => "Blast" > > ); > > > > my $blast_report = $factory->blastall($query); > > sleep 5; > > > > > > my %hits_by_query; > > > > while ( my $result = $blast_report->next_result ) { > > foreach my $hit ( $result->hits ) { > > warn "Pushed a hit for ",$hit->name, "\n"; > > push( @{ $hits_by_query{ $hit->name } }, $hit ); > > } > > } > > > > foreach my $qid ( keys %hits_by_query ) { > > warn "qid = $qid\n"; > > my $res = Bio::Search::Result::BlastResult->new() or die $!; > > print Dumper $res; > > foreach my $h ( @{ $hits_by_query{$qid} } ){ > > warn "adding hit ", $h->name, "\n"; > > $res->add_hit($h) if defined($h); > > } > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format => > "blast" ) or die $!; > > $blio->write_result($res); > > } > > > > while ( my $result = $blast_report->next_result ) { > > ## $result is a Bio::Search::Result::ResultI compliant object > > while ( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > while ( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > > if ( $hsp->length('total') > 50 ) { > > if ( $hsp->percent_identity >= 75 ) { > > print "Query= ", $result->query_name, > > "Hit= ", $hit->name, > > "Length= ", $hsp->length('total'), > > "Percent_id= ", $hsp->percent_identity, > > "Subject=", $hsp->hit_string, "\n"; > > } > > } > > } > > } > > } > > } > > 
=============================== > > > > *From:* Tim Koehler [mailto:timbourine81 at googlemail.com] > *Sent:* Friday, 27 November 2009 10:24 p.m. > *To:* Smithies, Russell; maj at fortinbras.us > *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each > queryinnew file > > > > Hey guys, > > please, do not get me wrong that I wanted to put the workload on you. So > far I only found the HowTo's but in there in some way the language changed > with time (e.g. $in to $Seq_in) or some things I simply could not find. > Now I got a tip where else to search: the scrapbook and deobfuscator. > > I immediately will have a look at that. > > This is the first time for me touching linux / perl commands; that's why I > thought after several days of trial and many errors ;) asking the > mailinglist. > > I was very happy about your fast answers! > > Cheers and a nice weekend, > > Tim > > On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler > wrote: > > ups, sent too early... > > Hey Mark, > > thanks for the answer. But I am still struggling, especially where to put > in your code. > > Here is the code I have, so far: > > #!/usr/bin/perl -w > > ### should I put your code here as push is a perl command? > > > my %hits_by_query; > for ($result->hits) { > > ### I inserted a comma after name}}; if there is no comma, there was the > error: Scalar found where operator expected at > 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" > ### (Missing operator before $hit?) > ###Useless use of push with no values at > 12_BLAST_two_sequence_each_query_one_file.PL line 7. > ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, > near "} $hit" > ###BEGIN not safe after errors--compilation aborted at > 12_BLAST_two_sequence_each_query_one_file.PL line 13. > > > push @{$hits_by_query{$hit->name}}, $hit; > > ###here, every time this error appears: Name "main::result" used only > once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
> ###error: Can't call method "hits" on an undefined value at > 12_BLAST_two_sequence_each_query_one_file.PL line 5. > > > } > > > use strict; > use Bio::Tools::Run::StandAloneBlast; > use Bio::SeqIO; > use Bio::SearchIO; > > use Bio::Search::Result::BlastResult; > > my $Seq_in = Bio::SeqIO->new ( > -file => > "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", > -format => 'fasta' > ); > while (my $query = $Seq_in->next_seq()) { > > > my $factory = Bio::Tools::Run::StandAloneBlast->new( > > 'program' => 'blastn', > 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', > _READMETHOD => "Blast" > ); > > my $blast_report = $factory->blastall($query); > > ### Should I need to use a module? are the commands here at the right > position? errors, e.g., Global symbol "$hit" requires explicit package name > #my %hits_by_query; > #for ($result->hits) { > ### inserted comma after name}} > # push @{$hits_by_query{$hit->name}}, $hit; > #} > > > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > $blio->write_result($result); > } > > ###where are the files stored? what is their name. 
Sorry, but I cannot get > behind that :( > > while( my $result = $blast_report->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > > > while( my $hit = $result->next_hit ) { > > ## $hit is a Bio::Search::Hit::HitI compliant object > > > while( my $hsp = $hit->next_hsp ) { > > ## $hsp is a Bio::Search::HSP::HSPI compliant object > if( $hsp->length('total') > 50 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Query= ", $result->query_name, > "Hit= ", $hit->name, > "Length= ", $hsp->length('total'), > "Percent_id= ", $hsp->percent_identity, > "Subject=", $hsp->hit_string,"\n"; > } > } > } > } > } > } > > Again, a big thanks in advance :) > > All the best, > > Tim > > On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > > Hey Mark, > > thanks for the answer > > > > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sounds like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query.
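A note on the snippet quoted above, for readers who hit the same errors Tim reported: `push` needs a comma between the array and the value being added (without it Perl reports "Scalar found where operator expected ... near "} $hit""), and inside `for (...)` without a named loop variable the current element is `$_`, so `$hit` is never declared or set (hence the "Global symbol "$hit"" error under `use strict`). Also, `$hit->name` is the hit's own name, so grouping per query would key on the query name instead. A minimal core-Perl sketch of the hash-of-arrays idiom, using plain hashes as hypothetical stand-ins for hit objects:

```perl
use strict;
use warnings;

# Group records into a hash of arrays keyed by query ID.
# The records are plain hashes standing in for hit objects (hypothetical data).
sub group_by_query {
    my @records = @_;
    my %by_query;
    for my $rec (@records) {                           # named loop variable
        push @{ $by_query{ $rec->{query} } }, $rec;    # note the comma before $rec
    }
    return \%by_query;
}

my $grouped = group_by_query(
    { query => 'q1', hit => 'hitA' },
    { query => 'q2', hit => 'hitB' },
    { query => 'q1', hit => 'hitC' },
);

for my $qid (sort keys %$grouped) {
    my @hit_names = map { $_->{hit} } @{ $grouped->{$qid} };
    print "$qid: @hit_names\n";    # prints "q1: hitA hitC" then "q2: hitB"
}
```

In a BioPerl loop the key would come from `$result->query_name`; since each result already holds a single query's hits, this grouping step only matters when hits from several queries have been pooled together.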
> >>> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. > >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automize > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... 
> >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler > MPI for Terrestrial Microbiology > Karl-von-Frisch-Straße > D-35043 Marburg / Germany > > Email: koehlerd at mpi-marburg.mpg.de > Phone: +49 6421 178-740 > Fax: +49 6421 178-999 > > > > > ------------------------------ > > *Attention: *The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities to > which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ------------------------------ > > > > From maj at fortinbras.us Mon Nov 2 04:47:15 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 1 Nov 2009 23:47:15 -0500 Subject: [Bioperl-l] annotations Message-ID: <5150801225E0484D95DC51B2D00AE519@NewLife> I'm cogitating on features and annotations.
For a RichSeq, one gets the set of annotations by $seq->annotation->get_Annotations while getting features by $seq->get_Features Is there a reason not to have a method in SeqI sub get_Annotations { shift->annotation->get_Annotations } to allow a user to do what seems natural from a user's perspective, viz. $seq->get_Annotations? I imagine this might save hundreds of hours of frustration, integrated over all newbies. MAJ From cjfields at illinois.edu Mon Nov 2 13:08:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 2 Nov 2009 07:08:54 -0600 Subject: [Bioperl-l] annotations In-Reply-To: <5150801225E0484D95DC51B2D00AE519@NewLife> References: <5150801225E0484D95DC51B2D00AE519@NewLife> Message-ID: <6920A9E1-D221-4CF8-9866-0ADBDB254C19@illinois.edu> On Nov 1, 2009, at 10:47 PM, Mark A. Jensen wrote: > I'm cogitating on features and annotations. For a RichSeq, one gets > the set of annotations by > > $seq->annotation->get_Annotations > > while getting features by > > $seq->get_Features > > Is there a reason not to have a method in SeqI > > sub get_Annotations { shift->annotation->get_Annotations } > > to allow a user to do what seems natural from a user's perspective, > viz. $seq->get_Annotations? I imagine this might save hundreds of > hours of frustration, integrated over all newbies. > MAJ One could add the methods to delegate to annotation() (that's essentially what I'm planning on doing for Biome). 
chris From kiekyon.huang at gmail.com Tue Nov 3 15:14:39 2009 From: kiekyon.huang at gmail.com (Kie Kyon Huang) Date: Tue, 3 Nov 2009 23:14:39 +0800 Subject: [Bioperl-l] render_blast problem Message-ID: Hi, I was trying to follow the HOWTO:Graphics at http://www.bioperl.org/wiki/HOWTO:Graphics When running the command line in cygwin $ perl render_blast1.pl data1.txt | display - I get the following error line, bash: display: command not found I also tried $ perl render_blast1.pl data1.txt > data1.png however, I was unable to open the data1.png file using Microsoft Office Picture Manager or windows Photo Gallery Thanks Huang From biopython at maubp.freeserve.co.uk Tue Nov 3 15:45:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Nov 2009 15:45:37 +0000 Subject: [Bioperl-l] render_blast problem In-Reply-To: References: Message-ID: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com> On Tue, Nov 3, 2009 at 3:14 PM, Kie Kyon Huang wrote: > Hi, > > I was trying to follow the HOWTO:Graphics at > http://www.bioperl.org/wiki/HOWTO:Graphics > > When running the command line in cygwin > > $ perl render_blast1.pl data1.txt | display - > > I get the following error line, > > bash: display: command not found That makes sense on Windows, since display is a Unix command line tool. > I also tried > > $ perl render_blast1.pl data1.txt > data1.png Based on the wiki, I think that ought to have worked. > however, I was unable to open the data1.png file using Microsoft > Office Picture Manager or windows Photo Gallery Did you do this step?: >> Important! If you are on a Windows platform, you need to put >> STDOUT into binary mode so that the PNG file does not go >> through Window's carriage return/linefeed transformations. >> Before the final print statement, put the statement >> binmode(STDOUT). This advice also applies to certain older >> versions of RedHat, which ship with a patched (and possibly >> broken) version of Perl. 
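The binmode advice quoted above can be applied conditionally, along the lines of the "if statement checking for Windows" suggested in this thread. This is a minimal sketch, assuming the image bytes are already in a scalar (in the HOWTO script they presumably come from the panel's `png` method); `needs_binmode` is a hypothetical helper name, and `'MSWin32'` and `'cygwin'` are the values Perl's `$^O` takes on native Windows and Cygwin perls.

```perl
use strict;
use warnings;

# True on platforms where STDOUT may translate "\n" to "\r\n"
# ('MSWin32' = native Windows perl, 'cygwin' = Cygwin perl).
sub needs_binmode {
    my ($os) = @_;
    return $os eq 'MSWin32' || $os eq 'cygwin';
}

# Stand-in for real image data: the 8-byte PNG signature. It deliberately
# contains "\r\n" and "\n" bytes so that newline translation is detectable,
# which is why a text-mode STDOUT corrupts PNG output.
my $png_bytes = "\x89PNG\r\n\x1a\n";

binmode(STDOUT) if needs_binmode($^O);
print $png_bytes;
```

Because `binmode(STDOUT)` makes no difference to newline handling on Unix-like systems, calling it unconditionally is an equally reasonable choice, and it also covers the older Red Hat perls mentioned in the quoted advice.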
(BioPerl devs - couldn't that be added to the default
render_blast1.pl script with an if statement checking for
Windows?)

Peter

From biopython at maubp.freeserve.co.uk Tue Nov 3 16:04:59 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Nov 2009 16:04:59 +0000
Subject: [Bioperl-l] render_blast problem
In-Reply-To:
References: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
Message-ID: <320fb6e00911030804r62e50da6w373bbb61e9823f28@mail.gmail.com>

Mailing list CC'd - solved :)

On Tue, Nov 3, 2009 at 3:55 PM, Kie Kyon Huang wrote:
>
> ok, that fix it
> i forget sometimes what platform am i on.
> thanks

Great.

Peter

From amackey at virginia.edu Tue Nov 3 17:09:00 2009
From: amackey at virginia.edu (Aaron Mackey)
Date: Tue, 3 Nov 2009 12:09:00 -0500
Subject: [Bioperl-l] svn errors?
Message-ID: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>

[ajm6q at lc4 bioperl-live]$ svn update
svn: Decompression of svndiff data failed

I'll admit to not having svn updated in awhile; A clean, anonymous svn co
failed with the same message:

[...]
A bioperl-live/Bio/Structure/StructureI.pm
A bioperl-live/Bio/Structure/IO
svn: Decompression of svndiff data failed

-Aaron

P.S. I used this command: svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

From cjfields at illinois.edu Tue Nov 3 17:17:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 11:17:10 -0600
Subject: [Bioperl-l] svn errors?
In-Reply-To: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
References: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
Message-ID: <8C5FC42D-F957-45AC-9AAC-876ACC9D77E0@illinois.edu>

Aaron,

Yep, this was reported to support (a couple of users on #bioperl
reported the same problem). Chris D. is looking into it. I'm wondering
if it's worth setting up a second mirror to github for this purpose.
chris

On Nov 3, 2009, at 11:09 AM, Aaron Mackey wrote:

> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous
> svn co
> failed with the same message:
>
> [...]
> A bioperl-live/Bio/Structure/StructureI.pm
> A bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command: svn co svn://
> code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue Nov 3 17:19:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 11:19:56 -0600
Subject: [Bioperl-l] render_blast problem
In-Reply-To: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
References: <320fb6e00911030745s68331ef7n729505f460863e21@mail.gmail.com>
Message-ID: <8336341C-C7B4-4740-A7C3-E2DE5FDAF651@illinois.edu>

On Nov 3, 2009, at 9:45 AM, Peter wrote:

> ...
> Did you do this step?:
>>> Important! If you are on a Windows platform, you need to put
>>> STDOUT into binary mode so that the PNG file does not go
>>> through Window's carriage return/linefeed transformations.
>>> Before the final print statement, put the statement
>>> binmode(STDOUT). This advice also applies to certain older
>>> versions of RedHat, which ship with a patched (and possibly
>>> broken) version of Perl.
>
> (BioPerl devs - couldn't that be added to the default
> render_blast1.pl script with an if statement checking for
> Windows?)
>
> Peter

Yes, that should be added. I'll work on it.

chris

From mauricio at open-bio.org Tue Nov 3 17:20:52 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Tue, 03 Nov 2009 11:20:52 -0600
Subject: [Bioperl-l] svn errors?
In-Reply-To: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
References: <24c96eca0911030909p7cfbf858h4de5a345cf8a0782@mail.gmail.com>
Message-ID: <4AF06674.30506@open-bio.org>

Hi Aaron,

This was reported a few days ago. Chris Dagdigian is working today on a
fix for it.

Mauricio.

Aaron Mackey wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous svn co
> failed with the same message:
>
> [...]
> A bioperl-live/Bio/Structure/StructureI.pm
> A bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command: svn co svn://
> code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From rachitasharma at gmail.com Tue Nov 3 22:12:11 2009
From: rachitasharma at gmail.com (Rachita Sharma)
Date: Tue, 3 Nov 2009 14:12:11 -0800
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
Message-ID: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>

I am having trouble parsing PSI-BLAST results. Please help.
The code is:
my $in = new Bio::SearchIO( -format => 'blast',
                            -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");


while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {

        $sth->execute($result->query_name, $hit->name, $hit->significance);
        print "Query executed!\n";

    }
}

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline ***** No hits found ******
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
STACK: BSubVCpsiRblast.pl:92
-----------------------------------------------------------

From cjfields at illinois.edu Wed Nov 4 03:42:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 3 Nov 2009 21:42:55 -0600
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
In-Reply-To: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>
References: <48f9c0d0911031412v26935097ib06d13c2266cfd8a@mail.gmail.com>
Message-ID:

Rachita,

You'll have to give us more to go on than this. The best thing to do
is file a bug report and attach an example PSI-BLAST report and code
that causes the problem. The $sth->execute(...) is a bit odd, but
that shouldn't cause the error in question. Also, make sure to
stipulate the OS, version of BioPerl, and perl version.

chris

On Nov 3, 2009, at 4:12 PM, Rachita Sharma wrote:

> I am having trouble parsing PSI-BLAST results. Please help.
>
> The code is:
> my $in = new Bio::SearchIO( -format => 'blast',
> -file =>
> "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");
>
>
> while( my $result = $in->next_result ) {
> while( my $hit = $result->next_hit ) {
>
> $sth->execute($result->query_name, $hit->name, $hit->significance);
> print "Query executed!\n";
>
> }
> }
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
> STACK: BSubVCpsiRblast.pl:92
> -----------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From alexl at users.sourceforge.net Wed Nov 4 07:30:21 2009
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 04 Nov 2009 02:30:21 -0500
Subject: [Bioperl-l] version of ExtUtils::Manifest too strict?
Message-ID:

Does the version of ExtUtils::Manifest really need to be strictly
greater than or equal to 1.52?

Currently this blocks me updating the Fedora package of BioPerl to
1.6.1, because the version of perl that Fedora ships is on 1.51 and
hence the build fails with:

Checking prerequisites...
 - ERROR: ExtUtils::Manifest (1.51_01) is installed, but we need version >= 1.52

Full logs are here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1787483
http://koji.fedoraproject.org/koji/getfile?taskID=1787483&name=build.log

This is true even with the version of Perl in rawhide/F-12 etc.
(ExtUtils::Manifest is in the base perl package).
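The build output reports the installed module as 1.51_01, an underscore (developer-style) version. The sketch below, which assumes only the core version.pm and ExtUtils::Manifest modules, shows how such a string compares against the 1.52 minimum and how to ask the installed module directly; meets_min_version is a hypothetical helper, not part of BioPerl's build scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use version;

# Hypothetical helper: compare two version strings with version.pm,
# which understands developer releases such as "1.51_01".
sub meets_min_version {
    my ($have, $want) = @_;
    return version->parse($have) >= version->parse($want);
}

for my $have ('1.51_01', '1.52', '1.57') {
    printf "%-8s meets >= 1.52? %s\n",
        $have, meets_min_version($have, '1.52') ? 'yes' : 'no';
}

# Ask the installed module itself: UNIVERSAL::VERSION dies when the
# installed version is lower than the one requested.
require ExtUtils::Manifest;
my $new_enough = eval { ExtUtils::Manifest->VERSION('1.52'); 1 } ? 1 : 0;
print "Installed ExtUtils::Manifest: $ExtUtils::Manifest::VERSION (",
      ($new_enough ? 'new enough' : 'older than 1.52'), ")\n";
```

The last line's output depends on the perl installation it runs under, which is exactly the point: it reports what a build on that machine would see.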
If it really is necessary, I would like to be armed with a good argument
why it needs to be updated, since the Perl package maintainer would have
to update the entire Perl package simply to get a more recent version of
one small subpackage.

Regards,
Alex

From jluis.lavin at unavarra.es Wed Nov 4 08:43:35 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 09:43:35 +0100 (CET)
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
Message-ID: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>

Hello all,

I'm a newbie who is having terrible troubles trying to retrieve a list
multiple sequences from the NCBI and write them to a single file in Fasta
format.
The code I've written seems to read mylist and retrive the sequences, but
it kinda overwrites them so that I only get the last sequence on the list.
I've been told to ask the people on this mailing list for help, since you
may have come across this problem also or at last will know how to solve
it...

Here is my code, which basically consist on an STDIN for the list to be
read into an array and a loop to read each sequence (stopping when the
list ends) and retrieve a sequence each time the loop is launched,
writting that sequence to a fasta file. I only get a sequence back
although it seems to perform the retrieving process with each of the
sequences of the list...
#!/usr/bin/perl -w
use strict;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::SeqIO;
print "Enter your list name:";
my $archivo=<STDIN>;
chomp $archivo;
die ("Can't open input\n") unless (open(INFILE, $archivo));
my @lista = <INFILE>;
foreach my $seq (@lista) {
    if ($seq eq '') {
        die ("empty list")
    }
    else {
        my $db = new Bio::DB::GenPept("-format" => "Fasta");
        my $seqobj = $db->get_Seq_by_acc($seq);
        my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
                                  -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;


An example list of sequences can be this one:

YP_003107578.1
YP_003106103.1
YP_003106552.1
YP_003106560.1
YP_003107053.1
YP_003107450.1
YP_003108000.1
YP_003105023.1
YP_003105264.1

Thanks in advance for your help ;)

--
José Luis Lavín Trueba, PhD

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From e.osimo at gmail.com Wed Nov 4 09:54:52 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Wed, 4 Nov 2009 10:54:52 +0100
Subject: [Bioperl-l] Bio::Graphics and picture format
Message-ID: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com>

Hello everyone,
do you know if it is possible to generate an image with Bio::Graphics in a
vector format? Is there a list of available formats?
Thanks
Emanuele

From David.Messina at sbc.su.se Wed Nov 4 09:52:53 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 4 Nov 2009 10:52:53 +0100
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID: <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>

>
> The code I've written seems to read mylist and retrive the sequences, but
> it kinda overwrites them so that I only get the last sequence on the list.
>

With this line

my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format =>
'fasta');


you are opening the filehandle for the output file inside your loop, so each
time it is writing over the previous file with an empty file. Then, you
write a single sequence to that file with this line

$out->write_seq($seqobj);


So when you are done, you just have the last sequence in the output file.

If you move the opening of the output filehandle outside the loop (it needs
to be done only once), then it should work as you expect.

Also, I notice the newline characters are not being removed from your
sequence IDs (actually I'm a little surprised that the sequences are being
retrieved). Just to be safe, you may want to add the line

chomp @lista;


after

my @lista = <INFILE>;




Dave

From jluis.lavin at unavarra.es Wed Nov 4 10:14:40 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 11:14:40 +0100 (CET)
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com>
Message-ID: <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>

Thank you very very much Dave,

I've had a really frustrating time trying to find out what I was doing
wrong, it has been so frustrating that I was about to quit Bioperl.
Now I can try to focus on BLAST parsing for my comparative genomic
analysis

You're great in this mailing list, because you give a fast and neat
advice to all the questions asked here by newbies like me ;)

On Wed, 4 November 2009, 10:52, Dave Messina wrote:
>>
>> The code I've written seems to read mylist and retrive the sequences,
>> but
>> it kinda overwrites them so that I only get the last sequence on the
>> list.
>>
>
> With this line
>
> my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format =>
> 'fasta');
>
>
> you are opening the filehandle for the output file inside your loop, so
> each
> time it is writing over the previous file with an empty file. Then, you
> write a single sequence to that file with this line
>
> $out->write_seq($seqobj);
>
>
> So when you are done, you just have the last sequence in the output file.
>
> If you move the opening of the output filehandle outside the loop (it
> needs
> to be done only once), then it should work as you expect.
>
> Also, I notice the newline characters are not being removed from your
> sequence IDs (actually I'm a little surprised that the sequences are
> being
> retrieved). Just to be safe, you may want to add the line
>
> chomp @lista;
>
>
> after
>
> my @lista = <INFILE>;
>
>
>
>
> Dave
>


--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From hrh at fmi.ch Wed Nov 4 10:05:17 2009
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Wed, 04 Nov 2009 11:05:17 +0100
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID:

Hi

try

my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
                                    ^

this way you no longer overwrite your existing file, but append the next
sequence.

Regards, Hans



On 11/4/09 9:43 AM, "jluis.lavin at unavarra.es"
wrote:

>
> Hello all,
>
> I'm a newbie who is having terrible troubles trying to retrieve a list
> multiple sequences from the NCBI and write them to a single file in Fasta
> format.
> The code I've written seems to read mylist and retrive the sequences, but
> it kinda overwrites them so that I only get the last sequence on the list.
> I've been told to ask the people on this mailing list for help, since you
> may have come across this problem also or at last will know how to solve
> it...
>
> Here is my code, which basically consist on an STDIN for the list to be
> read into an array and a loop to read each sequence (stopping when the
> list ends) and retrieve a sequence each time the loop is launched,
> writting that sequence to a fasta file. I only get a sequence back
> although it seems to perform the retrieving process with each of the
> sequences of the list...
>
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenPept;
> use Bio::DB::GenBank;
> use Bio::SeqIO;
> print "Enter your list name:";
> my $archivo=<STDIN>;
> chomp $archivo;
> die ("Can't open input\n") unless (open(INFILE, $archivo));
> my @lista = <INFILE>;
> foreach my $seq (@lista) {
>     if ($seq eq '') {
>         die ("empty list")
>     }
>     else {
>         my $db = new Bio::DB::GenPept("-format" => "Fasta");
>         my $seqobj = $db->get_Seq_by_acc($seq);
>         my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
>                                   -format => 'fasta');
>         $out->write_seq($seqobj);
>     }
> }
> exit;
>
>
> An example list of sequences can be this one:
>
> YP_003107578.1
> YP_003106103.1
> YP_003106552.1
> YP_003106560.1
> YP_003107053.1
> YP_003107450.1
> YP_003108000.1
> YP_003105023.1
> YP_003105264.1
>
> Thanks in advance for your help ;)

From jluis.lavin at unavarra.es Wed Nov 4 10:25:38 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Wed, 4 Nov 2009 11:25:38 +0100 (CET)
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in asingle list query
In-Reply-To:
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es>
Message-ID: <1834.130.206.164.153.1257330338.squirrel@webmail.unavarra.es>

Thank you very much for your answer Hans!!!
It works perfectly, also a neat and fast solution, like Dave's.
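Both replies fix the same problem: the output file is reopened with '>' on every loop iteration, which truncates it each time. The core-Perl sketch below uses plain open/print instead of Bio::SeqIO and placeholder records instead of fetched sequences (assumptions made so it runs without BioPerl or network access), and shows the open-once pattern together with the chomp Dave recommends. In the original script the equivalent change would be to create the Bio::SeqIO object with -file => '>extracted_seqs.fasta' once, before the foreach loop, and keep only the write_seq call inside it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Stand-in for lines read with <INFILE>: each still ends in "\n".
my @lista = ("YP_003107578.1\n", "YP_003106103.1\n", "YP_003106552.1\n");
chomp @lista;    # strip the newlines before using the IDs

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/extracted_seqs.fasta";

# Open the output ONCE, before the loop. Opening it with '>' inside the
# loop would truncate the file on every pass, keeping only the last record.
open(my $out, '>', $path) or die "Cannot write $path: $!";
foreach my $acc (@lista) {
    next if $acc eq '';    # skip blank lines
    # Placeholder record: the real script would write the sequence
    # fetched for $acc here (e.g. with a hypothetical $seqio->write_seq($seqobj)).
    print {$out} ">$acc\nPLACEHOLDER\n";
}
close($out) or die "close failed: $!";

open(my $in, '<', $path) or die "Cannot read $path: $!";
my @headers = grep { /^>/ } <$in>;
close($in);
print scalar(@headers), " records written\n";    # prints "3 records written"
```

The '>>' (append) mode Hans suggests also works, but it keeps adding to whatever extracted_seqs.fasta already holds, so re-running the script duplicates earlier sequences; opening once with '>' starts from an empty file on each run.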
Blessings to you all ;)

On Wed, 4 November 2009, 11:05, Hotz, Hans-Rudolf wrote:
> Hi
>
> try
>
> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
>                                     ^
>
> this way you no longer overwrite your existing file, but append the next
> sequence.
>
> Regards, Hans
>
>
>
> On 11/4/09 9:43 AM, "jluis.lavin at unavarra.es"
> wrote:
>
>>
>> Hello all,
>>
>> I'm a newbie who is having terrible troubles trying to retrieve a list
>> multiple sequences from the NCBI and write them to a single file in
>> Fasta
>> format.
>> The code I've written seems to read mylist and retrive the sequences,
>> but
>> it kinda overwrites them so that I only get the last sequence on the
>> list.
>> I've been told to ask the people on this mailing list for help, since
>> you
>> may have come across this problem also or at last will know how to solve
>> it...
>>
>> Here is my code, which basically consist on an STDIN for the list to be
>> read into an array and a loop to read each sequence (stopping when the
>> list ends) and retrieve a sequence each time the loop is launched,
>> writting that sequence to a fasta file. I only get a sequence back
>> although it seems to perform the retrieving process with each of the
>> sequences of the list...
>>
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::GenPept;
>> use Bio::DB::GenBank;
>> use Bio::SeqIO;
>> print "Enter your list name:";
>> my $archivo=<STDIN>;
>> chomp $archivo;
>> die ("Can't open input\n") unless (open(INFILE, $archivo));
>> my @lista = <INFILE>;
>> foreach my $seq (@lista) {
>>     if ($seq eq '') {
>>         die ("empty list")
>>     }
>>     else {
>>         my $db = new Bio::DB::GenPept("-format" => "Fasta");
>>         my $seqobj = $db->get_Seq_by_acc($seq);
>>         my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
>>                                   -format => 'fasta');
>>         $out->write_seq($seqobj);
>>     }
>> }
>> exit;
>>
>>
>> An example list of sequences can be this one:
>>
>> YP_003107578.1
>> YP_003106103.1
>> YP_003106552.1
>> YP_003106560.1
>> YP_003107053.1
>> YP_003107450.1
>> YP_003108000.1
>> YP_003105023.1
>> YP_003105264.1
>>
>> Thanks in advance for your help ;)
>
>


--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From scott at scottcain.net Wed Nov 4 13:26:02 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 4 Nov 2009 08:26:02 -0500
Subject: [Bioperl-l] Bio::Graphics and picture format
In-Reply-To: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com>
References: <2ac05d0f0911040154h4eed4a1j8108f78e6e4761f3@mail.gmail.com>
Message-ID: <0FB17FBC-16BE-4A9F-AC75-983D3B4ECE7D@scottcain.net>

Hi Emanuele,

It is possible to use GD::SVG instead of GD to generate SVG graphics.
To use it, you provide an argument of "-image_class GD::SVG" to the
constructor of Bio::Graphics::Panel. See the perldoc of
Bio::Graphics::Panel for more info.

Scott


On Nov 4, 2009, at 4:54 AM, Emanuele Osimo wrote:

> Hello everyone,
> do you know if it is possible to generate an image with
> Bio::Graphics in a
> vector format? Is there a list of available formats?
> Thanks
> Emanuele
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From b3sn7 at UNB.ca Tue Nov 3 17:30:24 2009
From: b3sn7 at UNB.ca (Sharma, Rachita)
Date: Tue, 3 Nov 2009 13:30:24 -0400
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
Message-ID: <1257269424.4af068b045434@webmail.unb.ca>


I am having trouble parsing PSI-BLAST results. Please help.

The code is:
my $in = new Bio::SearchIO( -format => 'blast',
                            -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");


while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {

        $sth->execute($result->query_name, $hit->name, $hit->significance);
        print "Query executed!\n";

    }
}

The error is:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline ***** No hits found ******
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SearchIO::blast::next_result
/usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
STACK: BSubVCpsiRblast.pl:92
-----------------------------------------------------------




*******************************
Rachita Sharma
Research Assistant (PhD Student)
University of New Brunswick, NB, CANADA
email: Rachita.Sharma at unb.ca
Phone no: 503-895-3619
*******************************

From cjfields at illinois.edu Wed Nov 4 13:53:35 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 4 Nov 2009 07:53:35 -0600
Subject: [Bioperl-l] version of ExtUtils::Manifest too strict?
In-Reply-To:
References:
Message-ID: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu>

Alex,

Not sure why ExtUtils::Manifest can't be bundled as a separate perl
package alone.
It is part of perl core but it's also available on CPAN separately from
perl itself:

http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm

This is the commit message for that BTW. This allows spaces in file
names for the MANIFEST. v1.52 is a bug fix and is required.

http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673

chris

On Nov 4, 2009, at 1:30 AM, Alex Lancaster wrote:

> Does the version of ExtUtils::Manifest really need to be strictly
> greater than or equal to 1.52?
>
> Currently this blocks me updating the Fedora package of BioPerl to
> 1.6.1, because the version of perl that Fedora ships is on 1.51 and
> hence the build fails with:
>
> Checking prerequisites...
> - ERROR: ExtUtils::Manifest (1.51_01) is installed, but we need
> version >= 1.52
>
> Full logs are here:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=1787483
> http://koji.fedoraproject.org/koji/getfile?taskID=1787483&name=build.log
>
> This is true even with the version of Perl in rawhide/F-12 etc.
> (ExtUtils::Manifest is in the base perl package).
>
> If it really is necessary, I would like to be armed with a good
> argument why this ca
> why it needs to be updated, since the Perl package maintainer would
> have
> to update the entire Perl package simply to get a more recent
> version of
> one small subpackage.
>
> Regards,
> Alex
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Nov 4 13:55:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 4 Nov 2009 07:55:34 -0600
Subject: [Bioperl-l] Trouble parsing PSI-BLAST
In-Reply-To: <1257269424.4af068b045434@webmail.unb.ca>
References: <1257269424.4af068b045434@webmail.unb.ca>
Message-ID: <70E34111-4E70-463D-86EE-06926EA57073@illinois.edu>

Rachita,

Asked and answered yesterday. Please submit as a bug.
chris

On Nov 3, 2009, at 11:30 AM, Sharma, Rachita wrote:

>
> I am having trouble parsing PSI-BLAST results. Please help.
>
> The code is:
> my $in = new Bio::SearchIO( -format => 'blast',
> -file => "BS_XFpsiRblastoutputs/e${ev}/bloutput${i}.txt");
>
>
> while( my $result = $in->next_result ) {
> while( my $hit = $result->next_hit ) {
>
> $sth->execute($result->query_name, $hit->name, $hit->significance);
> print "Query executed!\n";
>
> }
> }
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/
> Root/Root.pm:359
> STACK: Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.8/Bio/SearchIO/blast.pm:1813
> STACK: BSubVCpsiRblast.pl:92
> -----------------------------------------------------------
>
>
>
>
> *******************************
> Rachita Sharma
> Research Assistant (PhD Student)
> University of New Brunswick, NB, CANADA
> email: Rachita.Sharma at unb.ca
> Phone no: 503-895-3619
> *******************************
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From David.Messina at sbc.su.se Wed Nov 4 14:11:43 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 4 Nov 2009 15:11:43 +0100
Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a single list query
In-Reply-To: <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>
References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es>
Message-ID: <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com>

Aw shucks, José, glad I could be of help.
There are plenty of people who answer questions around here, but my
timezone sometimes gives me an advantage for the European ones. :)

Dave

From daniel.gaston at gmail.com Wed Nov 4 14:45:04 2009
From: daniel.gaston at gmail.com (Daniel Gaston)
Date: Wed, 4 Nov 2009 10:45:04 -0400
Subject: [Bioperl-l] SwissProt and Subcellular localization information
Message-ID: <50c615ba0911040645j1b28e727p5d7bf47a04db160b@mail.gmail.com>

Hi Everyone,

I have recently been playing around with SwissProt format flatfiles and want
to extract sequences based on subcellular localization. I notice in going
through the code for swiss.pm and swissdriver.pm that in both (more so in
swissdriver.pm) there are several steps where organelle information based on
the OG line could be extracted and added to data structure but isn't. It
seems that in both cases the OG line is being added in to the generic
lumping of data from the OC, OS, and OX lines in order to extract species
names and taxonomy information but getting rid of everything else. Is there
a particular reason for this or just a simple oversight? On the surface at
least it looks like a relatively simple modification to make although I
admit that I am not terribly adept at manipulating these SeqIO
datastructures.

Thanks for your time,

Dan

From daniel.gaston at gmail.com Wed Nov 4 17:12:10 2009
From: daniel.gaston at gmail.com (Daniel Gaston)
Date: Wed, 4 Nov 2009 13:12:10 -0400
Subject: [Bioperl-l] SwissProt and Subcellular localization information
Message-ID: <50c615ba0911040912pfd2483fwe44cd098beed73c7@mail.gmail.com>

Sorry folks, it appears I was just being a bonehead and didn't look close
enough into Bio:Annotations and Bio:Species objects that store all of this
data.
Dan

On Wed, Nov 4, 2009 at 1:00 PM, wrote:

> Send Bioperl-l mailing list submissions to
> bioperl-l at lists.open-bio.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> or, via email, send a message with subject or body 'help' to
> bioperl-l-request at lists.open-bio.org
>
> You can reach the person managing the list at
> bioperl-l-owner at lists.open-bio.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bioperl-l digest..."
>
> Today's Topics:
>
> 1. SwissProt and Subcellular localization information
> (Daniel Gaston)
>
>
> ---------- Forwarded message ----------
> From: Daniel Gaston
> To: bioperl-l at lists.open-bio.org
> Date: Wed, 4 Nov 2009 10:45:04 -0400
> Subject: [Bioperl-l] SwissProt and Subcellular localization information
> Hi Everyone,
>
> I have recently been playing around with SwissProt format flatfiles and
> want
> to extract sequences based on subcellular localization. I notice in going
> through the code for swiss.pm and swissdriver.pm that in both (more so in
> swissdriver.pm) there are several steps where organelle information based
> on
> the OG line could be extracted and added to data structure but isn't. It
> seems that in both cases the OG line is being added in to the generic
> lumping of data from the OC, OS, and OX lines in order to extract species
> names and taxonomy information but getting rid of everything else. Is there
> a particular reason for this or just a simple oversight? On the surface at
> least it looks like a relatively simple modification to make although I
> admit that I am not terribly adept at manipulating these SeqIO
> datastructures.
>
> Thanks for your time,
>
> Dan
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jluis.lavin at unavarra.es Thu Nov 5 15:28:23 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Thu, 5 Nov 2009 16:28:23 +0100 (CET)
Subject: [Bioperl-l] A question about iBio::Index: and its correct use
Message-ID: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es>


Hello to all,

I'm trying to write a script to retrieve a list of sequences from a local
FASTA file (for example a fasta archive where all the protein models of an
organism are stored). This file would be used by me as some kind "local
database" (sorry if I mistake a few concepts...)
I've been reading the BioPerl HOWTOs and I came across the
Bio::Index::Fasta tool.
If I didn't misunderstood what I read (which can be easy because my low
level on programming) this Indexing tool should do the job.
I wrote a couple of scripts based on the documentation i read about this
tool, but I don't seem to be able to create the index file to be used
later (to retrieve the sequences from).
-First of all, I want to ask the people in this forum if the
Bio::Index::Fasta is the right one to chose for this tasks.
-Then I'll beg you to take a look at my scripts, because I don't seem to
catch the bug...

Best wishes to you all and thanks in advance ;)

--
José Luis Lavín Trueba, PhD

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From maj at fortinbras.us Thu Nov 5 15:39:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 5 Nov 2009 10:39:05 -0500
Subject: [Bioperl-l] A question about iBio::Index: and its correct use
In-Reply-To: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es>
References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es>
Message-ID:

José -- It looks like this is a good solution to your problem. Please send you
script so we can look at it-
cheers Mark
----- Original Message -----
From:
To:
Sent: Thursday, November 05, 2009 10:28 AM
Subject: [Bioperl-l] A question about iBio::Index: and its correct use


Hello to all,

I'm trying to write a script to retrieve a list of sequences from a local
FASTA file (for example a fasta archive where all the protein models of an
organism are stored). This file would be used by me as some kind "local
database" (sorry if I mistake a few concepts...)
I've been reading the BioPerl HOWTOs and I came across the
Bio::Index::Fasta tool.
If I didn't misunderstood what I read (which can be easy because my low
level on programming) this Indexing tool should do the job.
I wrote a couple of scripts based on the documentation i read about this
tool, but I don't seem to be able to create the index file to be used
later (to retrieve the sequences from).
-First of all, I want to ask the people in this forum if the
Bio::Index::Fasta is the right one to chose for this tasks.
-Then I'll beg you to take a look at my scripts, because I don't seem to
catch the bug...

Best wishes to you all and thanks in advance ;)

--
José Luis Lavín Trueba, PhD

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jluis.lavin at unavarra.es Thu Nov 5 15:46:36 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Thu, 5 Nov 2009 16:46:36 +0100 (CET)
Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use]
Message-ID: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es>

---------------------------- Original Message ----------------------------
Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct use
From: jluis.lavin at unavarra.es
Date: Thu, 5 November 2009, 16:46
To: "Mark A. Jensen"
--------------------------------------------------------------------------

Hi Mark,

I've actually got two scripts, the first one is to create the index and
the second one is to retrieve the sequence lis from the indexed file.

1)Here is the Index creation script:

#!/c:/Perl -w
use strict;
use Bio::Index::Fasta;
use strict;
print "Enter file for indexing: \n";
my $Index_File_Name = <STDIN>;
my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx",
                                 -write_flag => 1);
$inx->make_index(my $File_Name);

2)And here is the sequence retrieval script:

#!/c:/Perl -w
use Bio::Index::Fasta;
use strict;
#PC9.fasta is my genomic file
my $Index_File_Name ="PC9.fasta";
my $inx = Bio::Index::Fasta->new($Index_File_Name);
#LCS.txt is my sequences list
@ARGV = ;
foreach my $id (@ARGV) {
    if ($id eq ''){
        die ("empty list")
    }
    else {
        my $seqobj = $inx->fetch($id);
        my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
                                  -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;
}

I hope this code is not a total scum...

Thanks in advance ;)

On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote:
> José -- It looks like this is a good solution to your problem. Please send
> you
> script so we can look at it-
> cheers Mark
> ----- Original Message -----
> From:
> To:
> Sent: Thursday, November 05, 2009 10:28 AM
> Subject: [Bioperl-l] A question about iBio::Index: and its correct use
>
>
>
> Hello to all,
>
> I'm trying to write a script to retrieve a list of sequences from a local
> FASTA file (for example a fasta archive where all the protein models of an
> organism are stored). This file would be used by me as some kind "local
> database" (sorry if I mistake a few concepts...)
> I've been reading the BioPerl HOWTOs and I came across the
> Bio::Index::Fasta tool.
> If I didn't misunderstood what I read (which can be easy because my low
> level on programming) this Indexing tool should do the job.
> I wrote a couple of scripts based on the documentation i read about this
> tool, but I don't seem to be able to create the index file to be used
> later (to retrieve the sequences from).
> -First of all, I want to ask the people in this forum if the
> Bio::Index::Fasta is the right one to chose for this tasks.
> -Then I'll beg you to take a look at my scripts, because I don't seem to
> catch the bug...
>
> Best wishes to you all and thanks in advance ;)
>
> --
> José Luis Lavín Trueba, PhD
>
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>


--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


--
Dr. José Luis Lavín Trueba

Dpto.
de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Thu Nov 5 15:37:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 5 Nov 2009 10:37:53 -0500 Subject: [Bioperl-l] Trouble retrieving multiple sequences from NCBI ina single list query In-Reply-To: <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com> References: <1386.130.206.164.153.1257324215.squirrel@webmail.unavarra.es> <628aabb70911040152r19ed79dfnbc54f346295d28a8@mail.gmail.com> <1791.130.206.164.153.1257329680.squirrel@webmail.unavarra.es> <628aabb70911040611q56b441c8o6888f326d0b314d@mail.gmail.com> Message-ID: <49075FDFF6764EE48E932D95EB994221@NewLife> True, Dave, you compete only with crazed east coast core developers who're doing "just one more thing" at 2am.... ----- Original Message ----- From: "Dave Messina" To: Cc: Sent: Wednesday, November 04, 2009 9:11 AM Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI ina single list query > Aw shucks, Jos?, glad I could be of help. There are plenty of people who > answer questions around here, but my timezone sometimes gives me an > advantage for the European ones. :) > > > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Thu Nov 5 16:02:48 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Thu, 05 Nov 2009 17:02:48 +0100 Subject: [Bioperl-l] A question about iBio::Index: and its correct use In-Reply-To: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> Message-ID: Jluis > -Then I?ll beg you to take a look at my scripts, because I don?t seem to > catch the bug... you haven't attached/included any scripts, have you? 

Anyway, have you considered using BLAST indices (created with the additional flag "-o") together with the tool 'fastacmd' (which is also included in the NCBI BLAST binaries) as a simple (and very fast) alternative for fetching sequences?


Regards, Hans

From maj at fortinbras.us  Thu Nov  5 16:02:09 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 5 Nov 2009 11:02:09 -0500
Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use]
In-Reply-To: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es>
References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es>
Message-ID: <1984ED07F36C446284B25F617964B6C6@NewLife>

Hey José,
The first thing that jumps out is the index file name. Looks like you create it as
PC9.fasta.idx
But you read it as
PC9.fasta
Not an unusual mistake. Do
my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
and see if it works.
MAJ
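Combining Mark's fix with two other problems visible in the posted scripts gives something like the sketch below. It is a minimal sketch, assuming BioPerl is installed: `make_index(my $File_Name)` passes a brand-new, undefined variable, so the FASTA file itself (not the index name) has to be passed; the index must be reopened under exactly the name it was created with (`PC9.fasta.idx`); and a file name read from standard input would also need `chomp` before use. The subroutine names `build_index` and `extract_ids` are illustrative, not from the thread; converting the path with `File::Spec->rel2abs` is a precaution so the file name stored in the index stays valid from another working directory.

```perl
use strict;
use warnings;
use Bio::Index::Fasta;
use Bio::SeqIO;
use File::Spec;

# Build an index for $fasta_file. The DBM files are named after
# $index_file (with SDBM: $index_file.dir and $index_file.pag).
sub build_index {
    my ($fasta_file, $index_file) = @_;
    my $inx = Bio::Index::Fasta->new(-filename   => $index_file,
                                     -write_flag => 1);
    # Pass the FASTA file itself, as an absolute path (precaution).
    $inx->make_index(File::Spec->rel2abs($fasta_file));
    return;    # $inx goes out of scope here, closing the index
}

# Reopen the index under the SAME name and write every ID found to
# $out_file in FASTA format. Returns the IDs missing from the index.
sub extract_ids {
    my ($index_file, $ids, $out_file) = @_;
    my $inx = Bio::Index::Fasta->new(-filename => $index_file);
    # One writer for all IDs, instead of reopening '>>' in the loop.
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fasta');
    my @missing;
    for my $id (@$ids) {
        my $seq = $inx->fetch($id);    # undef if the ID is not indexed
        if (defined $seq) { $out->write_seq($seq) }
        else              { push @missing, $id }
    }
    $out->close;
    return @missing;
}
```

For the files in the thread this would be called as `build_index('PC9.fasta', 'PC9.fasta.idx')` once, then `extract_ids('PC9.fasta.idx', \@ids, 'extracted_seqs.fasta')` with the chomped lines of LCS.txt in `@ids`. Opening the output once with `>` also avoids appending duplicate records when the script is rerun.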

From jluis.lavin at unavarra.es  Thu Nov  5 16:21:57 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Thu, 5 Nov 2009 17:21:57 +0100 (CET)
Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use]
In-Reply-To: <1984ED07F36C446284B25F617964B6C6@NewLife>
References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> <1984ED07F36C446284B25F617964B6C6@NewLife>
Message-ID: <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es>

Thank you very much Mark, that's a good point :$
I guess your correction refers to the second script, doesn't it?

If so, there is still a problem with the first script: it doesn't create the PC9.fasta.idx file. Instead it creates two files named:
-PC9.fasta.idx.pag
-PC9.fasta.idx.dir

which seem clearly related to some kind of indexing process... but unless the PC9.fasta.idx file is only virtual or hidden, I can't find it anywhere...
Forgive me if I'm talking nonsense...

Thank you very much again for your help ;)

From maj at fortinbras.us  Thu Nov  5 16:39:09 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 5 Nov 2009 11:39:09 -0500
Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use]
In-Reply-To: <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es>
References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es> <1984ED07F36C446284B25F617964B6C6@NewLife> <2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es>
Message-ID:

Yes, these are files created by SDBM, Perl's internal DBM manager. You should be able to open the index simply with
$inx = Bio::Index::Fasta->new('PC9.fasta.idx');
and the DBM will know what to do--
cheers MAJ
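Mark's explanation can be checked with core Perl alone. The sketch below (hypothetical subroutine names, made-up key and value) ties a hash to SDBM_File under a base name such as `PC9.fasta.idx`, stores one record, and lists which files exist afterwards: SDBM writes `<base>.dir` and `<base>.pag` and never a file with the bare base name. Reopening with the same base name finds both files again, which is why `Bio::Index::Fasta->new('PC9.fasta.idx')` works even though no `PC9.fasta.idx` file is visible. This only illustrates the DBM layer; it does not reproduce Bio::Index's record format.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use SDBM_File;

# Create an SDBM database under $base, store one made-up record, and
# return the names of the files that now exist for that base name.
sub sdbm_files_for {
    my ($base) = @_;
    my %db;
    tie(%db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
        or die "Cannot create SDBM database '$base': $!";
    $db{'example_id'} = 'file 0, offset 0';
    untie %db;
    # Check the bare base name as well as the two suffixed files.
    return grep { -e $_ } map { "$base$_" } ('', '.dir', '.pag');
}

# Reopen the database read-only via the SAME base name and fetch a key.
sub sdbm_lookup {
    my ($base, $key) = @_;
    my %db;
    tie(%db, 'SDBM_File', $base, O_RDONLY, 0666)
        or die "Cannot open SDBM database '$base': $!";
    my $value = $db{$key};
    untie %db;
    return $value;
}
```

Called as `sdbm_files_for('PC9.fasta.idx')`, this should report only `PC9.fasta.idx.dir` and `PC9.fasta.idx.pag`, the same pair of files José found.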

From jluis.lavin at unavarra.es  Thu Nov  5 17:48:12 2009
From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es)
Date: Thu, 5 Nov 2009 18:48:12 +0100 (CET)
Subject: [Bioperl-l] A question about iBio::Index: and its correct use
In-Reply-To:
References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es>
Message-ID: <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es>

Thanks a lot for your help Hans,
It's a little bit too hard for me to understand this awesome information you've just given me and turn it into a script... I hope I can use it in the near future anyway ;)
The issue here is that the sequences I'm indexing are neither generated by NCBI nor stored there... although I believe you're just referring to the tool itself and not to a retrieval from NCBI.

Thanks again, you're all great at giving advice to newbies like me ;)

Best wishes to you all

From florent.angly at gmail.com  Thu Nov  5 18:00:19 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Thu, 05 Nov 2009 10:00:19 -0800
Subject: [Bioperl-l] A question about iBio::Index: and its correct use
In-Reply-To: <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es>
References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es>
Message-ID: <4AF312B3.9060009@gmail.com>

Hans-Rudolf was talking about a way to retrieve sequences from a BLAST database. If you use BLAST locally, then your database is local too.
More info here:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html
Florent

From valiente at lsi.upc.edu  Fri Nov  6 08:06:48 2009
From: valiente at lsi.upc.edu (valiente at lsi.upc.edu)
Date: Fri, 6 Nov 2009 09:06:48 +0100 (CET)
Subject: [Bioperl-l] Bio::SeqIO::genbank.pm
Message-ID: <45737.147.83.59.225.1257494808.squirrel@webmail.lsi.upc.edu>

There is a line in Bio::SeqIO::genbank.pm that converts the data in classification lines into a classification array by splitting only on ';' or '.', so that a classification of two or more words will still get matched:

my @class = map { s/^\s+//; s/\s+$//; s/\s{2,}/ /g; $_; } split /(?
References: <2120.130.206.164.153.1257434903.squirrel@webmail.unavarra.es> <C718B5B8.5561%hrh@fmi.ch> <3313.130.206.164.153.1257443292.squirrel@webmail.unavarra.es> <4AF312B3.9060009@gmail.com>
Message-ID: <1222.130.206.164.153.1257497085.squirrel@webmail.unavarra.es>

Thank you for the info Florent!
I'll read all the information on the link you provided and try to figure out how to make it work and whether it is worth it for me. I mean, I work with several sequence files that come from multiple databases (JGI, BROAD, Genolevures or NCBI). Protein IDs from each of those databases are different from NCBI's.
Maybe it would be easier to write a script that lets me enter a FASTA file with all the protein models of a single organism, parse it and then extract the sequences in a given list (using the "ID style" of that particular database) than to create a BLAST index for each organism I need to work with... Did I explain the issue correctly?
Anyway, since I don't know anything about this tool Hans and you showed me, I could easily be wrong...
Thank you for showing me the local BLAST index tool; I'll read the documentation carefully and study all its possibilities.

Best wishes
JL
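José's alternative (parse one organism's FASTA file and pull out the sequences named in a list) needs only core Perl. A minimal sketch, assuming the ID is the first whitespace-delimited word after `>` and the ID list has one ID per line; `read_fasta_by_id` and `extract_listed` are hypothetical names. Because the whole file is held in a hash, memory grows with the proteome size; that is the trade-off an on-disk index such as Bio::Index::Fasta, or a formatdb/fastacmd database, avoids.

```perl
use strict;
use warnings;

# Read a FASTA file into a hash: ID (first word after '>') => record text.
sub read_fasta_by_id {
    my ($fasta_file) = @_;
    my (%records, $id);
    open my $fh, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;               # accept Unix or Windows line ends
        if ($line =~ /^>\s*(\S+)/) {
            $id = $1;
            $records{$id} = "$line\n";      # keep the full header line
        }
        elsif (defined $id) {
            $records{$id} .= "$line\n";     # sequence lines follow the header
        }
    }
    close $fh;
    return \%records;
}

# Write the records for the IDs listed in $id_file (one per line) to
# $out_file, in list order. Returns the IDs not present in the FASTA file.
sub extract_listed {
    my ($records, $id_file, $out_file) = @_;
    open my $ids, '<', $id_file  or die "Cannot open $id_file: $!";
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    my @missing;
    while (my $id = <$ids>) {
        $id =~ s/^\s+|\s+$//g;              # strip newline and stray spaces
        next if $id eq '';                  # skip blank lines
        if (exists $records->{$id}) { print {$out} $records->{$id} }
        else                        { push @missing, $id }
    }
    close $out;
    close $ids;
    return @missing;
}
```

Since the ID is simply the first word of each header, the same code should work whatever "ID style" each source database uses, as long as the list contains the IDs exactly as they appear in the headers.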

From maj at fortinbras.us  Fri Nov  6 12:45:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 6 Nov 2009 07:45:01 -0500
Subject: [Bioperl-l] Bioperl
In-Reply-To: <16842715.26316.1257510446095.JavaMail.root@durga.amrita.ac.in>
References: <16842715.26316.1257510446095.JavaMail.root@durga.amrita.ac.in>
Message-ID:

Hi Resmi-
You should look at http://bioperl.org/ under "Installation" for information on getting and installing BioPerl. An introduction to working with trees in BioPerl is at this link:
http://www.bioperl.org/wiki/HOWTO:Trees
cheers,
Mark

----- Original Message -----
From: Resmi S.
To: maj at fortinbras.us
Sent: Friday, November 06, 2009 7:27 AM
Subject: Bioperl

Respected Sir,
I am Resmi S, a second-year MSc Bioinformatics student. I am now doing my project on phylogenetic tree construction using BioPerl. I am not very familiar with the BioPerl modules, so could you please send me the names of the BioPerl modules needed for my project? I also need to know where I can get these modules. If they are on CPAN, please send me the location or link. I kindly request you to send me the details soon.

Yours Sincerely,
Resmi S,
II MSc Bioinformatics,
School of Biotechnology,
Amrita Vishwa Vidyapeetham,
Email : amm08bi019 at students.amrita.ac.in

From robert.bradbury at gmail.com  Fri Nov  6 17:35:22 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 6 Nov 2009 12:35:22 -0500
Subject: [Bioperl-l] Function that determines serious mutations
Message-ID:

Is there a function in the library (or has someone written one) that can take a GenBank entry and determine which mutations are harmful? It would be used to produce a summary table of:

GENE  # SNP  # BadSNP

One can sort of get this from NCBI: if you look up a gene name in the "GENE" db and then go to the "GeneView in dbSNP" page, it has the information I want, but largely in a graphical format, while I simply want numbers I can dump into a spreadsheet.

I don't think it would be hard: fetch the gene, run through the features for the SNP database, figure out whether they are good or bad SNPs, accumulate the statistics and dump it.
I think the functions available are flexible enough to do it but I can't believe nobody has already done it. It could be a bit more complex in that one could do an analysis to see if the mutations are in a conserved domain or mutations that code for Cysteine or Methionine (or othe potentially "critical" amino acids) but since "critical" is in the eye of the beholder there would have to be some kind of callback to a scoring function. Thanks, Robert From nevoband at igb.uiuc.edu Fri Nov 6 20:58:05 2009 From: nevoband at igb.uiuc.edu (kleenix) Date: Fri, 6 Nov 2009 12:58:05 -0800 (PST) Subject: [Bioperl-l] StandAloneBlast Unallowed parameter Message-ID: <26230896.post@talk.nabble.com> I'm not sure if i'm doing this wrong. I am trying to use the -m parameter in blastall using the StandAloneBlast bioperl class. when i add 'm'=>0 to @params i get Unallowed parameter: error. Am I adding the parameter wrong? i'm using StandAloneBlast version 1.51 Thanks -Nevo -- View this message in context: http://old.nabble.com/StandAloneBlast-Unallowed-parameter-tp26230896p26230896.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From veronica.xiaoyu at gmail.com Fri Nov 6 22:25:04 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Fri, 6 Nov 2009 17:25:04 -0500 Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change the description's name of each hit? Message-ID: Hi, I'm using Bio::SearchIO::Writer HTMLResultWriter help me parse BLAST out file into HTML. Anybody knows how to parse and change the description name of each hit? By using hit->description can call hits' description, but it is not allowed to be modified. Thank you very much, Xiaoyu From maj at fortinbras.us Sat Nov 7 00:40:17 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 6 Nov 2009 19:40:17 -0500 Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change thedescription's name of each hit? 
In-Reply-To: References: Message-ID: <11592B31D9924FA7A8638D90AE4A3F4A@NewLife> Xiaoyu- That method should work to change the description; are you calling it with the new value as an argument, like this?

$hit->description('This is my new description');

This method returns the old description when you change the value:

$hit->description('old');
$str = $hit->description('new'); # $str eq 'old'
$str = $hit->description;        # $str eq 'new'

MAJ ----- Original Message ----- From: "Xiaoyu Liang" To: Sent: Friday, November 06, 2009 5:25 PM Subject: [Bioperl-l] Parsing BLAST out file to HTML. How to change the description's name of each hit? > Hi, > > I'm using Bio::SearchIO::Writer HTMLResultWriter help me parse BLAST out > file into HTML. > > Anybody knows how to parse and change the description name of each hit? > > By using hit->description can call hits' description, but it is not allowed > to be modified. > > Thank you very much, > Xiaoyu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Daniel.Lang at biologie.uni-freiburg.de Sun Nov 8 14:50:48 2009 From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang) Date: Sun, 08 Nov 2009 15:50:48 +0100 Subject: [Bioperl-l] arguments to call back functions in GBrowse2 Message-ID: <4AF6DAC8.8070204@biologie.uni-freiburg.de> Hi Lincoln, a while back (May 29, 2009; 09:08pm) you replied to an even older thread ("Re: Access the parent of a Bio::DB::SeqFeature within a gbrowse config callback function"). I missed your reply and did not follow it up back then, sorry! I'm currently facing the same issue again with gbrowse2. I have a callback function for "balloon click". Following your last reply I expected 5 arguments, but I am getting only three: $feature, $panel, $track. In principle, I am using the latest releases/checkouts... Which modules do I need to look at/update for this functionality?
Furthermore, is there a possibility to share global variables between gbrowse2 and slaves? Should this work via init_code? Should modules initialized in a conf be in the scope of a slave? If not, can I introduce modules via the slave config files, or do I need to alter the slave scripts? Thanks, again! Cheers, Daniel PS: gbrowse2 rocks! From maj at fortinbras.us Sun Nov 8 16:09:43 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 8 Nov 2009 11:09:43 -0500 Subject: [Bioperl-l] GuessSeqFormat: fastq? Message-ID: Hi All- Any plans in the works for a _possibly_fastq sequence guesser? MAJ From maj at fortinbras.us Sun Nov 8 16:20:55 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 8 Nov 2009 11:20:55 -0500 Subject: [Bioperl-l] GuessSeqFormat: fastq? In-Reply-To: References: Message-ID: Never mind; got it covered-- MAJ ----- Original Message ----- From: "Mark A. Jensen" To: "bioperl-l" Sent: Sunday, November 08, 2009 11:09 AM Subject: [Bioperl-l] GuessSeqFormat: fastq? > Hi All- > Any plans in the works for a _possibly_fastq sequence guesser? > MAJ From saikari78 at gmail.com Mon Nov 9 15:47:10 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 15:47:10 +0000 Subject: [Bioperl-l] Retrieving link to protein from PubChem Message-ID: Hi, I'm using Bioperl to retrieve records from PubChem. I'm trying to find a way (so far unsuccessfully) to retrieve, from a compound record, the reference to the protein(s) that can synthesize the compound. Thanks very much.
saikari From saikari78 at gmail.com Mon Nov 9 16:05:57 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 16:05:57 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem Message-ID: Hi, I'm using Bioperl to retrieve records from PubChem. I'm trying to find a way (so far unsuccessfully) to retrieve, from a compound record, the reference to the protein(s) that can synthesize the compound. Thanks very much. saikari From cjfields at illinois.edu Mon Nov 9 16:27:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 10:27:10 -0600 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: Message-ID: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: > Hi, > > I'm using Bioperl to retrieve records from PubChem. > I'm trying to find a way-but have been unsuccessful- to retrieve > from a > compound record, the reference to the protein(s) that can synthesize > the > compound. > Thanks very much. > > saikari The BioPerl script below returns the GIs of proteins that correspond to the substance passed on the command line; invoke it using 'perl pc_substance.pl substance_requested'. It probably needs more fiddling to catch everything, but it should get you started.
For other bits and pieces (such as how to retrieve the raw sequence files), please see the EUtilities HOWTO: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook chris
----------------------------------------
#!/usr/bin/perl -w

use 5.010;
use strict;
use warnings;
use Bio::DB::EUtilities;

my $substance = shift;

my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'pcsubstance',
                                     -term       => $substance,
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die "No history returned";

$eutil->reset_parameters(-eutil   => 'elink',
                         -history => $hist,
                         -db      => 'protein',
                         -dbfrom  => 'pcsubstance',
                         -retmax  => 1000);

say join(',', $eutil->get_ids);

From saikari78 at gmail.com Mon Nov 9 16:41:20 2009 From: saikari78 at gmail.com (saikari keitele) Date: Mon, 9 Nov 2009 16:41:20 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: Fabulous! Huge help. saikari On Mon, Nov 9, 2009 at 4:27 PM, Chris Fields wrote: > On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: > > Hi, >> >> I'm using Bioperl to retrieve records from PubChem. >> I'm trying to find a way-but have been unsuccessful- to retrieve from a >> compound record, the reference to the protein(s) that can synthesize the >> compound. >> Thanks very much. >> >> saikari >> > > The below bioperl script returns the GI for proteins that correspond to the > substance passed on the command line; invoke using 'perl pc_substance.pl substance_requested'. It probably needs more fiddling to catch everything > but it should get you started.
> > For other bits and pieces (such as how to retrieve the raw sequence files), > please see the EUtilities HOWTO: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > chris > > ---------------------------------------- > > #!/usr/bin/perl -w > > use 5.010; > use strict; > use warnings; > use Bio::DB::EUtilities; > > my $substance = shift; > > my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', > -db => 'pcsubstance', > -term => $substance, > -usehistory => 'y'); > > my $hist = $eutil->next_History || die; > > $eutil->reset_parameters(-eutil => 'elink', > -history => $hist, > -db => 'protein', > -dbfrom => 'pcsubstance', > -retmax => 1000); > > say join(',',$eutil->get_ids); > From gc11song at gmail.com Mon Nov 9 18:08:48 2009 From: gc11song at gmail.com (Guangchun Song) Date: Mon, 9 Nov 2009 12:08:48 -0600 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? Message-ID: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Hello, I'm a new BioPerl user. I'm working on a project to determine the status of all putative SNPs, such as non-synonymous vs. synonymous, and to predict the translational effect of non-synonymous mutations as benign or malicious. I'm trying to use BioPerl to get the DNA sequence and translate it to protein sequence for the SNPs that are in a gene's coding region. Could someone tell me how to do it? Thanks, -Guangchun Song From robert.bradbury at gmail.com Mon Nov 9 21:15:33 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Mon, 9 Nov 2009 16:15:33 -0500 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song wrote: > > I'm new bioperl user. I' working on a project: To determine the > status of all tutative SNPs such as non-synonymous vs.
synonymous, and > predict the tranlational effect of non-synonymous mutations as benign > or malicious. I'm trying to use bioperl to get the DNA sequence and > translate to protein sequence for the SNPs that are in gene's coding > region. Could someone tell me how to do it? > > I too would like to know if this information is available. I've recently been working with the dbSNP results from NCBI, but they display the results in a graphical format rather than as data that one can play with and ask questions of, like "What is the most disease-causing gene in the Human Genome?" or "What are the critical proteins damaged by gene defects in the Human Genome?" ... "In terms of premature deaths, extended health care requirements, loss of quality of life, etc.?" The same types of questions can be applied to the dog and cat genomes, where there is emotional value, or the cow, horse, pig, etc. genomes, where there is economic value. The value of BioPerl would increase significantly if there were functionality that would allow easy access to "these mutations may have negative/positive impact" (which means you need a function that qualifies mutations by degree) and allow for impact to be subjectively determined (implying there must be some callback function to provide a user quality/impact rating). For example:

my @differences = protein_compare($mygene, $refseq_gene, \@critical_aa, \@critical_domain, $callback);

where $callback could "rate" differences in the protein by position and by "type of interest" (e.g. metal-binding amino acids, structure-changing amino acids, critical catalytic amino acids, etc.). A default callback would be based on some evolving definition of "critical" changes, for example those which result in human disease. This is a "required" capability to be able to determine things like the "adaptability" of a species: those with the fewest critical mutation points may adapt better to circumstances that increase mutation.
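For the synonymous vs. non-synonymous part of Guangchun's question, the core step is to translate the reference codon and the mutated codon and compare the two amino acids. The sketch below does that in plain Perl, building the standard genetic code (NCBI translation table 1) inline so it runs without BioPerl; within BioPerl, Bio::Tools::CodonTable or a sequence object's translate method would normally do the translation. The function classify_snp, its arguments (a CDS string, a 1-based position within the CDS, and the alternate base) and the optional rating callback, modeled loosely on Robert's suggestion, are hypothetical. It only classifies the change; judging whether a missense change is benign or harmful still needs a separate scoring rule or tool.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), bases in TCAG order:
# the first codon base varies slowest, the third base fastest.
my %CODON2AA;
{
    my @bases = qw(T C A G);
    my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my $i     = 0;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $CODON2AA{"$b1$b2$b3"} = substr($aas, $i++, 1);
            }
        }
    }
}

# Classify a single-base substitution in a coding sequence (CDS).
#   $cds  - CDS string starting at the first base of the start codon
#   $pos  - 1-based position of the SNP within the CDS
#   $alt  - alternate base (A, C, G or T)
#   $rate - optional callback that scores the change (hypothetical)
# A lost stop codon is simply reported as 'non-synonymous' here.
sub classify_snp {
    my ($cds, $pos, $alt, $rate) = @_;
    $cds = uc $cds;
    $alt = uc $alt;
    die "alternate allele must be a single base A/C/G/T\n" unless $alt =~ /^[ACGT]\z/;
    die "position $pos is outside the CDS\n" if $pos < 1 || $pos > length $cds;

    my $offset      = $pos - 1;
    my $frame       = $offset % 3;           # 0, 1 or 2 within the codon
    my $codon_start = $offset - $frame;
    my $ref_codon   = substr($cds, $codon_start, 3);
    die "incomplete codon at position $pos\n" unless length($ref_codon) == 3;

    my $alt_codon = $ref_codon;
    substr($alt_codon, $frame, 1) = $alt;    # apply the substitution

    my ($ref_aa, $alt_aa) = @CODON2AA{$ref_codon, $alt_codon};
    die "ambiguous base in $ref_codon/$alt_codon\n"
        unless defined $ref_aa && defined $alt_aa;

    my %result = (
        codon     => $codon_start / 3 + 1,   # 1-based codon number
        ref_codon => $ref_codon,
        alt_codon => $alt_codon,
        ref_aa    => $ref_aa,
        alt_aa    => $alt_aa,
        class     => $ref_aa eq $alt_aa ? 'synonymous'
                   : $alt_aa eq '*'     ? 'nonsense'
                   :                      'non-synonymous',
    );
    $result{score} = $rate->(\%result) if $rate;
    return \%result;
}
```

For example, classify_snp('ATGGAAGTATAA', 5, 'T') reports codon 2 changing from GAA (E) to GTA (V), a non-synonymous change. Mapping genomic SNP coordinates onto CDS positions (strand, exon boundaries) is a separate step not covered by this sketch.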
Please pardon any errors in Perl syntax/usage; it's been a while since I've written Perl, and I'd really rather be coding in C. Robert From maj at fortinbras.us Mon Nov 9 21:56:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 9 Nov 2009 16:56:24 -0500 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: <3ED3D387B5DE4248A218D42882369925@NewLife> I agree that BioPerl would significantly increase in value with such a module; in fact, the BioTeam would probably buy us out. My opinion is that the entire GWAS enterprise is the search for such a callback function, for humans anyway. For those engaged in this quest, if BioPerl doesn't provide a Maserati, it at least provides good Italian-made (among others) parts. MAJ ----- Original Message ----- From: "Robert Bradbury" To: "Guangchun Song" Cc: Sent: Monday, November 09, 2009 4:15 PM Subject: Re: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? > On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song wrote: >> >> I'm new bioperl user. I' working on a project: To determine the >> status of all tutative SNPs such as non-synonymous vs. synonymous, and >> predict the tranlational effect of non-synonymous mutations as benign >> or malicious. I'm trying to use bioperl to get the DNA sequence and >> translate to protein sequence for the SNPs that are in gene's coding >> region. Could someone tell me how to do it? >> >> > I too would like to know if this information is available. I've recently > been working with the dbSNP results from NCBI but they display the results > in a graphical format rather than data that one can play with and ask > questions of like "What is the most disease causing gene in the Human > Genome?" or "What are the critical proteins damaged by gene defects in the > Human Genome?" ...
"In terms of premature deaths, extended health care > requirements, loss of quality of life, etc.?" > > The same types of questions can be applied to the dog and cat genomes where > there is emotional value or the cow, horse, pig, etc. genomes where there is > economic value? > > The value of BioPerl would increase significantly if there were > functionality that would allow easy access to "these mutations may have > negative/positive impact" (which means you need a function that qualifies > mutations by degree) and allow for impact to be subjectively determined > (implying there must be some callback function to provide a user > quality/impact rating). > > For example: > $/@differences = protein_compare($mygene, $refseq_gene, @critical_aa, > @critical_domain, $callback) > Where $callback could "rate" differences about the protein and position and > the "type of interest" (e.g. metal binding amino acids, structural changing > amino acids, critical catalysis amino acids, etc.). > > A default callback would be based on some evolving definition of "critical" > changes which result in human disease for example. > > This is a "required" capability to be able to determine things like the > "adaptability" of a species -- those with fewest critical mutation points > may have better adaptability to mutation increasing circumstances. > > Please pardon any errors in perl syntax/usage its been a while since I've > written perl and I'd really rather be coding in C. > > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From alexl at users.sourceforge.net Mon Nov 9 23:44:07 2009 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Mon, 09 Nov 2009 18:44:07 -0500 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict? 
In-Reply-To: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> (Chris Fields's message of "Wed, 4 Nov 2009 07:53:35 -0600") References: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Message-ID: >>>>> Chris Fields writes: > Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate > perl package alone. It is part of perl core but it's also available > on CPAN separately from perl itself: > http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm Hi Chris, Yes, in principle it would be possible to have this split out as a separate package (currently it's a "subpackage" under the main perl package), unfortunately that's just not the way it's currently done in Fedora (probably because it's part of the core set and they like to update all relevant packages in one step) and I have little control over that. As I suspected, the perl maintainer is not at all enthusiastic for updating the whole of perl just for that package (except for rawhide which would mean that bioperl 1.6.1 would not be available until F-13, about 6 months from now). See: http://bugzilla.redhat.com/show_bug.cgi?id=533562#c1 Obviously I am not happy with this situation either, because it will freeze bioperl on Fedora at 1.6.0 for about 6 months, so can you recommend any temporary workarounds in the meantime? > This is the commit message for that BTW. This allows spaces in file > names for the MANIFEST. v1.52 is a bug fix and is required. > http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 Perhaps I could create a patch that renamed files with spaces in them to ones with no spaces and then rename them again upon installation. Can you point me to which files are the problematic ones that triggered the dependency for 1.52? Perhaps I can figure a workaround. Meanwhile I will press the maintainer of perl in Fedora to perhaps reconsider his position (e.g. 
if another update for perl is going out for another reason, like a security update, perhaps he could roll in the 1.52 update at the same time). Cheers, Alex From cjfields at illinois.edu Tue Nov 10 00:50:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 18:50:00 -0600 Subject: [Bioperl-l] version of ExtUtils::Manifest too strict? In-Reply-To: References: <1D9E943F-2EDC-49AB-83DE-78DED5A8AC23@illinois.edu> Message-ID: <29EA2398-F60B-48F2-AFE7-39A44011C451@illinois.edu> On Nov 9, 2009, at 5:44 PM, Alex Lancaster wrote: >>>>>> Chris Fields writes: > >> Alex, Not sure why ExtUtils::Manifest can't be bundled as a separate >> perl package alone. It is part of perl core but it's also available >> on CPAN separately from perl itself: > >> http://search.cpan.org/~rkobes/ExtUtils-Manifest-1.57/lib/ExtUtils/Manifest.pm > > Hi Chris, > > Yes, in principle it would be possible to have this split out as a > separate package (currently it's a "subpackage" under the main perl > package), unfortunately that's just not the way it's currently done in > Fedora (probably because it's part of the core set and they like to > update all relevant packages in one step) and I have little control > over > that. > > As I suspected, the perl maintainer is not at all enthusiastic for > updating the whole of perl just for that package (except for rawhide > which would mean that bioperl 1.6.1 would not be available until F-13, > about 6 months from now). See: > > http://bugzilla.redhat.com/show_bug.cgi?id=533562#c1 > > Obviously I am not happy with this situation either, because it will > freeze bioperl on Fedora at 1.6.0 for about 6 months, so can you > recommend any temporary workarounds in the meantime? Well, if you don't absolutely require the MANIFEST for the final package you can forego the requirement. The file in question that triggered the requirement is a data file used only for testing: t/data/test 2.txt >> This is the commit message for that BTW. 
This allows spaces in file >> names for the MANIFEST. v1.52 is a bug fix and is required. > >> http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=15673 > > Perhaps I could create a patch that renamed files with spaces in > them to > ones with no spaces and then rename them again upon installation. > > Can you point me to which files are the problematic ones that > triggered > the dependency for 1.52? Perhaps I can figure a workaround. > > Meanwhile I will press the maintainer of perl in Fedora to perhaps > reconsider his position (e.g. if another update for perl is going out > for another reason, like a security update, perhaps he could roll in > the > 1.52 update at the same time). > > Cheers, > Alex I would point out that this is a fairly significant bug fix for ExtUtils::Manifest. A newer point release of perl (5.10.1) is now available that contains the fix, as well as a fix for a performance regression that popped up in 5.10.0. chris From jay at jays.net Tue Nov 10 00:05:51 2009 From: jay at jays.net (Jay Hannah) Date: Mon, 9 Nov 2009 18:05:51 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? Message-ID: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> Many thanks to Ewan Birney et al. for Bio::Index::*. I can throw away my awful grep-based index-by-accession stuff. :) Any chance someone has also written an organism-based index mechanism? Something like...

while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) {
    print $seq->display_id . "\n";
}

Thanks, j From cjfields at illinois.edu Tue Nov 10 03:55:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 21:55:01 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> Message-ID: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote: > Many thanks to Ewan Birney et. al.
for Bio::Index::* > > I can throw away my awful grep based index-by-accession stuff. :) > > Any chance someone has also written an organism based index > mechanism? Something like... > > while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { > print $seq->display_id . "\n"; > } > > Thanks, > > j It should work via id_parser(); from Bio::Index::GenBank:

$inx->id_parser(\&get_id);
# make the index
$inx->make_index($file_name);

# here is where the retrieval key is specified
sub get_id {
    my $line = shift;
    # return an empty list (not a stale $1) when the line doesn't match
    return $line =~ /clone="(\S+)"/ ? $1 : ();
}

Change the code ref to deal with the line you want and parse the name out. Caveat: this may not be absolutely perfect (it only passes in a line at a time, and some species lines will wrap). Also not sure how this would work in cases where multiple sequences from the same species are present. The other option is to preparse everything and tie a hash to store a species->UID map, then use that along with your Bio::Index index to grab what you need. chris From cjfields at illinois.edu Tue Nov 10 04:58:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 9 Nov 2009 22:58:32 -0600 Subject: [Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs? In-Reply-To: References: <794eafc20911091008g1f98b944ncbd66ac4962a85a3@mail.gmail.com> Message-ID: <435BA1A8-2CCB-4D7A-8909-84F8135C439F@illinois.edu> On Nov 9, 2009, at 3:15 PM, Robert Bradbury wrote: > On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song > wrote: >> >> I'm new bioperl user. I' working on a project: To determine the >> status of all tutative SNPs such as non-synonymous vs. synonymous, >> and >> predict the tranlational effect of non-synonymous mutations as benign >> or malicious. I'm trying to use bioperl to get the DNA sequence and >> translate to protein sequence for the SNPs that are in gene's coding >> region. Could someone tell me how to do it? >> >> > I too would like to know if this information is available.
I've > recently > been working with the dbSNP results from NCBI but they display the > results > in a graphical format rather than data that one can play with and ask > questions of like "What is the most disease causing gene in the Human > Genome?" or "What are the critical proteins damaged by gene defects > in the > Human Genome?" ... "In terms of premature deaths, extended health care > requirements, loss of quality of life, etc.?" > > The same types of questions can be applied to the dog and cat > genomes where > there is emotional value or the cow, horse, pig, etc. genomes where > there is > economic value? > > The value of BioPerl would increase significantly if there were > functionality that would allow easy access to "these mutations may > have > negative/positive impact" (which means you need a function that > qualifies > mutations by degree) and allow for impact to be subjectively > determined > (implying there must be some callback function to provide a user > quality/impact rating). > > For example: > $/@differences = protein_compare($mygene, $refseq_gene, > @critical_aa, > @critical_domain, $callback) > Where $callback could "rate" differences about the protein and > position and > the "type of interest" (e.g. metal binding amino acids, structural > changing > amino acids, critical catalysis amino acids, etc.). > > A default callback would be based on some evolving definition of > "critical" > changes which result in human disease for example. > > This is a "required" capability to be able to determine things like > the > "adaptability" of a species -- those with fewest critical mutation > points > may have better adaptability to mutation increasing circumstances. > > Please pardon any errors in perl syntax/usage its been a while since > I've > written perl and I'd really rather be coding in C. 
> > Robert I will say that most of the information from the SNP database is available in various formats (see the following link under 'Retrieval Types'): http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html You can access this information, as well as the full XML, using something like the following script. chris
------------------------------------------------
#!/usr/bin/perl -w

use 5.010;
use strict;
use warnings;
use Bio::DB::EUtilities;

my $term = shift;

my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'snp',
                                     -term       => $term,
                                     -usehistory => 'y',
                                     -retmax     => 100);

my $hist = $eutil->next_History || die "No history returned";

# for SNP XML, change retmode to 'xml'
$eutil->set_parameters(-eutil   => 'efetch',
                       -history => $hist,
                       -retmode => 'text',
                       -rettype => 'flt');

# dumps to STDOUT
say $eutil->get_Response->content;

From jluis.lavin at unavarra.es Tue Nov 10 10:43:40 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Tue, 10 Nov 2009 11:43:40 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its correct use] In-Reply-To: References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1 984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> Message-ID: <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Hello again, I tried what Mark told me, modifying the code line he mentioned, but there's still a problem that I believe must be due to the sequence names.
My sequence headers in the FASTA file have this format:

  >PleosPC9_1_103820|fgenesh1_pg.3_#_1

The part on the right of the pipe changes depending on the program used to create the gene model, for example:

  >PleosPC9_1_103820|fgenesh1_pg.3_#_1
  >PleosPC9_1_123413|genemark.2731_g
  >PleosPC9_1_52065|e_gw1.3.64.1

So I guess I need to parse my IDs somehow so that the program detects only the first part of the FASTA header (the "protein name") and doesn't get confused by the other side of the pipe... This is the corrected code I wrote following Mark's indications, but I still don't have any idea about the parsing issue...

#!/c:/Perl -w
use Bio::Index::Fasta;
use strict;
#PC9.fasta is my genomic file
my $Index_File_Name ="PC9.fasta";
my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
#LCS.txt is my sequences list
@ARGV = ;
foreach my $id (@ARGV) {
    if ($id eq ''){
        die ("empty list")
    }
    else {
        my $seqobj = $inx->fetch($id);
        my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;
}

Thanks in advance. P.S. Might there be a faster way of extracting those sequences using plain Perl? On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote: > Yes, these are files created by the SDBM, Perl's internal db manager. You > should > be able to > open the index by simply > $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > and the dbm will know what to do-- > cheers MAJ > ----- Original Message ----- > From: > To: "Mark A. Jensen" > Cc: ; > Sent: Thursday, November 05, 2009 11:21 AM > Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its > correct > use] > > >> Thank you very much Mark, that's a good point :$ >> I guess your correction is referred to the second script, isn't it?
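On José's closing question (a faster way in plain Perl): for one-off extraction, a single pass over the FASTA file with the wanted IDs held in a hash is usually enough, and it sidesteps the index-key problem. This sketch assumes the key is everything after '>' up to the first '|' or whitespace, so >PleosPC9_1_103820|fgenesh1_pg.3_#_1 is looked up as PleosPC9_1_103820; the function names are illustrative only. It also trims each ID read from the list file, because an ID that still carries its trailing newline will never match a key. If you would rather keep Bio::Index::Fasta, it has an id_parser method that takes a code ref returning the key(s) for a header line, so a parser like header_key could be supplied before make_index; check the module documentation for the details.

```perl
use strict;
use warnings;

# Key used for matching: everything after '>' up to the first '|' or
# whitespace, e.g. ">PleosPC9_1_103820|fgenesh1_pg.3_#_1" -> "PleosPC9_1_103820".
sub header_key {
    my ($header) = @_;
    return $header =~ /^>\s*([^|\s]+)/ ? $1 : undef;
}

# Copy every FASTA record whose key appears in the ID list to $out_fh.
# Returns the number of records written (in FASTA file order).
sub extract_by_id {
    my ($ids_fh, $fasta_fh, $out_fh) = @_;

    my %wanted;
    while (my $id = <$ids_fh>) {
        $id =~ s/^\s+|\s+$//g;    # strip newline (incl. \r\n) and stray spaces
        $wanted{$id} = 1 if length $id;
    }

    my ($printing, $n) = (0, 0);
    while (my $line = <$fasta_fh>) {
        if ($line =~ /^>/) {      # a new record starts: decide whether to keep it
            my $key = header_key($line);
            $printing = defined $key && $wanted{$key};
            $n++ if $printing;
        }
        print {$out_fh} $line if $printing;
    }
    return $n;
}

# Command-line use (script name hypothetical, file names from the thread):
#   perl extract_by_id.pl LCS.txt PC9.fasta > extracted.fasta
if (!caller && @ARGV == 2) {
    open my $ids_fh,   '<', $ARGV[0] or die "Cannot read $ARGV[0]: $!\n";
    open my $fasta_fh, '<', $ARGV[1] or die "Cannot read $ARGV[1]: $!\n";
    my $n = extract_by_id($ids_fh, $fasta_fh, \*STDOUT);
    warn "$n sequence(s) written\n";
}
```

Sequences come out in the order they appear in the FASTA file, not the order of the ID list; IDs missing from the FASTA file are silently skipped.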
>> >> If it is so, there is still a problem with the first script, it doesn't >> create the PC9.fasta.idx file, instead it creates two files named: >> -PC9.fasta.idx.pag >> -PC9.fasta.idx.dir >> >> which seem to be clearly related with some kind of indexing >> process...but, >> unless the PC9.fasta.idx file is only virtual or remains hidden, I can't >> find it anywhere... >> Forgive me if I'm talking nonsense... >> >> Thank you very much again for your help ;) >> >> >> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>> Hey José, >>> The first thing that jumps out is the index file name. Looks >>> like you create it as >>> PC9.fasta.idx >>> But you read it as >>> PC9.fasta >>> Not an unusual mistake. Do >>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>> and see if it works. >>> MAJ >>> ----- Original Message ----- >>> From: >>> To: >>> Sent: Thursday, November 05, 2009 10:46 AM >>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and its >>> correct >>> use] >>> >>> >>> >>> >>> ---------------------------- Original message >>> ---------------------------- >>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its correct >>> use >>> From: jluis.lavin at unavarra.es >>> Date: Thu, 5 November 2009, 16:46 >>> To: "Mark A. Jensen" >>> -------------------------------------------------------------------------- >>> >>> Hi Mark, >>> >>> I've actually got two scripts, the first one is to create the index and >>> the second one is to retrieve the sequence list from the indexed file. >>> >>> 1)Here is the Index creation script:
>>> >>> 1)Here is the Index creation script: >>> >>> #!/c:/Perl -w >>> use strict; >>> use Bio::Index::Fasta; >>> use strict; >>> >>> print "Enter file for indexing: \n"; >>> my $Index_File_Name = ; >>> my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name.".idx", >>> -write_flag => 1); >>> $inx->make_index(my $File_Name); >>> >>> 2)And here is the sequence retrieval script: >>> >>> #!/c:/Perl -w >>> use Bio::Index::Fasta; >>> use strict; >>> #PC9.fasta is my genomic file >>> my $Index_File_Name ="PC9.fasta"; >>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>> #LCS.txt is my sequences list >>> @ARGV = ; >>> foreach my $id (@ARGV) { >>> if ($id eq ''){ >>> die ("empty list") >>> } >>> else { >>> my $seqobj = $inx->fetch($id); >>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>> -format => 'fasta'); >>> $out->write_seq($seqobj); >>> } >>> } >>> exit; >>> } >>> >>> I hope this code is not a total scum... >>> >>> Thanks in advance ;) >>> >>> >>> >>> El Jue, 5 de Noviembre de 2009, 16:39, Mark A. Jensen escribi?: >>>> Jos? -- It looks like this is a good solution to your problem. Please >>>> send >>>> you >>>> script so we can look at it- >>>> cheers Mark >>>> ----- Original Message ----- >>>> From: >>>> To: >>>> Sent: Thursday, November 05, 2009 10:28 AM >>>> Subject: [Bioperl-l] A question about iBio::Index: and its correct use >>>> >>>> >>>> >>>> Hello to all, >>>> >>>> I?m trying to write a script to retrieve a list of sequences from a >>>> local >>>> FASTA file (for example a fasta archive where all the protein models >>>> of >>>> an >>>> organism are stored). This file would be used by me as some kind >>>> "local >>>> database" (sorry if I mistake a few concepts...) >>>> I?ve been reading the BioPerl HOWTOs and I came across the >>>> Bio::Index::Fasta tool. >>>> If I didn?t misunderstood what I read (which can be easy because my >>>> low >>>> level on programming) this Indexing tool should do the job. 
>>>> I wrote a couple of scripts based on the documentation I read about >>>> this >>>> tool, but I don't seem to be able to create the index file to be used >>>> later (to retrieve the sequences from). >>>> -First of all, I want to ask the people in this forum if the >>>> Bio::Index::Fasta is the right one to choose for these tasks. >>>> -Then I'll beg you to take a look at my scripts, because I don't seem >>>> to >>>> catch the bug... >>>> >>>> Best wishes to you all and thanks in advance ;) >>>> >>>> -- >>>> José Luis Lavín Trueba, PhD >>>> >>>> Dpto. de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> >>> >>> >>> -- >>> Dr. José Luis Lavín Trueba >>> >>> Dpto. de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >>> >> >> >> -- >> Dr. José Luis Lavín Trueba >> >> Dpto. de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> >> >> > > -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From saikari78 at gmail.com Tue Nov 10 11:41:11 2009 From: saikari78 at gmail.com (saikari keitele) Date: Tue, 10 Nov 2009 11:41:11 +0000 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: Thanks again very much for your help and the script. i've been trying it, however I fail to find any protein record linked to a record in the pcsubstance database. Do you think that its is because no links have been defined between the 2 databases, or that I am just unlucky and that no link exists for the particular records I'm testing? Thanks again saikari On Mon, Nov 9, 2009 at 4:41 PM, saikari keitele wrote: > Fabulous!. Huge help. > saikari > > On Mon, Nov 9, 2009 at 4:27 PM, Chris Fields wrote: > >> On Nov 9, 2009, at 10:05 AM, saikari keitele wrote: >> >> Hi, >>> >>> I'm using Bioperl to retrieve records from PubChem. >>> I'm trying to find a way-but have been unsuccessful- to retrieve from a >>> compound record, the reference to the protein(s) that can synthesize the >>> compound. >>> Thanks very much. >>> >>> saikari >>> >> >> The below bioperl script returns the GI for proteins that correspond to >> the substance passed on the command line; invoke using 'perl >> pc_substance.pl substance_requested'. It probably needs more fiddling to >> catch everything but it should get you started.
>> >> For other bits and pieces (such as how to retrieve the raw sequence >> files), please see the EUtilities HOWTO: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook >> >> chris >> >> ---------------------------------------- >> >> #!/usr/bin/perl -w >> >> use 5.010; >> use strict; >> use warnings; >> use Bio::DB::EUtilities; >> >> my $substance = shift; >> >> my $eutil = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -db => 'pcsubstance', >> -term => $substance, >> -usehistory => 'y'); >> >> my $hist = $eutil->next_History || die; >> >> $eutil->reset_parameters(-eutil => 'elink', >> -history => $hist, >> -db => 'protein', >> -dbfrom => 'pcsubstance', >> -retmax => 1000); >> >> say join(',',$eutil->get_ids); >> > > From heyne at informatik.uni-freiburg.de Tue Nov 10 12:55:06 2009 From: heyne at informatik.uni-freiburg.de (Steffen Heyne) Date: Tue, 10 Nov 2009 13:55:06 +0100 Subject: [Bioperl-l] problem with alignments and sequence locations Message-ID: <4AF962AA.7060908@informatik.uni-freiburg.de> Hi, I'm using Bioperl for my research and it is very useful! Thank you! Currently I have a problem with locations tags of sequences. I read in seed alignments of Rfam (in stockholm format, but I think it is similar to other formats). If the location is like: AB194432.1/908-846 the start/end values are changed to $seq->start = 846 $seq->end = 908 and therefore the new location (e.g.$seq->get_nse) is: AB194432.1/846-908 The $seq->strand tag is correctly set to -1 in this case, but if the alignment is written out again (clustal, stockholm,...) this strand info is lost and the sequences have this "wrong" location. But this information is important in respect to the sequence accession number. Is there a way to set the location back to the original one or is this behavior desired? Any manually setting with $seq->start($val) failed due to automatic checking. I'm using bioperl 1.6.1 Thanks! steffen -- --- Steffen Heyne, Dipl.-Bioinf. 
Lehrstuhl für Bioinformatik Institut für Informatik Albert-Ludwigs-Universität Freiburg Georges-Köhler-Allee 106 79110 Freiburg, Germany Tel: (+49) 761 203 8239 Fax: (+49) 761 203 7462 Mail: heyne at informatik.uni-freiburg.de From cjfields at illinois.edu Tue Nov 10 13:58:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Nov 2009 07:58:52 -0600 Subject: [Bioperl-l] problem with alignments and sequence locations In-Reply-To: <4AF962AA.7060908@informatik.uni-freiburg.de> References: <4AF962AA.7060908@informatik.uni-freiburg.de> Message-ID: On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote: > Hi, > > I'm using Bioperl for my research and it is very useful! Thank you! > > Currently I have a problem with locations tags of sequences. I read > in seed alignments of Rfam (in stockholm format, but I think it is > similar to other formats). > > If the location is like: > > AB194432.1/908-846 > > the start/end values are changed to > > $seq->start = 846 > $seq->end = 908 > > and therefore the new location (e.g.$seq->get_nse) is: > > AB194432.1/846-908 > > The $seq->strand tag is correctly set to -1 in this case, but if the > alignment is written out again (clustal, stockholm,...) this strand > info is lost and the sequences have this "wrong" location. But this > information is important in respect to the sequence accession number. > > Is there a way to set the location back to the original one or is > this behavior desired? Any manually setting with $seq->start($val) > failed due to automatic checking. > > I'm using bioperl 1.6.1 > > Thanks! > > steffen This is a definite bug. We recently discussed amending the NSE format due to this (the subject came up over the last few months or so); it's fallen through the cracks. Fortunately it is very easy to fix (the relevant method is in LocatableSeq). Does anyone have a problem with me adding this in?
It will change output for only those instances where the strand is -1, so AB194432.1/908-846 would be start = 846, end = 908, strand = -1 AB194432.1/846-908 would be start = 846, end = 908, strand = 1 chris From cjfields at illinois.edu Tue Nov 10 14:05:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 10 Nov 2009 08:05:51 -0600 Subject: [Bioperl-l] [bioperl newbie] Retrieving link to protein from PubChem In-Reply-To: References: <1ECC543A-F923-4D5E-A0C1-5BBD35ECAAE8@illinois.edu> Message-ID: <738F6320-B87A-4541-B9FA-20273ABA96B9@illinois.edu> On Nov 10, 2009, at 5:41 AM, saikari keitele wrote: > Thanks again very much for your help and the script. > i've been trying it, however I fail to find any protein record > linked to a > record in the pcsubstance database. > Do you think that its is because no links have been defined between > the 2 > databases, or that I am just unlucky and that no link exists for the > particular records I'm testing? > Thanks again > > saikari It's probably that no links have been defined. I have found similar problems in the past with pubchem, in that not all substances have proteins associated with them. Most proteins linked to are those with a deposited structure. There are a few other databases to check out; KEGG, the BioCyc dbs (like EcoCyc), come to mind. I don't think we have a generic remote query engine set up for any of those unfortunately (unless there is one I'm unaware of), but I know BioCyc comes with its own set of tools (including perl- and java-based query tools) and can be set up locally, which is likely much faster and more in line with what you need. chris ...
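Chris's proposed convention above (a reversed range marks strand -1, and a minus-strand sequence should be written back as end-start so the original Rfam coordinates survive) can be illustrated in plain Perl. This is a minimal sketch, not the actual Bio::LocatableSeq change: `parse_nse` and `format_nse` are hypothetical helpers, and the sketch assumes a bare `name/start-end` string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): split "name/start-end" into
# name, start, end and strand. A reversed range means the minus strand.
sub parse_nse {
    my ($nse) = @_;
    my ($name, $x, $y) = $nse =~ m{^(.+)/(\d+)-(\d+)$}
        or die "Not an NSE string: $nse\n";
    return $x > $y
        ? { name => $name, start => $y, end => $x, strand => -1 }
        : { name => $name, start => $x, end => $y, strand => 1 };
}

# Hypothetical helper: write the location back out, putting end before
# start for minus-strand sequences so the strand is not lost.
sub format_nse {
    my ($loc) = @_;
    my ($from, $to) = $loc->{strand} == -1
        ? ($loc->{end},   $loc->{start})
        : ($loc->{start}, $loc->{end});
    return "$loc->{name}/$from-$to";
}

for my $nse ('AB194432.1/908-846', 'AB194432.1/846-908') {
    my $loc = parse_nse($nse);
    printf "%s => start=%d end=%d strand=%d => %s\n",
        $nse, @{$loc}{qw(start end strand)}, format_nse($loc);
}
```

Both example strings come back unchanged after the round trip, while start is always the smaller coordinate, which matches the start/end/strand values Chris lists.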
From jason at bioperl.org Tue Nov 10 18:47:02 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Nov 2009 10:47:02 -0800 Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and itscorrect use] In-Reply-To: <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1 984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.squirrel@webmail.unavarra.es> <3471.130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Message-ID: Page 44 has the custom ID info or look at documentation for Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if you read the perldoc for the module.
http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf Don't re-open SeqIO each time; just open it once at the beginning, outside of the loop, and then call write_seq within the loop. One nuance of OO programming vs. procedural programming is that some state can persist in an object outside the loop; conceptually, you want to open a filehandle once and just keep writing to it. -jason On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote: > Hello again, > > I tried what Mark told me modifying the code line he told me but > there's > still a problem that I believe must be due to the sequences name. > My secuences header on the Fasta file have this format: > >> PleosPC9_1_103820|fgenesh1_pg.3_#_1 > > Th part on the right of the pipe changes depending on the program > used to > create the gene model, for example: > >> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >> PleosPC9_1_123413|genemark.2731_g >> PleosPC9_1_52065|e_gw1.3.64.1 > > So I guess I need to parse my ids somehow for thr program to detect > only > the first part of the fasta header (the "protein name") and not to get > messed with the other side of the pipe... > > This is the corrected code I wrote following Mark's indications, but I > still don't have any idea about the parsing issue... > > #!/c:/Perl -w > use Bio::Index::Fasta; > use strict; > #PC9.fasta is my genomic file > my $Index_File_Name ="PC9.fasta"; > my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); > #LCS.txt is my sequences list > @ARGV = ; > foreach my $id (@ARGV) { > if ($id eq ''){ > die ("empty list") > } > else { > my $seqobj = $inx->fetch($id); > my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", > -format => 'fasta'); > $out->write_seq($seqobj); > } > } > exit; > } > > Thanks in advance > > P.S. May it be a faster way of extracting those sequences using plain > PERL? > > > > > On Thu, 5 November 2009, 17:39, Mark A.
Jensen wrote: >> Yes, these are files created by the SDBM, Perl's internal db >> manager. You >> should >> be able to >> open the index by simply >> $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >> and the dbm will know what to do-- >> cheers MAJ >> ----- Original Message ----- >> From: >> To: "Mark A. Jensen" >> Cc: ; >> Sent: Thursday, November 05, 2009 11:21 AM >> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: >> and its >> correct >> use] >> >> >>> Thank you very much Mark, that's a good point :$ >>> I guess your correction is referred to the second script, isn't it? >>> >>> If it is so, there is still a problem with the first script, it >>> doesn't >>> create the PC9.fasta.idx file, instead it creates two files named: >>> -PC9.fasta.idx.pag >>> -PC9.fasta.idx.dir >>> >>> which seem to be clearly related with some kind of indexing >>> process...but, >>> unless the PC9.fasta.idx file is only virtual or remains hidden, I >>> can't >>> find it anywhere... >>> Forgive me if I'm talking nosense... >>> >>> Thank you very much again for your help ;) >>> >>> >>> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>>> Hey José, >>>> The first thing that jumps out it the index file name. Looks >>>> like you create it as >>>> PC9.fasta.idx >>>> But you read it as >>>> PC9.fasta >>>> Not an unusual mistake. Do >>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>>> and see if it works. >>>> MAJ >>>> ----- Original Message ----- >>>> From: >>>> To: >>>> Sent: Thursday, November 05, 2009 10:46 AM >>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and >>>> its >>>> correct >>>> use] >>>> >>>> >>>> >>>> >>>> ---------------------------- Original message >>>> ---------------------------- >>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its >>>> correct >>>> use >>>> From: jluis.lavin at unavarra.es >>>> Date: Thu, 5 November 2009, 16:46 >>>> To: "Mark A.
Jensen" >>>> -------------------------------------------------------------------------- >>>> >>>> Hi Mark, >>>> >>>> I've actually got two scripts, the first one is to create the >>>> index and >>>> the second one is to retrieve the sequence lis from the indexed >>>> file. >>>> >>>> 1)Here is the Index creation script: >>>> >>>> #!/c:/Perl -w >>>> use strict; >>>> use Bio::Index::Fasta; >>>> use strict; >>>> >>>> print "Enter file for indexing: \n"; >>>> my $Index_File_Name = ; >>>> my $inx = Bio::Index::Fasta->new(-filename => >>>> $Index_File_Name.".idx", >>>> -write_flag => 1); >>>> $inx->make_index(my $File_Name); >>>> >>>> 2)And here is the sequence retrieval script: >>>> >>>> #!/c:/Perl -w >>>> use Bio::Index::Fasta; >>>> use strict; >>>> #PC9.fasta is my genomic file >>>> my $Index_File_Name ="PC9.fasta"; >>>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>>> #LCS.txt is my sequences list >>>> @ARGV = ; >>>> foreach my $id (@ARGV) { >>>> if ($id eq ''){ >>>> die ("empty list") >>>> } >>>> else { >>>> my $seqobj = $inx->fetch($id); >>>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>>> -format => 'fasta'); >>>> $out->write_seq($seqobj); >>>> } >>>> } >>>> exit; >>>> } >>>> >>>> I hope this code is not a total scum... >>>> >>>> Thanks in advance ;) >>>> >>>> >>>> >>>> On Thu, 5 November 2009, 16:39, Mark A. Jensen wrote: >>>>> José -- It looks like this is a good solution to your problem. >>>>> Please >>>>> send >>>>> you >>>>> script so we can look at it- >>>>> cheers Mark >>>>> ----- Original Message ----- >>>>> From: >>>>> To: >>>>> Sent: Thursday, November 05, 2009 10:28 AM >>>>> Subject: [Bioperl-l] A question about iBio::Index: and its >>>>> correct use >>>>> >>>>> >>>>> >>>>> Hello to all, >>>>> >>>>> I'm trying to write a script to retrieve a list of sequences >>>>> from a >>>>> local >>>>> FASTA file (for example a fasta archive where all the protein >>>>> models >>>>> of >>>>> an >>>>> organism are stored).
This file would be used by me as some kind >>>>> "local >>>>> database" (sorry if I mistake a few concepts...) >>>>> I've been reading the BioPerl HOWTOs and I came across the >>>>> Bio::Index::Fasta tool. >>>>> If I didn't misunderstood what I read (which can be easy because >>>>> my >>>>> low >>>>> level on programming) this Indexing tool should do the job. >>>>> I wrote a couple of scripts based on the documentation i read >>>>> about >>>>> this >>>>> tool, but I don't seem to be able to create the index file to be >>>>> used >>>>> later (to retrieve the sequences from). >>>>> -First of all, I want to ask the people in this forum if the >>>>> Bio::Index::Fasta is the right one to chose for this tasks. >>>>> -Then I'll beg you to take a look at my scripts, because I don't >>>>> seem >>>>> to >>>>> catch the bug... >>>>> >>>>> Best wishes to you all and thanks in advance ;) >>>>> >>>>> -- >>>>> José Luis Lavín Trueba, PhD >>>>> >>>>> Dpto. de Producción Agraria >>>>> Grupo de Genética y Microbiología >>>>> Universidad Pública de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Dr. José Luis Lavín Trueba >>>> >>>> Dpto. de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> >>>> -- >>>> Dr. José Luis Lavín Trueba >>>> >>>> Dpto. de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> >>> >>> -- >>> Dr. José Luis Lavín Trueba >>> >>> Dpto.
de Producción Agraria >>> Grupo de Genética y Microbiología >>> Universidad Pública de Navarra >>> 31006 Pamplona >>> Navarra >>> SPAIN >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > -- > Dr. José Luis Lavín Trueba > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jason at bioperl.org Tue Nov 10 18:50:00 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 10 Nov 2009 10:50:00 -0800 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> Message-ID: <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> You might also look at what mygenbank does: http://homepage.mac.com/iankorf/mygenbank.html On Nov 9, 2009, at 7:55 PM, Chris Fields wrote: > On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote: > >> Many thanks to Ewan Birney et. al. for Bio::Index::* >> >> I can throw away my awful grep based index-by-accession stuff. :) >> >> Any chance someone has also written an organism based index >> mechanism? Something like... >> >> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) { >> print $seq->display_id .
"\n"; >> } >> >> Thanks, >> >> j > > It should work via id_parser(); from Bio::Index::GenBank: > > $inx->id_parser(\&get_id); > # make the index > $inx->make_index($file_name); > > # here is where the retrieval key is specified > sub get_id { > my $line = shift; > $line =~ /clone="(\S+)"/; > $1; > } > > Change the code ref deal with the line you want and parse the name > out. Caveat: this may not be absolutely perfect (it only passes in > a line at a time, and some species lines will wrap). Also not sure > how this would work in cases where multiple sequences from the same > species are present. > > The other option is to preparse everything and tie a hash to store a > species->UID map, then use that along with your Bio::Index index to > grab what you need. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From jluis.lavin at unavarra.es Wed Nov 11 15:01:18 2009 From: jluis.lavin at unavarra.es (jluis.lavin at unavarra.es) Date: Wed, 11 Nov 2009 16:01:18 +0100 (CET) Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: anditscorrect use] In-Reply-To: References: <2642.130.206.164.153.1257435996.squirrel@webmail.unavarra.es><1 984ED07F36C446284B25F617964B6C6@NewLife><2969.130.206.164.153.1257438117.sq uirrel@webmail.unavarra.es><3471. 130.206.164.153.1257849820.squirrel@webmail.unavarra.es> Message-ID: <2979.130.206.164.153.1257951678.squirrel@webmail.unavarra.es> Hi once again, I have modified the script following the instructions Jason gave me (at least what I understood; remember it is my first time trying to learn a programming language... and I'm not the smartest guy in the class, hehe), but it seems I didn't fix the problem...
Here's the new code I wrote: #!/c:/Perl -w use strict; use Bio::Index::Fasta; use Bio::DB::Fasta; use Bio::SeqIO; use IO::File; # assign files to scalars my $index_file = 'PC91.fasta'; my $id_list = 'LCS2.txt'; # open index file my $db = Bio::DB::Fasta->new($index_file) or die; # open the id list my $in = IO::File->new($id_list) or die; # open FASTA to write my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", -format => 'fasta'); # retrieve ids loop foreach my $id ($in) { if ($id eq ''){ die ("empty list") } else { my $seqobj = my $inx->fetch($id); $out->write_seq($seqobj); } } # parse fasta headers sub my_makeid { my $id = shift; if ( $id =~ /^>[^:]+:(\S+)/ ) { return $1; } elsif ($id =~ /^>(\S+)/) { return $1; } else { warn("cannot parse ID for $id\n"); } } exit; Would anyone please take a look at it... Thanks in advance ;) On Tue, 10 November 2009, 19:47, Jason Stajich wrote: > Page 44 has the custom ID info or look at documentation for > Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if > you read the perldoc for the module. > > http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf > > Don't re-opening SeqIO each time just do it once at the beginning > outside of the loop and then call write_seq within the loop. > > This is one nuance of doing OO programming vs procedural is that there > is some outside state information that can persist in an object, but > conceptually, you want to open a filehandle once and just keep writing > to it. > > -jason > On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote: > >> Hello again, >> >> I tried what Mark told me modifying the code line he told me but >> there's >> still a problem that I believe must be due to the sequences name.
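Two things in the script above stop it from working as posted: `foreach my $id ($in)` loops once over the IO::File object itself rather than over the lines of LCS2.txt, and `my $inx->fetch($id)` declares a new, undefined `$inx` instead of using `$db`. `my_makeid` is also never passed to Bio::DB::Fasta, and even if it were, its second pattern would capture the whole `PleosPC9_1_103820|fgenesh1_pg.3_#_1` string. For the earlier plain-Perl question, here is a minimal sketch of a one-pass extractor that keys each record on the text before the first `|`. The names `header_id`, `extract_fasta` and `extract_ids.pl` are hypothetical; the sketch assumes one ID per line in the list file and uses no BioPerl at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the ID is the text after '>' up to the first '|'
# or whitespace, e.g. ">PleosPC9_1_103820|fgenesh1_pg.3_#_1" gives
# "PleosPC9_1_103820". Returns undef for non-header lines.
sub header_id {
    my ($header) = @_;
    return $header =~ /^>\s*([^|\s]+)/ ? $1 : undef;
}

# Hypothetical helper: copy every FASTA record whose ID appears in the
# ID list to $out, in a single pass over the FASTA file.
# Returns the number of records written.
sub extract_fasta {
    my ($ids_in, $fasta_in, $out) = @_;

    my %wanted;
    while (my $line = <$ids_in>) {
        $line =~ s/\s+\z//;              # drop "\n" or "\r\n" and trailing blanks
        $wanted{$line} = 1 if length $line;
    }

    my ($keep, $found) = (0, 0);
    while (my $line = <$fasta_in>) {
        if ($line =~ /^>/) {             # a new record starts here
            my $id = header_id($line);
            $keep = (defined $id && $wanted{$id}) ? 1 : 0;
            $found += $keep;
        }
        print {$out} $line if $keep;     # header and sequence lines alike
    }
    return $found;
}

# Usage: perl extract_ids.pl LCS2.txt PC91.fasta > index_extracted.fasta
if (@ARGV == 2) {
    open my $ids,   '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $fasta, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    my $n = extract_fasta($ids, $fasta, \*STDOUT);
    warn "Extracted $n sequence(s)\n";
}
```

Whether this beats an index depends on the use: one pass is simple for a single ID list, while Bio::Index::Fasta or Bio::DB::Fasta pay off when the same large file is queried repeatedly. With those modules, the same `header_id` rule would go in a custom ID callback (`id_parser` for Bio::Index::Fasta, `-makeid` for Bio::DB::Fasta); check their perldoc for exactly what each callback receives, and rebuild any existing index after changing it.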
>> My secuences header on the Fasta file have this format: >> >>> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >> >> Th part on the right of the pipe changes depending on the program >> used to >> create the gene model, for example: >> >>> PleosPC9_1_103820|fgenesh1_pg.3_#_1 >>> PleosPC9_1_123413|genemark.2731_g >>> PleosPC9_1_52065|e_gw1.3.64.1 >> >> So I guess I need to parse my ids somehow for thr program to detect >> only >> the first part of the fasta header (the "protein name") and not to get >> messed with the other side of the pipe... >> >> This is the corrected code I wrote following Mark's indications, but I >> still don't have any idea about the parsing issue... >> >> #!/c:/Perl -w >> use Bio::Index::Fasta; >> use strict; >> #PC9.fasta is my genomic file >> my $Index_File_Name ="PC9.fasta"; >> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >> #LCS.txt is my sequences list >> @ARGV = ; >> foreach my $id (@ARGV) { >> if ($id eq ''){ >> die ("empty list") >> } >> else { >> my $seqobj = $inx->fetch($id); >> my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta", >> -format => 'fasta'); >> $out->write_seq($seqobj); >> } >> } >> exit; >> } >> >> Thanks in advance >> >> P.S. May it be a faster way of extracting those sequences using plain >> PERL? >> >> >> >> >> On Thu, 5 November 2009, 17:39, Mark A. Jensen wrote: >>> Yes, these are files created by the SDBM, Perl's internal db >>> manager. You >>> should >>> be able to >>> open the index by simply >>> $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>> and the dbm will know what to do-- >>> cheers MAJ >>> ----- Original Message ----- >>> From: >>> To: "Mark A. Jensen" >>> Cc: ; >>> Sent: Thursday, November 05, 2009 11:21 AM >>> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index: >>> and its >>> correct >>> use] >>> >>> >>>> Thank you very much Mark, that's a good point :$ >>>> I guess your correction is referred to the second script, isn't it?
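Mark's SDBM explanation quoted above, and the `.pag`/`.dir` pair José reports, can be checked with Perl's core SDBM_File module. This sketch assumes SDBM_File is the DBM backend in use, as Mark says (which backend Bio::Index picks may depend on what is installed on a given Perl); the scratch directory and the stored key/value are illustrative only. It ties a hash to the base name `PC9.fasta.idx`, lists the files that appear, then reopens the database by the same base name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                      # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

# Scratch directory so the example leaves nothing behind.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/PC9.fasta.idx";

# Create the database under the base name and store one example entry.
tie(my %index, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot create $base: $!";
$index{'PleosPC9_1_103820'} = 'example value';
untie %index;

# List what actually landed on disk: SDBM writes a .dir/.pag pair,
# not a single file called PC9.fasta.idx.
opendir(my $dh, $dir) or die "Cannot read $dir: $!";
my @files = sort grep { !/^\.\.?\z/ } readdir $dh;
closedir $dh;
print "$_\n" for @files;

# Reopen using the same base name, without any suffix.
tie(my %again, 'SDBM_File', $base, O_RDONLY, 0666)
    or die "Cannot open $base: $!";
my $value = $again{'PleosPC9_1_103820'};
untie %again;
print "$value\n";
```

The directory listing shows only `PC9.fasta.idx.dir` and `PC9.fasta.idx.pag`, yet tying to the bare base name reopens the data, which is consistent with Mark's advice to pass `PC9.fasta.idx` to `Bio::Index::Fasta->new`.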
>>>> >>>> If it is so, there is still a problem with the first script, it >>>> doesn't >>>> create the PC9.fasta.idx file, instead it creates two files named: >>>> -PC9.fasta.idx.pag >>>> -PC9.fasta.idx.dir >>>> >>>> which seem to be clearly related with some kind of indexing >>>> process...but, >>>> unless the PC9.fasta.idx file is only virtual or remains hidden, I >>>> can't >>>> find it anywhere... >>>> Forgive me if I'm talking nosense... >>>> >>>> Thank you very much again for your help ;) >>>> >>>> >>>> On Thu, 5 November 2009, 17:02, Mark A. Jensen wrote: >>>>> Hey José, >>>>> The first thing that jumps out it the index file name. Looks >>>>> like you create it as >>>>> PC9.fasta.idx >>>>> But you read it as >>>>> PC9.fasta >>>>> Not an unusual mistake. Do >>>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx'); >>>>> and see if it works. >>>>> MAJ >>>>> ----- Original Message ----- >>>>> From: >>>>> To: >>>>> Sent: Thursday, November 05, 2009 10:46 AM >>>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and >>>>> its >>>>> correct >>>>> use] >>>>> >>>>> >>>>> >>>>> >>>>> ---------------------------- Original message >>>>> ---------------------------- >>>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its >>>>> correct >>>>> use >>>>> From: jluis.lavin at unavarra.es >>>>> Date: Thu, 5 November 2009, 16:46 >>>>> To: "Mark A. Jensen"
>>>>> >>>>> 1)Here is the Index creation script: >>>>> >>>>> #!/c:/Perl -w >>>>> use strict; >>>>> use Bio::Index::Fasta; >>>>> use strict; >>>>> >>>>> print "Enter file for indexing: \n"; >>>>> my $Index_File_Name = ; >>>>> my $inx = Bio::Index::Fasta->new(-filename => >>>>> $Index_File_Name.".idx", >>>>> -write_flag => 1); >>>>> $inx->make_index(my $File_Name); >>>>> >>>>> 2)And here is the sequence retrieval script: >>>>> >>>>> #!/c:/Perl -w >>>>> use Bio::Index::Fasta; >>>>> use strict; >>>>> #PC9.fasta is my genomic file >>>>> my $Index_File_Name ="PC9.fasta"; >>>>> my $inx = Bio::Index::Fasta->new($Index_File_Name); >>>>> #LCS.txt is my sequences list >>>>> @ARGV = ; >>>>> foreach my $id (@ARGV) { >>>>> if ($id eq ''){ >>>>> die ("empty list") >>>>> } >>>>> else { >>>>> my $seqobj = $inx->fetch($id); >>>>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta", >>>>> -format => 'fasta'); >>>>> $out->write_seq($seqobj); >>>>> } >>>>> } >>>>> exit; >>>>> } >>>>> >>>>> I hope this code is not a total scum... >>>>> >>>>> Thanks in advance ;) >>>>> >>>>> >>>>> >>>>> El Jue, 5 de Noviembre de 2009, 16:39, Mark A. Jensen escribi?: >>>>>> Jos? -- It looks like this is a good solution to your problem. >>>>>> Please >>>>>> send >>>>>> you >>>>>> script so we can look at it- >>>>>> cheers Mark >>>>>> ----- Original Message ----- >>>>>> From: >>>>>> To: >>>>>> Sent: Thursday, November 05, 2009 10:28 AM >>>>>> Subject: [Bioperl-l] A question about iBio::Index: and its >>>>>> correct use >>>>>> >>>>>> >>>>>> >>>>>> Hello to all, >>>>>> >>>>>> I?m trying to write a script to retrieve a list of sequences >>>>>> from a >>>>>> local >>>>>> FASTA file (for example a fasta archive where all the protein >>>>>> models >>>>>> of >>>>>> an >>>>>> organism are stored). This file would be used by me as some kind >>>>>> "local >>>>>> database" (sorry if I mistake a few concepts...) >>>>>> I?ve been reading the BioPerl HOWTOs and I came across the >>>>>> Bio::Index::Fasta tool. 
>>>>>> If I didn't misunderstood what I read (which can be easy because >>>>>> my >>>>>> low >>>>>> level on programming) this Indexing tool should do the job. >>>>>> I wrote a couple of scripts based on the documentation i read >>>>>> about >>>>>> this >>>>>> tool, but I don't seem to be able to create the index file to be >>>>>> used >>>>>> later (to retrieve the sequences from). >>>>>> -First of all, I want to ask the people in this forum if the >>>>>> Bio::Index::Fasta is the right one to chose for this tasks. >>>>>> -Then I'll beg you to take a look at my scripts, because I don't >>>>>> seem >>>>>> to >>>>>> catch the bug... >>>>>> >>>>>> Best wishes to you all and thanks in advance ;) >>>>>> >>>>>> -- >>>>>> José Luis Lavín Trueba, PhD >>>>>> >>>>>> Dpto. de Producción Agraria >>>>>> Grupo de Genética y Microbiología >>>>>> Universidad Pública de Navarra >>>>>> 31006 Pamplona >>>>>> Navarra >>>>>> SPAIN >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Dr. José Luis Lavín Trueba >>>>> >>>>> Dpto. de Producción Agraria >>>>> Grupo de Genética y Microbiología >>>>> Universidad Pública de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> >>>>> -- >>>>> Dr. José Luis Lavín Trueba >>>>> >>>>> Dpto. de Producción Agraria >>>>> Grupo de Genética y Microbiología >>>>> Universidad Pública de Navarra >>>>> 31006 Pamplona >>>>> Navarra >>>>> SPAIN >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> Dr. José Luis Lavín Trueba >>>> >>>> Dpto.
de Producción Agraria >>>> Grupo de Genética y Microbiología >>>> Universidad Pública de Navarra >>>> 31006 Pamplona >>>> Navarra >>>> SPAIN >>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> -- >> Dr. José Luis Lavín Trueba >> >> Dpto. de Producción Agraria >> Grupo de Genética y Microbiología >> Universidad Pública de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From maj at fortinbras.us Wed Nov 11 23:48:33 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 11 Nov 2009 18:48:33 -0500 Subject: [Bioperl-l] Maq assembly wrapper ready for beta testing Message-ID: <4057E5A862B845EA8BB153888075590C@NewLife> Hi All- New modules are available in the core and in bioperl-run for working with Heng Li's short read assembler "maq" (http://maq.sourceforge.net/maq-man.shtml). Bio::Tools::Run::Maq allows a quick assembly call with a canned maq pipeline, and also allows individual maq commands to be called separately. It uses Bio::Assembly::IO::maq (a read-only module) to deliver a Bio::Assembly::Scaffold from maq output. If you're interested, see http://www.bioperl.org/wiki/HOWTO:Short-read_assemblies_with_maq and update your core and bioperl-run. The code inherits from Florent's excellent new Bio::Tools::Run::AssemblerBase -- kudos to him!!
Tests are in bioperl-run/trunk/t/Maq.t; see them for myriad examples. Send me the bugs. MAJ From clarsen at vecna.com Thu Nov 12 17:22:26 2009 From: clarsen at vecna.com (Chris Larsen) Date: Thu, 12 Nov 2009 12:22:26 -0500 Subject: [Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses? In-Reply-To: <320fb6e00910271029m26f07564l727fb78adae81c11@mail.gmail.com> References: <320fb6e00910271029m26f07564l727fb78adae81c11@mail.gmail.com> Message-ID: <7BBAE077-4D76-46C2-BF66-363F5A017278@vecna.com> All, This is a short followup on the prior thread of discussion, regarding computing mature peptide sequences for viruses. The topic has gone underwater for the time being as we solve some problems with source data. While the biopython effort and contributors on this board have given good guidance, and we now have scripts that function (thanks mostly to pcock), the source data on which everything relies is suspect: mat_peptide 15118..16914 <=== /product="nsp13" /note="helicase" I can tell you the virus community does not want to rely heavily on those position numbers. Furthermore, we have found fewer complete source genomes for viruses than for bacteria, and more virus-to-virus variation in the data fields annotated in the GBK file (Gene, CDS, ORF, Protein, Polyprotein, mat_peptide, db_xref). In fact, the community will have to come together significantly on how these molecules are defined in public repositories before a mature scripting effort becomes reliable, public and well received. Because of the variation in viruses, it's not even clear at this point what a 'gene' is. I will let you know how we proceed when more sequence data has been fully analyzed, and we can think about making any perl based solution a new viral protein module. Thanks, Chris -- Christopher Larsen, Ph.D. Sr.
Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 240-737-4525 From David.Messina at sbc.su.se Thu Nov 12 19:20:54 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 12 Nov 2009 20:20:54 +0100 Subject: [Bioperl-l] highest PAML version supported? Message-ID: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Hi everyone, What is the latest version of PAML (specifically codeml) that I can use with bioperl-live and bioperl-run? I looked around and couldn't find where (or if) this is documented. With PAML version 4.3a against the current trunk of both -live and -run I see this: ------------- EXCEPTION Bio::Root::NotImplemented ------------- MSG: Unknown format of PAML output did not see seqtype STACK Bio::Tools::Phylo::PAML::_parse_summary /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:461 STACK Bio::Tools::Phylo::PAML::next_result /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:270 STACK toplevel ../bin/cluster_kaks:251 --------------------------------------------------------------- ...which I suspect (but haven't confirmed) is due to a change in the file format. Dave From jason at bioperl.org Thu Nov 12 19:29:22 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 12 Nov 2009 11:29:22 -0800 Subject: [Bioperl-l] highest PAML version supported? In-Reply-To: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> References: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Message-ID: prolly 3.15 or so. it really needs a maintainer!!! On Nov 12, 2009, at 11:20 AM, Dave Messina wrote: > Hi everyone, > > What is the latest version of PAML (specifically codeml) that I can > use with > bioperl-live and bioperl-run? > > I looked around and couldn't find where (or if) this is documented. 
> > > With PAML version 4.3a against the current trunk of both -live and - > run I > see this: > ------------- EXCEPTION Bio::Root::NotImplemented ------------- > MSG: Unknown format of PAML output did not see seqtype > STACK Bio::Tools::Phylo::PAML::_parse_summary > /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:461 > STACK Bio::Tools::Phylo::PAML::next_result > /Users/dave/src/bioperl-live/Bio/Tools/Phylo/PAML.pm:270 > STACK toplevel ../bin/cluster_kaks:251 > --------------------------------------------------------------- > > ...which I suspect (but haven't confirmed) is due to a change in the > file > format. > > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From scott at scottcain.net Fri Nov 13 14:48:43 2009 From: scott at scottcain.net (Scott Cain) Date: Fri, 13 Nov 2009 09:48:43 -0500 Subject: [Bioperl-l] January GMOD meeting announcement Message-ID: <4536f7700911130648j40eb2d82g2594adaccf476d73@mail.gmail.com> Hello, I am pleased to announce that the January GMOD meeting will be taking place on January 14 and 15 in San Diego at the Best Western Seven Seas (the same location as last year). Please see this page for registration information: http://gmod.org/wiki/January_2010_GMOD_Meeting When you go to that page, please take a moment to add suggestions for the agenda. There is no registration fee for this meeting, however there is limited space, so please register early. The proprietors of the Best Western have given us an excellent room rate, and extended it to the previous week, so that people attending the GMOD meeting and the Plant and Animal Genome meeting before it may stay at the Best Western the entire time. 
Please direct follow-up questions to the gmod-devel mailing list: https://lists.sourceforge.net/lists/listinfo/gmod-devel Thanks and I look forward to seeing you in San Diego! Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From j.inoue at ucl.ac.uk Sat Nov 14 19:20:29 2009 From: j.inoue at ucl.ac.uk (Jun Inoue) Date: Sat, 14 Nov 2009 19:20:29 +0000 Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths Message-ID: Dear All, I just started learning BioPerl for phylogenetics. I usually use perl v5.10.0 on Mac OS X 10.5.8. I would like to ask for a hint on calculating the branch lengths from the root to the tip for every species in a Newick tree. Please see the following web site, where I explain what I want to do and show my simple (unfinished) script: http://www.geocities.jp/ancientfishtree/BioPerl_BLRootTip.html Thank you for your help. Best, Jun Inoue http://www.geocities.jp/ancientfishtree/index_eng.html From maj at fortinbras.us Sat Nov 14 21:47:37 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 14 Nov 2009 16:47:37 -0500 Subject: [Bioperl-l] Bio::TreeIO, Root-tip branch lengths In-Reply-To: References: Message-ID: <3BC179984D5E49868C4F12D181D82B8D@NewLife> Hi Jun, Some hints: incorporate @leaves = $tree->get_leaf_nodes; and use Bio::Tree::TreeFunctionsI; $distance = $tree->distance( $node_a, $node_b ); cheers, Mark
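Following Mark's hints, a root-to-tip distance is simply the sum of the branch lengths on the path from the root down to each leaf; in BioPerl that roughly means looping over $tree->get_leaf_nodes and calling $tree->distance with the root node (from $tree->get_root_node) and each leaf. The core-Perl sketch below shows the same arithmetic without BioPerl, so the numbers can be checked by hand. root_to_tip and its helpers are hypothetical names, and the tiny parser is an assumption-laden sketch: it only handles plain Newick (unquoted labels, ":length" suffixes, no comments or NHX tags) and ignores any branch length written on the root itself.

```perl
use strict;
use warnings;

# Hypothetical helper: map each leaf label to its root-to-tip distance.
# Duplicate leaf labels would overwrite each other in the returned hash.
sub root_to_tip {
    my ($newick) = @_;
    (my $s = $newick) =~ s/\s+//g;    # whitespace is not significant here
    $s =~ s/;\z//;                    # drop the trailing semicolon
    my $pos = 0;
    my ($leaves) = _parse_node(\$s, \$pos);   # root's own length is ignored
    die "Unparsed text at offset $pos\n" if $pos != length $s;
    my %dist;
    $dist{ $_->[0] } = $_->[1] for @$leaves;
    return \%dist;
}

# Parse one (sub)tree starting at offset $$pos. Returns an array ref of
# [leaf label, distance from this node down to the leaf] pairs, plus the
# branch length that connects this node to its parent.
sub _parse_node {
    my ($s, $pos) = @_;
    my @leaves;
    if (substr($$s, $$pos, 1) eq '(') {
        $$pos++;                                  # consume '('
        while (1) {
            my ($kids, $len) = _parse_node($s, $pos);
            # add the child's branch length to every leaf below it
            push @leaves, map { [ $_->[0], $_->[1] + $len ] } @$kids;
            my $c = substr($$s, $$pos++, 1);      # expect ',' or ')'
            last if $c eq ')';
            die "Expected ',' or ')' at offset ", $$pos - 1, "\n" unless $c eq ',';
        }
        _read_label($s, $pos);                    # internal-node label, unused
    }
    else {
        push @leaves, [ _read_label($s, $pos), 0 ];   # a leaf is 0 from itself
    }
    my $len = 0;
    if (substr($$s, $$pos, 1) eq ':') {
        $$pos++;                                  # consume ':'
        $len = _read_number($s, $pos);
    }
    return (\@leaves, $len);
}

sub _read_label {
    my ($s, $pos) = @_;
    my ($label) = substr($$s, $$pos) =~ /\A([^:,();]*)/;
    $$pos += length $label;
    return $label;
}

sub _read_number {
    my ($s, $pos) = @_;
    my ($num) = substr($$s, $$pos) =~ /\A([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)/
        or die "Expected a branch length at offset $$pos\n";
    $$pos += length $num;
    return $num;
}

my $dist = root_to_tip('((A:0.1,B:0.2):0.3,C:0.5);');
printf "%s\t%g\n", $_, $dist->{$_} for sort keys %$dist;
```

For the example tree this should print A 0.4, B 0.5 and C 0.5. The BioPerl route is preferable for real data, since Bio::TreeIO handles the full Newick grammar; the sketch is only meant to make the summation explicit.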
From jay at jays.net Mon Nov 16 01:23:38 2009 From: jay at jays.net (Jay Hannah) Date: Sun, 15 Nov 2009 19:23:38 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? In-Reply-To: <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> Message-ID: On Nov 9, 2009, at 9:55 PM, Chris Fields wrote: > It should work via id_parser(); from Bio::Index::GenBank: > > $inx->id_parser(\&get_id); > # make the index > $inx->make_index($file_name); > > # here is where the retrieval key is specified > sub get_id { > my $line = shift; > $line =~ /clone="(\S+)"/; > $1; > } This worked great for me today (tackling a different problem than the original). Thanks!! j From veronica.xiaoyu at gmail.com Fri Nov 13 20:35:48 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Fri, 13 Nov 2009 15:35:48 -0500 Subject: [Bioperl-l] Bio::Graphics::Panel question Message-ID: Hi, I'm using Bio::Graphics to parse the blast result and generate images. But sometimes, in the middle of the output image, a hit's color is white, even though I set it to other colors. I attached the picture here as an example. This doesn't happen all the time; usually it works well. Am I doing something wrong, or does it depend on the BLAST result? Thank you, Xiaoyu -------------- next part -------------- A non-text attachment was scrubbed...
Name: BLAST_problem.jpg Type: image/jpeg Size: 51888 bytes Desc: not available URL: From ryan_bogard at hms.harvard.edu Mon Nov 16 03:30:22 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Sun, 15 Nov 2009 19:30:22 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) Message-ID: <26366421.post@talk.nabble.com> In advance, any advice would be greatly appreciated! I have installed bioperl-588pm via fink, but I am having difficulties calling the modules in a script. The following is added to .profile (bash): PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB If I change this to /sw/lib/perl5, then I get an @INC error, as Bio::Perl cannot be located. The environment variables are as follows: MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin INFOPATH=/sw/share/info:/sw/info:/usr/share/info This is the perl script I'm attempting to run: #!/sw/bin/perl5.8.8 use strict; use Bio::Perl; my $seq_object = get_sequence('swiss',"ROA1_HUMAN"); write_sequence(">roa1.fasta",'fasta',$seq_object); Here is the error output: dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr Referenced from: /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle Expected in: dynamic lookup dyld: Symbol not found: _Perl_Tstack_sp_ptr Referenced from: /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle Expected in: dynamic lookup Trace/BPT trap I have looked through many forum postings and attempted the solutions offered in those instances, but none seem to work in my case. I'm not sure if it's because I have perl 5.10.0 installed while attempting to use the BioPerl package built for perl 5.8.8; however, others seem to have it working just fine.
Thank you, Ryan -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From e.osimo at gmail.com Mon Nov 16 07:04:40 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Mon, 16 Nov 2009 08:04:40 +0100 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26366421.post@talk.nabble.com> References: <26366421.post@talk.nabble.com> Message-ID: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Hello Ryan, unfortunately, if you upgraded to 10.6 without formatting, I have to tell you that you'll be in big trouble with perl and with everything you installed from the command line, because the upgrade erases everything in the system folders (perl and bioperl among them) without uninstalling it, so you'll find a lot of folders with the same names but no contents. I suggest, as I did, formatting your computer and reinstalling 10.6 from scratch. Then you'll be able to install mysql (I had to install mysql-5.4.3-beta-osx10.5, the only version that worked on 10.6), and, using the perl 5.10 that is already installed, you'll install bioperl with no effort. Bye Emanuele
From ryan_bogard at hms.harvard.edu Mon Nov 16 13:43:19 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 05:43:19 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Message-ID: <26372079.post@talk.nabble.com> The Mac OS X 10.6 was a fresh install on a new MacBook Pro. I'm not sure whether I will have the same issues, but it's worth a shot, as I have little on my computer and reinstalling to start over wouldn't be too difficult. What method did you use to install bioperl? I used fink, and I am not sure the available stable version is the one I need. I will install from the command line this time around and let you know how it turns out. Thank you!
-- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Mon Nov 16 13:48:17 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 16 Nov 2009 08:48:17 -0500 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26372079.post@talk.nabble.com> References: <26366421.post@talk.nabble.com><2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> Message-ID: <8D822081B13F49C2A37677D3A47F38B4@NewLife> Ryan, I'm not a Mac person, but Koen has said (see http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) to use the unstable tree to get BioPerl 1.6.1, which is likely to be what you want. cheers Mark
From cjfields at illinois.edu Mon Nov 16 15:00:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Nov 2009 09:00:09 -0600 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> Message-ID: <49681E01-E95D-4FC6-AE42-6E57ED43AAA2@illinois.edu> On Nov 16, 2009, at 1:04 AM, Emanuele Osimo wrote: > Hello Ryan, > unfortunately, if you upgraded to 10.6 without formatting, I have to tell > you that you'll be in big trouble with perl and with everything you > installed from the commandline...
Just starting from scratch isn't always the best solution (though it is the cleanest). In this case I don't think anything you mention applies, as conflicting symbols are being reported. My guess is conflicting perl builds, probably between your system perl 5.10.0 (Snow Leopard) and your fink-installed perl 5.8.8 (they are binary incompatible). Also, remember that Snow Leopard is primarily 64-bit, so it might be worth working out whether your fink is compiling 64-bit or 32-bit binaries. In this case, I would just uninstall the fink-based perl and either use the system one (Snow Leopard = 5.10.0), or roll your own and install 5.10.1 locally or in /usr/local. Do NOT replace the system one, as that will likely break your OS. In my experience, and not to bash on fink or MacPorts, I never had much luck with their perl installs. Unless I plan on only using fink or MacPorts for my OS (not likely in my case), I find they tend to cause problems in the long term unless one uses them to install packages with very few dependencies, and even then you need to make sure fink is configured to compile the correct binary. For instance, they're fairly good for gd, libxml2, etc., but beyond that one may run into odd, version-specific dependencies with some packages, such as relying on perl 5.8.8 (but not perl 5.10.x), db42 (instead of db44), etc. I've ended up in the past with 2-3 different perl versions, Berkeley DB versions, etc. chris
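Chris's diagnosis (a script picking up compiled modules such as IO.bundle that were built for a different perl than the one running it, here via the /sw/lib/perl5/5.8.8 entries in PERL5LIB) can be checked directly. The core-Perl sketch below is not part of BioPerl or fink; mismatched_inc_dirs is a hypothetical helper, and flagging directories by the version number in their path is only a heuristic. Run it with the same interpreter as the failing script (for example the perl named on its #! line): it prints that interpreter, its version, PERL5LIB and @INC, then lists any library directory whose path names a different perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: return the directories whose path names a perl
# version other than $running (e.g. /sw/lib/perl5/5.8.8 while running
# 5.10.0). Heuristic only: it looks at path names, not at the modules.
sub mismatched_inc_dirs {
    my ($running, @dirs) = @_;
    return grep { m{/(5\.\d+\.\d+)(?:/|\z)} && $1 ne $running } @dirs;
}

my $running = sprintf '%vd', $^V;    # e.g. "5.10.0"
print "interpreter: $^X\n";
print "version:     $running\n";
print "PERL5LIB:    ", (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(unset)'), "\n";
print "\@INC:\n", map("  $_\n", @INC);

my @suspect = mismatched_inc_dirs($running, @INC);
print "directories that name another perl version:\n", map("  $_\n", @suspect)
    if @suspect;
```

The script avoids 5.10-only syntax such as the // operator, so it should also run under a 5.8.8 interpreter. A non-empty "another perl version" list is a hint, not proof, that PERL5LIB is mixing builds.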
From cjfields at illinois.edu Mon Nov 16 15:01:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 16 Nov 2009 09:01:01 -0600 Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <8D822081B13F49C2A37677D3A47F38B4@NewLife> References: <26366421.post@talk.nabble.com><2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> <8D822081B13F49C2A37677D3A47F38B4@NewLife> Message-ID: <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> Actually, why not just install via CPAN? Any particular reason? chris On Nov 16, 2009, at 7:48 AM, Mark A. Jensen wrote: > Ryan, > I'm not a mac person, but Koen has said (see http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) > to use the unstable tree to get BioPerl 1.6.1, which is likely to be what you want. > cheers > Mark
From Kevin.M.Brown at asu.edu Mon Nov 16 15:49:13 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Nov 2009 08:49:13 -0700 Subject: [Bioperl-l] Bio::Graphics::Panel question In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40663EDB9@EX02.asurite.ad.asu.edu> To really be able to tell whether this is a bug, I (and probably the real devs) would need to see that part of your code and the BLAST file that has this issue, as the cause could be your color-choice callback rather than the BLAST object (e.g. your color picker is missing a case for a value that comes in with the data, and so returns a blank value). -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Xiaoyu Liang Sent: Friday, November 13, 2009 1:36 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Graphics::Panel question Hi, I'm using Bio::Graphics to parse the blast result and generate images.
But, sometimes, in the middle of the output image, the hit's color is white, even though I set it to other colors. I attached the picture here for an example. This doesn't occur all the time, usually, it works well. I'm wondering if I did something wrong? or depends on the blast result? Thank you, Xiaoyu From ryan_bogard at hms.harvard.edu Mon Nov 16 16:57:16 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 08:57:16 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> References: <26366421.post@talk.nabble.com> <2ac05d0f0911152304v58985cb5x6ea0501bff7a41ab@mail.gmail.com> <26372079.post@talk.nabble.com> <8D822081B13F49C2A37677D3A47F38B4@NewLife> <58912861-CD59-4AFC-8F30-B0AA2E77AECB@illinois.edu> Message-ID: <26375418.post@talk.nabble.com> I read that posting by Koen and used the unstable tree after the first attempt; however, the errors still persisted. I just finished a fresh install and I will just follow Mr. Fields' advice and use CPAN. Thank you all for the help! Chris Fields-5 wrote: > > Actually, why not just install via CPAN? Any particular reason? > > chris > > On Nov 16, 2009, at 7:48 AM, Mark A. Jensen wrote: > >> Ryan, >> I'm not a mac person, but Koen has said (see >> http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ) >> to use the unstable tree to get BioPerl 1.6.1, which is likely to be what >> you want. >> cheers >> Mark >> ----- Original Message ----- From: "rbogard" >> >> To: >> Sent: Monday, November 16, 2009 8:43 AM >> Subject: Re: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl >> 5.10.0) >> >> >>> >>> The Mac OS X 10.6 was a fresh install on a new Mac Book Pro. Not sure if >>> I >>> will have the same issues, but it's worth a shot as I have little on my >>> computer and reinstalling to start over wouldn't be too difficult. What >>> method did you use to install bioperl?
I used fink and I am not sure the >>> available stable version is the one I need. I will install from the >>> command >>> line this time around, and let you know how it turns out. >>> >>> Thank you! >>> >>> >>> >>> Emanuele Osimo wrote: >>>> >>>> Hello Ryan, >>>> unfortunately, if you upgraded to 10.6 without formatting, I have to >>>> tell >>>> you that you'll be in big trouble with perl and with everything you >>>> installed from the commandline... Because in the upgrade process >>>> everything >>>> in the system folders, perl and bioperl being some of these things, is >>>> erased without being uninstalled, so you'll find a lot of folders with >>>> the >>>> same name but no contents. >>>> I suggest you, as I did, to format your pc and reinstall 10.6 from >>>> scratch. >>>> Then you'll be able to install mysql (I had to install >>>> mysql-5.4.3-beta-osx10.5, the only to work on 10.6), and, working with >>>> perl >>>> 5.10 that is already installed, you'll install bioperl with no effort. >>>> Bye >>>> Emanuele >>>> >>>> On Mon, Nov 16, 2009 at 04:30, rbogard >>>> wrote: >>>> >>>>> >>>>> In advance, any advice would be greatly appreciated! I have installed >>>>> bioperl-588pm via fink but I am having difficulties calling the >>>>> modules >>>>> in >>>>> script. The following is added to .profile (bash): >>>>> PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB >>>>> >>>>> If I change this to /sw/lib/perl5 then I get an @INC error, as use >>>>> Bio::PERL >>>>> cannot be located.
>>>>> >>>>> The environment variables are as follows: >>>>> >>>>> >>>>> MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man >>>>> >>>>> PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 >>>>> >>>>> PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin >>>>> INFOPATH=/sw/share/info:/sw/info:/usr/share/info >>>>> >>>>> >>>>> This is the perl script I'm attempting to run: >>>>> #!/sw/bin/perl5.8.8 >>>>> use strict; >>>>> use Bio::Perl; >>>>> $seq_object = get_sequence('swiss',"ROA1_HUMAN"); >>>>> write_sequence(">roa1.fasta",'fasta',$seq_object); >>>>> >>>>> Here is the error output: >>>>> >>>>> dyld: lazy symbol binding failed: Symbol not found: >>>>> _Perl_Tstack_sp_ptr >>>>> Referenced from: >>>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>>> Expected in: dynamic lookup >>>>> >>>>> dyld: Symbol not found: _Perl_Tstack_sp_ptr >>>>> Referenced from: >>>>> /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle >>>>> Expected in: dynamic lookup >>>>> >>>>> Trace/BPT trap >>>>> >>>>> I have looked through many forum postings and attempted the solutions >>>>> offered in those instances, but none seem to work in my case. I'm not >>>>> sure >>>>> if it's because I have perl 5.10.0 installed while attempting to call >>>>> bioperl 5.8.8; however, others seem to have it working just fine. >>>>> >>>>> Thank you, Ryan >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26366421.html >>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26372079.html >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26375418.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From krishna.aneesh at gmail.com Mon Nov 16 07:00:15 2009 From: krishna.aneesh at gmail.com (Aneesh K) Date: Mon, 16 Nov 2009 12:30:15 +0530 Subject: [Bioperl-l] Regarding Bio::TreeIO Object Message-ID: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Hi, I just started to use Bioperl modules. It's really useful and interesting. Now I have in stuck with "Tree objects and phylogenetic trees". I couldn't get any documentation/examples about reading/parsing phylip tree files. Please tell me from where I can get some sample codes for this. Waiting for your reply. Thanks Aneesh.K Mob. 
09646181517 From David.Messina at sbc.su.se Mon Nov 16 17:33:36 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 16 Nov 2009 18:33:36 +0100 Subject: [Bioperl-l] highest PAML version supported? In-Reply-To: References: <628aabb70911121120w4c609056v50204b9bd9e5c3fb@mail.gmail.com> Message-ID: Hi everyone, I just committed support for parsing codeml 4.3a (August 2009) to bioperl-live. I added new tests and all PAML-related tests pass, but please report any problems you have to the list. Note that I haven't tested the other PAML 4.3a executables to see if there are format changes with those. If you get the chance to try any and it doesn't work, let me know and I'll try to add support for them. (Note that these changes are only to the PAML parsing code; Bio::Tools::Run already appears to handle 4.3a just fine.) Dave From jason at bioperl.org Mon Nov 16 17:34:57 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 16 Nov 2009 09:34:57 -0800 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Message-ID: Is this at all helpful to your questions. http://www.bioperl.org/wiki/HOWTO:Trees The trees are in 'newick' or new hampshire format though I don't think there is a phylip format for trees. -jason On Nov 15, 2009, at 11:00 PM, Aneesh K wrote: > Hi, > > I just started to use Bioperl modules. It's really useful and > interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing > phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob. 
09646181517 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From roy.chaudhuri at gmail.com Mon Nov 16 17:31:49 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 16 Nov 2009 17:31:49 +0000 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> Message-ID: <4B018C85.6020801@gmail.com> Hi Aneesh, See the Bioperl trees howto: http://www.bioperl.org/wiki/HOWTO:Trees Roy. Aneesh K wrote: > Hi, > > I just started to use Bioperl modules. It's really useful and interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob. 09646181517 -- Dr. Roy Chaudhuri Department of Veterinary Medicine University of Cambridge, U.K. From Kevin.M.Brown at asu.edu Mon Nov 16 18:22:07 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 16 Nov 2009 11:22:07 -0700 Subject: [Bioperl-l] FW: Bio::Graphics::Panel question Message-ID: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> Please keep your responses on the list for more timely help. Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University ________________________________ From: Xiaoyu Liang [mailto:veronica.xiaoyu at gmail.com] Sent: Monday, November 16, 2009 9:34 AM To: Kevin Brown Subject: Re: [Bioperl-l] Bio::Graphics::Panel question Hi Kevin, Thank you for ur quick response. I attached the BLAST .out file here. And the follow is my code part. 
I have an array keeping the color for each hit, and I printed it out the array, there is no missing. my $track = $panel->add_track( -glyph => 'graded_segments', -label => 1, -connector => 'dashed', -font2color => 'red', -sort_order => 'high_score', -description => sub { $feature = shift; #print "--".$feature."\n"; return unless $feature->has_tag('description'); my ($description) = $feature->each_tag_value('description'); my ($id) = $feature->display_name; my @records= split(/\|/,$description); my $score = $feature->score; #print $id.":".$score."\n"; if($score >=200){ push (@color_array,1); }elsif($score >=80){ push (@color_array,2); }elsif($score >=50){ push (@color_array,3); }elsif($score >= 40){ push (@color_array,4); }else{ push (@color_array,5); } if($type == 1){ "Species:Arabidopsis TF Family:$records[1] Score=$score"; }elsif($type == 2){ if(scalar(@records)==5){ "Species:$records[1] TF Family:$records[2] Accepted Name:$records[3] Score=$score"; }else{ "Species:$records[1] TF Family:$records[2] Score=$score"; } }else{ ""; } }, -bgcolor => sub{ return unless $feature->has_tag('description'); if($color_array[$index] == 1 ){ $color = 'red'; } if($color_array[$index]== 2){ $color = 'orange'; } if($color_array[$index]== 3){ $color = 'green'; } if($color_array[$index]== 4){ $color = 'blue'; } if($color_array[$index]== 5){ $color = 'black'; } #if ($index == 20){ # $color = 'black'; #} #print $index."--".$color_array[$index]."\n"; $index++; #print $feature."\n"; #print $feature->display_name."\n"; return $color; }, ); Best regards, Xiaoyu On Mon, Nov 16, 2009 at 10:49 AM, Kevin Brown wrote: To really be able to tell if this was a bug, I (and probably the real devs) would need to see that part of your code and the Blast file that is having this issue as it could be your callback for color choice vs the blast object (e.g. your color picker is missing an option that the data comes in with and so returns with a blank value).
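A minimal sketch of a color callback that can never return an empty value, assuming (per the Bio::Graphics::Panel documentation on option callbacks) that Bio::Graphics passes the feature being drawn as the first argument to a -bgcolor code reference. The helper name score_to_color is hypothetical; it reuses the score thresholds from the -description callback in the code above, but computes the color from each feature's own score, so it does not rely on @color_array and $index staying in step with the order in which the glyph happens to invoke the callbacks.

```perl
use strict;
use warnings;

# Hypothetical helper: map a hit score to a color name using the same
# thresholds as the -description callback. Every score lands in exactly
# one band, so the function never returns undef (no "blank" color).
sub score_to_color {
    my ($score) = @_;
    $score = 0 unless defined $score;    # treat a missing score as the lowest band
    return 'red'    if $score >= 200;
    return 'orange' if $score >= 80;
    return 'green'  if $score >= 50;
    return 'blue'   if $score >= 40;
    return 'black';
}

# How it could be wired into add_track (shown as a comment, not executed;
# assumes the current feature arrives as the callback's first argument):
#
#   -bgcolor => sub {
#       my $feature = shift;
#       return score_to_color($feature->score);
#   },

printf "%s %s %s\n", score_to_color(250), score_to_color(45), score_to_color(undef);
# prints "red blue black"
```

Falling back to a fixed default ('black') is one design choice; replacing the final return with a die, as an alternative, would make unexpected scores fail loudly instead of being drawn silently.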
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Xiaoyu Liang Sent: Friday, November 13, 2009 1:36 PM To: Bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::Graphics::Panel question Hi, I'm using Bio::Graphics to parse the blast result and generate images. But, sometimes, in the middle of the output image, the hit's color is white, even though I set it to other colors. I attached the picture here for an example. This doesn't occur all the time, usually, it works well. I'm wondering if I did something wrong? or depends on the blast result? Thank you, Xiaoyu _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: 1258388779.out Type: application/octet-stream Size: 32599 bytes Desc: 1258388779.out URL: From paolo.pavan at gmail.com Mon Nov 16 19:06:06 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Mon, 16 Nov 2009 20:06:06 +0100 Subject: [Bioperl-l] bioperl-ext installation issue Message-ID: <56be91b60911161106w69e20fd9k133a465e8d4f8a3f@mail.gmail.com> Hi everybody, I have problems installing the bioperl-ext package, any help is much appreciated. 1) - I start trying with cpan i /bioperl-ext/ the only resource available is /B/BI/BIRNEY/bioperl-ext-1.4 (is it ok?) - I install Inline::MakeMaker and Inline::C then - i/BIRNEY/bioperl-ext-1.4/ fails because I don't have staden package 2) I try to install io_lib-1.8.10.tar as suggested by the README ( ftp://ftp.mrc-lmb.cam.ac.uk/pub/staden/io_lib/), installation fails after: ...
gcc -g -O2 -o makeSCF makeSCF.o ../read/.libs/libread.a -lz -lm ../read/.libs/libread.a(compress.o): In function `fopen_compressed': /root/Download/staden/io_lib-1.8.10/utils/compress.c:321: warning: the use of `tempnam' is dangerous, better use `mkstemp' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../read -I../alf -I../abi -I../ctf -I../ztr -I../plain -I../scf -I../exp_file -I../utils -I/usr/local/include -g -O2 -c -o extract_seq.o `test -f extract_seq.c || echo './'`extract_seq.c /bin/sh ../libtool --mode=link gcc -g -O2 -o extract_seq extract_seq.o ../read/libread.la gcc -g -O2 -o extract_seq extract_seq.o ../read/.libs/libread.a -lz -lm ../read/.libs/libread.a(compress.o): In function `fopen_compressed': /root/Download/staden/io_lib-1.8.10/utils/compress.c:321: warning: the use of `tempnam' is dangerous, better use `mkstemp' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../read -I../alf -I../abi -I../ctf -I../ztr -I../plain -I../scf -I../exp_file -I../utils -I/usr/local/include -g -O2 -c -o index_tar.o `test -f index_tar.c || echo './'`index_tar.c index_tar.c: In function ‘main’: index_tar.c:12: error: two or more data types in declaration specifiers make[2]: *** [index_tar.o] Error 1 make[2]: Leaving directory `/home/root/Download/staden/io_lib-1.8.10/progs' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/root/Download/staden/io_lib-1.8.10' make: *** [all-recursive-am] Error 2 3) I give up staden, because I actually need pSW, and try to install from Makefile.PL in Bio/Ext/Align but installation fails after: ... Align.xs:18: warning: ‘not_here’
defined but not used Running Mkbootstrap for Bio::Ext::Align () chmod 644 Align.bs rm -f ../blib/arch/auto/Bio/Ext/Align/Align.so gcc -shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic Align.o -o ../blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a \ -lm \ /usr/bin/ld: libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC libs/libsw.a: could not read symbols: Bad value collect2: ld returned 1 exit status make[1]: *** [../blib/arch/auto/Bio/Ext/Align/Align.so] Error 1 make[1]: Leaving directory `/home/root/.cpan/sources/authors/id/B/BI/BIRNEY/bioperl-ext-1.4/Bio/Ext/Align' make: *** [subdirs] Error 2 I have also made some other tries such force install Bio::Ext:Align without success but I'm sure I miss something trivial that I can't catch. Can someone help me? Thank you, Paolo From lincoln.stein at gmail.com Mon Nov 16 20:08:20 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 16 Nov 2009 15:08:20 -0500 Subject: [Bioperl-l] FW: Bio::Graphics::Panel question In-Reply-To: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> References: <1A4207F8295607498283FE9E93B775B40663EE37@EX02.asurite.ad.asu.edu> Message-ID: <6dce9a0b0911161208q2f826d83s319184f0cacca097@mail.gmail.com> Hi, I think you should modify your color selection code as follows: if($color_array[$index] == 1 ){ $color = 'red'; } elsif($color_array[$index]== 2){ $color = 'orange'; } elsif($color_array[$index]== 3){ $color = 'green'; } elsif($color_array[$index]== 4){ $color = 'blue'; } elsif($color_array[$index]== 5){ $color = 'black'; } else { die "unexpected color array value $color_array[$index]" } Lincoln On Mon, Nov 16, 2009 at 1:22 PM, Kevin Brown wrote: > Please keep your responses on the list for more timely help. 
> > > Kevin Brown > Center for Innovations in Medicine > Biodesign Institute > Arizona State University > > > > ________________________________ > > From: Xiaoyu Liang [mailto:veronica.xiaoyu at gmail.com] > Sent: Monday, November 16, 2009 9:34 AM > To: Kevin Brown > Subject: Re: [Bioperl-l] Bio::Graphics::Panel question > > > Hi Kevin, > > Thank you for ur quick response. I attached the BLAST .out file here. > And the follow is my code part. I have an array keeping the color for > each hit, and I printed it out the array, there is no missing. > > my $track = $panel->add_track( > -glyph => 'graded_segments', > -label => 1, > -connector => 'dashed', > -font2color => 'red', > -sort_order => 'high_score', > -description => sub { > $feature = shift; > #print "--".$feature."\n"; > return unless > $feature->has_tag('description'); > my ($description) = > $feature->each_tag_value('description'); > my ($id) = $feature->display_name; > my @records= split(/\|/,$description); > my $score = $feature->score; > #print $id.":".$score."\n"; > if($score >=200){ > push (@color_array,1); > }elsif($score >=80){ > push (@color_array,2); > }elsif($score >=50){ > push (@color_array,3); > }elsif($score >= 40){ > push (@color_array,4); > }else{ > push (@color_array,5); > } > > if($type == 1){ > "Species:Arabidopsis TF > Family:$records[1] Score=$score"; > }elsif($type == 2){ > if(scalar(@records)==5){ > "Species:$records[1] TF > Family:$records[2] Accepted Name:$records[3] Score=$score"; > }else{ > "Species:$records[1] TF > Family:$records[2] Score=$score"; > } > }else{ > ""; > } > }, > -bgcolor => sub{ > return unless > $feature->has_tag('description'); > if($color_array[$index] == 1 ){ > $color = 'red'; > } > if($color_array[$index]== 2){ > $color = 'orange'; > } > if($color_array[$index]== 3){ > $color = 'green'; > } > if($color_array[$index]== 4){ > $color = 'blue'; > } > if($color_array[$index]== 5){ > $color = 'black'; > } > #if ($index == 20){ > # $color = 'black'; > #} > #print > 
$index."--".$color_array[$index]."\n"; > $index++; > > #print $feature."\n"; > #print > $feature->display_name."\n"; > return $color; > }, > ); > > > Best regards, > Xiaoyu > > > On Mon, Nov 16, 2009 at 10:49 AM, Kevin Brown > wrote: > > > To really be able to tell if this was a bug, I (and probably the > real > devs) would need to see that part of your code and the Blast > file that > is having this issue as it could be your callback for color > choice vs > the blast object (e.g. your color picker is missing an option > that the > data comes in with and so returns with a blank value). > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Xiaoyu Liang > Sent: Friday, November 13, 2009 1:36 PM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Graphics::Panel question > > Hi, > > I'm using Bio::Graphics to parse the blast result and generate > images. > But, sometimes, in the middle of the output image, the hit's > color is > white, even though I set it to other colors. I attached the > picture here > for an example. This doesn't occur all the time, usually, it > works well. > I'm wondering if I did something wrong? or depends on the blast > result? > > Thank you, > Xiaoyu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D.
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From ryan_bogard at hms.harvard.edu Mon Nov 16 21:44:25 2009 From: ryan_bogard at hms.harvard.edu (rbogard) Date: Mon, 16 Nov 2009 13:44:25 -0800 (PST) Subject: [Bioperl-l] Problems with bioperl in Mac OS X 10.6 (Perl 5.10.0) In-Reply-To: <26366421.post@talk.nabble.com> References: <26366421.post@talk.nabble.com> Message-ID: <26379710.post@talk.nabble.com> Thank you all for your help! I was able to get bioperl working via manual download and install. It was a combination of permissions issues and X86_64 vs. X86_32 compatibility issues. Using fink to download and install seems to have given me a combination of 32 and 64 associated files (I probably did something wrong in config). rbogard wrote: > > In advance, any advice would be greatly appreciated! I have installed > bioperl-588pm via fink but I am having difficulties calling the modules in > script. The following is added to .profile (bash): > PERL5LIB=/sw/lib/perl5/5.8.8:$PERL5LIB > > If I change this to /sw/lib/perl5 then I get an @INC error, as use > Bio::PERL cannot be located.
> > The environment variables are as follows: > > MANPATH=/sw/share/man:/usr/share/man:/usr/X11/man:/sw/lib/perl5/5.10.0/man:/usr/X11R6/man:/sw/lib/perl5-core/5.8.8/man:/sw/lib/perl5/5.8.8/man > PERL5LIB=/sw/lib/perl5/5.8.8:/sw/lib/perl5:/sw/lib/perl5/darwin:/sw/lib/perl5/5.8.8 > PATH=/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin > INFOPATH=/sw/share/info:/sw/info:/usr/share/info > > > This is the perl script I'm attempting to run: > #!/sw/bin/perl5.8.8 > use strict; > use Bio::Perl; > $seq_object = get_sequence('swiss',"ROA1_HUMAN"); > write_sequence(">roa1.fasta",'fasta',$seq_object); > > Here is the error output: > > dyld: lazy symbol binding failed: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > dyld: Symbol not found: _Perl_Tstack_sp_ptr > Referenced from: > /sw/lib/perl5/5.8.8/darwin-thread-multi-2level/auto/IO/IO.bundle > Expected in: dynamic lookup > > Trace/BPT trap > > I have looked through many forum postings and attempted the solutions > offered in those instances, but none seem to work in my case. I'm not sure > if it's because I have perl 5.10.0 installed while attempting to call > bioperl 5.8.8; however, others seem to have it working just fine. > > Thank you, Ryan > -- View this message in context: http://old.nabble.com/Problems-with-bioperl-in-Mac-OS-X-10.6-%28Perl-5.10.0%29-tp26366421p26379710.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jay at jays.net Mon Nov 16 22:02:10 2009 From: jay at jays.net (Jay Hannah) Date: Mon, 16 Nov 2009 16:02:10 -0600 Subject: [Bioperl-l] Bio::Index::GenBank - by organism? 
In-Reply-To: <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> References: <3B01A09C-198E-4691-B807-7ED3250BB81A@jays.net> <12DFD22E-42DC-4626-9873-0DE3EBB5CFBD@illinois.edu> <2BA451B1-6E18-483E-B655-74D1146772CC@bioperl.org> Message-ID: <60ADD3A9-D38B-4A39-A5CE-C8118DEC1242@jays.net> On Nov 10, 2009, at 12:50 PM, Jason Stajich wrote: > You might also look at what mygenbank does: > http://homepage.mac.com/iankorf/mygenbank.html It appears, perhaps, that BioSQL can provide *foo* searching like so: http://www.biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME SELECT DISTINCT include.ncbi_taxon_id FROM taxon INNER JOIN taxon AS include ON (include.left_value BETWEEN taxon.left_value AND taxon.right_value) WHERE taxon.taxon_id IN (SELECT taxon_id FROM taxon_name WHERE name LIKE '%fungi%') So I think we're going to chase that for a while. I didn't see a *foo* search in MyGenBank? Thanks, j http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From roy.chaudhuri at gmail.com Tue Nov 17 11:24:07 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 17 Nov 2009 11:24:07 +0000 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com> <4B018C85.6020801@gmail.com> <9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> Message-ID: <4B0287D7.5050702@gmail.com> Hi Aneesh, Please keep your replies on the mailing list, that way someone else can respond, which would be particularly useful in this case since I know nothing about MapIO. Roy. Aneesh K wrote: > Thanks for your reply. > > I would like to know about "Genetic Maps" also. I would like to > use MapIO object. > But I'm not aware about genetic maps and the mapmaker format. > > Please tell me from where I can get some examples for mapmaker format > and some example scripts to use MapIO object. > > Hoping your reply. > > Aneesh.K > Mob. 
09646181517 > > > > On Mon, Nov 16, 2009 at 11:01 PM, Roy Chaudhuri > wrote: > > Hi Aneesh, > > See the Bioperl trees howto: > http://www.bioperl.org/wiki/HOWTO:Trees > > Roy. > > > Aneesh K wrote: > > Hi, > > I just started to use Bioperl modules. It's really useful and > interesting. > Now I have in stuck with "Tree objects and phylogenetic trees". > I couldn't get any documentation/examples about reading/parsing > phylip tree > files. > > Please tell me from where I can get some sample codes for this. > > Waiting for your reply. > > Thanks > Aneesh.K > Mob. 09646181517 > > > > -- > Dr. Roy Chaudhuri > Department of Veterinary Medicine > University of Cambridge, U.K. > > From maj at fortinbras.us Tue Nov 17 12:50:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 17 Nov 2009 07:50:06 -0500 Subject: [Bioperl-l] Regarding Bio::TreeIO Object In-Reply-To: <4B0287D7.5050702@gmail.com> References: <9cb9dfd70911152300y34789f88qc69dd14bf505f57d@mail.gmail.com><4B018C85.6020801@gmail.com><9cb9dfd70911162117nfac0e52gea3d638e34337b16@mail.gmail.com> <4B0287D7.5050702@gmail.com> Message-ID: <394F62D51F15405BBCF8BB50DA0FF336@NewLife> Aneesh, Have a look in the t/Map directory of the BioPerl distribution. These are test scripts that are also examples of usage. The t/data directory will contain the datafiles that the tests use; these will provide example data. cheers Mark ----- Original Message ----- From: "Roy Chaudhuri" To: "Aneesh K" ; Sent: Tuesday, November 17, 2009 6:24 AM Subject: Re: [Bioperl-l] Regarding Bio::TreeIO Object > Hi Aneesh, > > Please keep your replies on the mailing list, that way someone else can > respond, which would be particularly useful in this case since I know > nothing about MapIO. > > Roy. > > Aneesh K wrote: >> Thanks for your reply. >> >> I would like to know about "Genetic Maps" also. I would like to >> use MapIO object. >> But I'm not aware about genetic maps and the mapmaker format. 
>> >> Please tell me from where I can get some examples for mapmaker format >> and some example scripts to use MapIO object. >> >> Hoping your reply. >> >> Aneesh.K >> Mob. 09646181517 >> >> >> >> On Mon, Nov 16, 2009 at 11:01 PM, Roy Chaudhuri > > wrote: >> >> Hi Aneesh, >> >> See the Bioperl trees howto: >> http://www.bioperl.org/wiki/HOWTO:Trees >> >> Roy. >> >> >> Aneesh K wrote: >> >> Hi, >> >> I just started to use Bioperl modules. It's really useful and >> interesting. >> Now I have in stuck with "Tree objects and phylogenetic trees". >> I couldn't get any documentation/examples about reading/parsing >> phylip tree >> files. >> >> Please tell me from where I can get some sample codes for this. >> >> Waiting for your reply. >> >> Thanks >> Aneesh.K >> Mob. 09646181517 >> >> >> >> -- >> Dr. Roy Chaudhuri >> Department of Veterinary Medicine >> University of Cambridge, U.K. >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From veronica.xiaoyu at gmail.com Wed Nov 18 17:18:33 2009 From: veronica.xiaoyu at gmail.com (Xiaoyu Liang) Date: Wed, 18 Nov 2009 12:18:33 -0500 Subject: [Bioperl-l] how to visualize multiple sequences alignments Message-ID: Hi, I'm wondering Is there any modules that can be used for visualizing multiple sequences alignments? like the result from ClustalW? Thank you very much, Xiaoyu From jason at bioperl.org Wed Nov 18 18:23:05 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 18 Nov 2009 10:23:05 -0800 Subject: [Bioperl-l] how to visualize multiple sequences alignments In-Reply-To: References: Message-ID: try jalview http://www.jalview.org/ On Nov 18, 2009, at 9:18 AM, Xiaoyu Liang wrote: > Hi, > > I'm wondering Is there any modules that can be used for visualizing > multiple > sequences alignments? like the result from ClustalW? 
> > Thank you very much, > Xiaoyu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From andrew.j.grimm at gmail.com Thu Nov 19 02:52:31 2009 From: andrew.j.grimm at gmail.com (Andrew Grimm) Date: Thu, 19 Nov 2009 13:52:31 +1100 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? Message-ID: Caution: read the whole email before visiting the bioperl wiki I was doing some bioinformatics-related searching using google, and one of the hits was to the bio dot perl dot org wiki (the FAQ in particular). When I did that, I was redirected to a ferdax dot com web site (a typo-squatting of fedex?). Some people reckon that ferdax hacks web sites and redirects google hits from the victim web site to their own web site. For example, this thread at google's webmaster central http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all (it's talking about zencart, but presumably they've since found other victims) Just going to the website without using google may not trigger the redirect. Apologies if this is a false alarm, but I don't think it is. I won't be in contact between Friday and Monday Australian time (I'll be at railscamp 6 in Melbourne), so I won't be able to answer any replies. Thanks, Andrew Grimm From maj at fortinbras.us Thu Nov 19 03:14:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 18 Nov 2009 22:14:44 -0500 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? In-Reply-To: References: Message-ID: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> Andrew-- thanks!! We're on it. MAJ ----- Original Message ----- From: "Andrew Grimm" To: Sent: Wednesday, November 18, 2009 9:52 PM Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? 
> Caution: read the whole email before visiting the bioperl wiki > > I was doing some bioinformatics-related searching using google, and > one of the hits was to the bio dot perl dot org wiki (the FAQ in > particular). > > When I did that, I was redirected to a ferdax dot com web site (a > typo-squatting of fedex?). > > Some people reckon that ferdax hacks web sites and redirects google > hits from the victim web site to their own web site. For example, this > thread at google's webmaster central > http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all > (it's talking about zencart, but presumably they've since found other > victims) > > Just going to the website without using google may not trigger the redirect. > > Apologies if this is a false alarm, but I don't think it is. > > I won't be in contact between Friday and Monday Australian time (I'll > be at railscamp 6 in Melbourne), so I won't be able to answer any > replies. > > Thanks, > > Andrew Grimm > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From sandipan.chowdhury at physiology.wisc.edu Thu Nov 19 06:49:45 2009 From: sandipan.chowdhury at physiology.wisc.edu (Sandipan Chowdhury) Date: Thu, 19 Nov 2009 00:49:45 -0600 Subject: [Bioperl-l] accessing EMBL database Message-ID: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Hi, I have 3 questions all related to the retreival of sequences from online databases. (1) I have been trying to download a protein sequence from the EMBL database and trying to write the sequence into a text file, as a string. I am using the following code: use Bio::DB::EMBL; open b,">","s.txt"; $em_obj = Bio::DB::EMBL->new; $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); $s_str = $seq_obj->seq; print b "$s_str\n"; close b; The script is not working and gives the messege: "MSG: EMBL stream with no ID. 
Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl" I am not sure what this means. A similar version of the script works for the Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way around this so that I can download the embl sequence? (2) Also, is there anyway I can download sequences from DDBJ (database of Japan)? (3) Can GI numbers be used to retreive the sequences? If so then how? Answers to these questions would be greatly appreciated. I am very new to Perl/Bioperl and am not really familiar with the advanced programming features, so I would need to your help to find my way out of this situation. Many Thanks Sandipan From maj at fortinbras.us Thu Nov 19 13:10:07 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Nov 2009 08:10:07 -0500 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Sandipan-- That id (CAB95729) returns "No entries" from EMBL. I would agree that the error message is not really informative. The module documentation warns: # remember that EMBL_ID does not equal GenBank_ID! so I would check that. MAJ ----- Original Message ----- From: "Sandipan Chowdhury" To: Sent: Thursday, November 19, 2009 1:49 AM Subject: [Bioperl-l] accessing EMBL database > Hi, > > I have 3 questions all related to the retreival of sequences from online > databases. > > (1) I have been trying to download a protein sequence from the EMBL database > and trying to write the sequence into a text file, as a string. 
I am using the > following code: > > use Bio::DB::EMBL; > open b,">","s.txt"; > $em_obj = Bio::DB::EMBL->new; > $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); > $s_str = $seq_obj->seq; > print b "$s_str\n"; > close b; > > The script is not working and gives the messege: > "MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl" > > I am not sure what this means. A similar version of the script works for the > Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way > around this so that I can download the embl sequence? > > (2) Also, is there anyway I can download sequences from DDBJ (database of > Japan)? > > (3) Can GI numbers be used to retreive the sequences? If so then how? > > Answers to these questions would be greatly appreciated. I am very new to > Perl/Bioperl and am not really familiar with the advanced programming > features, so I would need to your help to find my way out of this situation. > > Many Thanks > Sandipan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hrh at fmi.ch Thu Nov 19 13:23:29 2009 From: hrh at fmi.ch (Hotz, Hans-Rudolf) Date: Thu, 19 Nov 2009 14:23:29 +0100 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Sandipan > I have 3 questions all related to the retreival of sequences from online > databases. > > (1) I have been trying to download a protein sequence from the EMBL database > and trying to write the sequence into a text file, as a string. 
I am using the > following code: > > use Bio::DB::EMBL; > open b,">","s.txt"; > $em_obj = Bio::DB::EMBL->new; > $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); > $s_str = $seq_obj->seq; > print b "$s_str\n"; > close b; > > The script is not working and gives the messege: > "MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl" > > I am not sure what this means. A similar version of the script works for the > Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way > around this so that I can download the embl sequence? "CAB95729" is a protein sequence, ie a translation of the CDS of 'AJ277028.1'. As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, ie the nucleotides sequence > (2) Also, is there anyway I can download sequences from DDBJ (database of > Japan)? Unless, for network/speed reason, why do you want to download data from DDBJ? It contains the same data as GenBank and EMBL. Those three databases exchange their data on a daily basis. > (3) Can GI numbers be used to retreive the sequences? If so then how? Have you looked at Bio::DB::Eutilities ? See the 'HOWTOs' page in the Bioperl Wiki Regards, Hans > Answers to these questions would be greatly appreciated. I am very new to > Perl/Bioperl and am not really familiar with the advanced programming > features, so I would need to your help to find my way out of this situation. 
> > Many Thanks > Sandipan > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Nov 19 13:47:16 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Nov 2009 07:47:16 -0600 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: Message-ID: <95D416ED-7630-40A1-ABA5-A3C3525D25B1@illinois.edu> On Nov 19, 2009, at 7:23 AM, Hotz, Hans-Rudolf wrote: > > Sandipan > > >> I have 3 questions all related to the retreival of sequences from online >> databases. >> >> (1) I have been trying to download a protein sequence from the EMBL database >> and trying to write the sequence into a text file, as a string. I am using the >> following code: >> >> use Bio::DB::EMBL; >> open b,">","s.txt"; >> $em_obj = Bio::DB::EMBL->new; >> $seq_obj = $em_obj->get_Seq_by_acc("CAB95729"); >> $s_str = $seq_obj->seq; >> print b "$s_str\n"; >> close b; >> >> The script is not working and gives the messege: >> "MSG: EMBL stream with no ID. Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 >> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 >> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc >> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 >> STACK: trial2.pl" >> >> I am not sure what this means. A similar version of the script works for the >> Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way >> around this so that I can download the embl sequence? > > "CAB95729" is a protein sequence, ie a translation of the CDS of > 'AJ277028.1'. > > As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, ie the > nucleotides sequence > > > >> (2) Also, is there anyway I can download sequences from DDBJ (database of >> Japan)? > > Unless, for network/speed reason, why do you want to download data from > DDBJ? 
It contains the same data as GenBank and EMBL. Those three databases > exchange their data on a daily basis. > >> (3) Can GI numbers be used to retreive the sequences? If so then how? > > Have you looked at Bio::DB::Eutilities ? See the 'HOWTOs' page in the > Bioperl Wiki > > > > Regards, Hans > > > >> Answers to these questions would be greatly appreciated. I am very new to >> Perl/Bioperl and am not really familiar with the advanced programming >> features, so I would need to your help to find my way out of this situation. >> >> Many Thanks >> Sandipan To add to that, if you want the protein sequences as a Bio::Seq you can use Bio::DB::GenPept (Bio::DB::EUtilities will retrieve raw data only). chris From David.Messina at sbc.su.se Thu Nov 19 14:04:55 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 19 Nov 2009 15:04:55 +0100 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: > I would agree that the error message is not really informative. Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. Perhaps the stack dump should be turned off by default? Wouldn't this: ERROR: EMBL stream with no ID. Not embl in my book Be a lot clearer than this?: MSG: EMBL stream with no ID. Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl Just a thought. This has probably been discussed before. Dave From maj at fortinbras.us Thu Nov 19 14:17:05 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 19 Nov 2009 09:17:05 -0500 Subject: [Bioperl-l] accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: I'm inclined to agree. Lots of responses to questions here that begin "Well, as the error message said, you need to check...", which means people tend towards "I broke it! Write the list!". I do find it hairy when my errors are way down in the object tree. ----- Original Message ----- From: "Dave Messina" To: "Mark A. Jensen" Cc: Sent: Thursday, November 19, 2009 9:04 AM Subject: Re: [Bioperl-l] accessing EMBL database > I would agree that the error message is not really informative. Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. Perhaps the stack dump should be turned off by default? Wouldn't this: ERROR: EMBL stream with no ID. Not embl in my book Be a lot clearer than this?: MSG: EMBL stream with no ID. Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 STACK: trial2.pl Just a thought. This has probably been discussed before. Dave From rtbio.2009 at gmail.com Thu Nov 19 14:55:27 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Thu, 19 Nov 2009 15:55:27 +0100 Subject: [Bioperl-l] Remote blast Message-ID: Hello everybody, I have a problem. I would like to use remote blast to find sequences matching for an input sequence. Ex:-I would like to search sequences which match Trypanosoma Brucei sequence. 
I want the output to be only Trypanosoma Brucei sequences matching with my query.When i tried to use remoteblast to nr database,I got sequences from different organisms like E.coli,Pseudomonas etc., Could you please tell me how can this be solved...? My code is as follows. use Bio::Tools::Run::RemoteBlast; use strict; my $prog = 'blastn'; my $db = 'nr'; my $e_val= '1e-10'; my $organism= 'Trypanosoma Brucei'; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); my $factory = Bio::Tools::Run::RemoteBlast-> new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma brucei[ORGN]' #remove a parameter #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , '-organism' => 'Trypanosoma Brucei' ); while (my $input = $str->next_seq()){ #Blast a sequence against a database: my $r = $factory->submit_blast($input); #my $r = $factory->submit_blast('amino.fa'); print STDERR "waiting..." if( $v > 0 ); while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { my $result = $rc->next_result(); #save the output my $filename = $result->query_name()."\.out"; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { next unless ( $v > 0); print "\thit name is ", $hit->name, "\n"; while( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } } } } } My input sequence is >ref|NC_009512.1|:385-1902 GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG 
GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA Please mail me regarding any queries. Regards, Roopa. From cjfields at illinois.edu Thu Nov 19 15:30:34 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 19 Nov 2009 09:30:34 -0600 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: Mark, Dave, This could be based on verbose().

Level         w      t   d   st
verbose < 0   -      +   -   -/+
verbose 0     +      +   -   -/+
verbose 1     +      +   +   +/+
verbose > 1   +* ->  +   +   +/+
* converts to throw()
w = warn
t = throw
d = debug
st = stack trace

warn() is set up that way now, you don't get a stack trace unless verbose() is > 0. throw() could be the same; would be a simple fix, really.

My only problem with the current state of things is (I think we've delved down this path before) verbosity level is tied to exception strictness as seen above, and they're really two separate concepts, at least to me. Verbosity of 1 or more doesn't necessarily mean I want an elevated level of strictness along with it. For instance, one might want very strict exceptions w/o the noise, or (conversely) lots of debugging output but no warnings.

(aside: another small nit, but I haven't exactly liked that the global level of strictness is designated by a env. variable with DEBUG in the name, but that's just me).

I've been thinking it would be nice to have simple separate verbose/strict switches (this is the way it's implemented in Biome). This would allow some finer grained control over output:

Level       d   st
verbose 0   -   -
verbose 1   +   +
Default = BIOPERLDEBUG || 0 # current situation

Level       w      t
strict -1   -      +
strict 0    +      +
strict 1    +* ->  +
* converts to throw()
Default = BIOPERLSTRICT || 0

We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. chris On Nov 19, 2009, at 8:17 AM, Mark A. Jensen wrote: > I'm inclined to agree.
Lots of responses to questions here that begin > "Well, as the error message said, you need to check...", which means > people tend towards "I broke it! Write the list!". I do find it hairy when > my errors are way down in the object tree. > ----- Original Message ----- From: "Dave Messina" > To: "Mark A. Jensen" > Cc: > Sent: Thursday, November 19, 2009 9:04 AM > Subject: Re: [Bioperl-l] accessing EMBL database > > >> I would agree that the error message is not really informative. > > Agreed that it could be better, but I wonder whether part of the problem with BioPerl error messages is the stack dump. > > I think a lot of eyes just glaze right over when they see a big wad of complicated stuff, with colons and slashes and line numbers, spewing out at them. > > Perhaps the stack dump should be turned off by default? > > Wouldn't this: > > ERROR: EMBL stream with no ID. Not embl in my book > > > > Be a lot clearer than this?: > > MSG: EMBL stream with no ID. Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 > STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 > STACK: trial2.pl > > > > Just a thought. This has probably been discussed before. 
> Dave > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Thu Nov 19 16:10:28 2009 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Thu, 19 Nov 2009 16:10:28 +0000 Subject: [Bioperl-l] Remote blast In-Reply-To: References: Message-ID: <4B056DF4.2030502@gmail.com> Hi Roopa, I think that the -Organism parameter that you specify for Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it in the documentation: http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm You have the correct approach in your code - limiting the search to the Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If you uncomment the line (and add a semicolon afterwards), the program runs correctly, but no hits are reported below your threshold e-value. If you change the value of $e_val to 10 then some T.brucei hits are reported. Roy. Roopa Raghuveer wrote: > Hello everybody, > > I have a problem. I would like to use remote blast to find sequences > matching for an input sequence. > > Ex:-I would like to search sequences which match Trypanosoma Brucei > sequence. > > I want the output to be only Trypanosoma Brucei sequences matching with my > query.When i tried to use remoteblast to nr database,I got sequences from > different organisms like E.coli,Pseudomonas etc., > > Could you please tell me how can this be solved...? > > My code is as follows. 
> > use Bio::Tools::Run::RemoteBlast; > use strict; > my $prog = 'blastn'; > my $db = 'nr'; > my $e_val= '1e-10'; > my $organism= 'Trypanosoma Brucei'; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > my $factory = Bio::Tools::Run::RemoteBlast-> > new(@params); > > #change a paramter > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > brucei[ORGN]' > > #remove a parameter > #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , > '-organism' => 'Trypanosoma Brucei' ); > > while (my $input = $str->next_seq()){ > #Blast a sequence against a database: > my $r = $factory->submit_blast($input); > #my $r = $factory->submit_blast('amino.fa'); > > print STDERR "waiting..." if( $v > 0 ); > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 ); > sleep 5; > } > else { > my $result = $rc->next_result(); > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > My input sequence is > >> ref|NC_009512.1|:385-1902 > GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA > CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT > TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT > GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG > TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA > ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG > GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC > TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT > CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC > GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG > CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT > CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC > AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC > TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG > CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG > GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC > TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT > TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC > GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC > CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT > 
CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG > GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA > > Please mail me regarding any queries. > > Regards, > Roopa. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From clements at nescent.org Thu Nov 19 17:46:32 2009 From: clements at nescent.org (Dave Clements) Date: Thu, 19 Nov 2009 18:46:32 +0100 Subject: [Bioperl-l] how to visualize multiple sequences alignments In-Reply-To: References: Message-ID: Hi Xiaoyu, I would also take a look at GBrowse_syn, a perl based solution built with the GBrowse genome browser framework. See http://gmod.org/wiki/GBrowse_syn. Cheers, Dave C. On Wed, Nov 18, 2009 at 7:23 PM, Jason Stajich wrote: > try jalview http://www.jalview.org/ > > > On Nov 18, 2009, at 9:18 AM, Xiaoyu Liang wrote: > > Hi, >> >> I'm wondering Is there any modules that can be used for visualizing >> multiple >> sequences alignments? like the result from ClustalW? >> >> Thank you very much, >> Xiaoyu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- http://gmod.org/wiki/GMOD_News http://gmod.org/wiki/January_2010_GMOD_Meeting From maj at fortinbras.us Thu Nov 19 23:37:05 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 19 Nov 2009 18:37:05 -0500 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: I like this verbose/strict separability a lot. Should we go for it? 
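A minimal sketch of the verbose/strict split Chris describes, written in plain Perl so it runs without BioPerl installed. It is a hypothetical illustration, not Bio::Root::Root code: the package name My::Reporter, the -verbose/-strict constructor arguments, and reading BIOPERLSTRICT from the environment are all assumptions. Here verbose only decides whether debug() prints and whether a Carp backtrace is appended, while strict only decides whether warn() is ignored (-1), warns (0), or dies (1); throw() is always fatal.

```perl
#!/usr/bin/perl
# Hypothetical sketch only: separate "verbose" (debug output, backtraces)
# from "strict" (how warn() escalates). Not BioPerl's Bio::Root::Root API.
use strict;
use warnings;
use Carp ();

package My::Reporter;    # illustrative name, not a BioPerl class

# Use an explicit argument if given, else an environment variable, else 0.
sub _default {
    my ($value, $env_name) = @_;
    return $value if defined $value;
    return $ENV{$env_name} || 0;
}

sub new {
    my ($class, %args) = @_;
    my %self = (
        verbose => _default($args{-verbose}, 'BIOPERLDEBUG'),
        strict  => _default($args{-strict},  'BIOPERLSTRICT'),  # assumed variable
    );
    return bless \%self, $class;
}

# Prefix the message; append a Carp backtrace only when verbose > 0.
sub _format {
    my ($self, $level, $msg) = @_;
    my $text = "$level: $msg";
    return $self->{verbose} > 0 ? Carp::longmess($text) : "$text\n";
}

# strict -1: ignore, strict 0: print a warning, strict 1: die instead.
sub warn {
    my ($self, $msg) = @_;
    return if $self->{strict} < 0;
    die $self->_format('ERROR', $msg) if $self->{strict} > 0;
    CORE::warn($self->_format('WARNING', $msg));
    return;
}

# throw() is always fatal; verbose only controls the backtrace.
sub throw {
    my ($self, $msg) = @_;
    die $self->_format('ERROR', $msg);
}

# debug() output appears only when verbose > 0.
sub debug {
    my ($self, $msg) = @_;
    print STDERR "DEBUG: $msg\n" if $self->{verbose} > 0;
    return;
}

package main;

my $reporter = My::Reporter->new(-verbose => 0, -strict => 0);
eval { $reporter->throw('EMBL stream with no ID. Not embl in my book') };
print $@;    # prints: ERROR: EMBL stream with no ID. Not embl in my book
```

With verbose left at 0, the EMBL failure from earlier in this thread prints as the single line Dave suggested; raising -verbose to 1 brings the backtrace back without changing how warnings are treated, and raising -strict changes only how warn() behaves.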
----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: Sent: Thursday, November 19, 2009 10:30 AM Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database
> Mark, Dave,
>
> This could be based on verbose().
>
> Level         w      t   d   st
> verbose < 0   -      +   -   -/+
> verbose 0     +      +   -   -/+
> verbose 1     +      +   +   +/+
> verbose > 1   +* ->  +   +   +/+
> * converts to throw()
> w = warn
> t = throw
> d = debug
> st = stack trace
>
> warn() is set up that way now, you don't get a stack trace unless verbose() is > 0.
> throw() could be the same; would be a simple fix, really.
>
> My only problem with the current state of things is (I think we've delved down
> this path before) verbosity level is tied to exception strictness as seen
> above, and they're really two separate concepts, at least to me. Verbosity of
> 1 or more doesn't necessarily mean I want an elevated level of strictness
> along with it. For instance, one might want very strict exceptions w/o the
> noise, or (conversely) lots of debugging output but no warnings.
>
> (aside: another small nit, but I haven't exactly liked that the global level
> of strictness is designated by a env. variable with DEBUG in the name, but
> that's just me).
>
> I've been thinking it would be nice to have simple separate verbose/strict
> switches (this is the way it's implemented in Biome). This would allow some
> finer grained control over output:
>
> Level       d   st
> verbose 0   -   -
> verbose 1   +   +
> Default = BIOPERLDEBUG || 0 # current situation
>
> Level       w      t
> strict -1   -      +
> strict 0    +      +
> strict 1    +* ->  +
> * converts to throw()
> Default = BIOPERLSTRICT || 0
>
> We could even allow finer-grained control of verbosity (states which cover all
> combinations) w/o affecting strictness.
>
> chris
>
> On Nov 19, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> I'm inclined to agree.
Lots of responses to questions here that begin >> "Well, as the error message said, you need to check...", which means >> people tend towards "I broke it! Write the list!". I do find it hairy when >> my errors are way down in the object tree. >> ----- Original Message ----- From: "Dave Messina" >> To: "Mark A. Jensen" >> Cc: >> Sent: Thursday, November 19, 2009 9:04 AM >> Subject: Re: [Bioperl-l] accessing EMBL database >> >> >>> I would agree that the error message is not really informative. >> >> Agreed that it could be better, but I wonder whether part of the problem with >> BioPerl error messages is the stack dump. >> >> I think a lot of eyes just glaze right over when they see a big wad of >> complicated stuff, with colons and slashes and line numbers, spewing out at >> them. >> >> Perhaps the stack dump should be turned off by default? >> >> Wouldn't this: >> >> ERROR: EMBL stream with no ID. Not embl in my book >> >> >> >> Be a lot clearer than this?: >> >> MSG: EMBL stream with no ID. Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368 >> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203 >> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc >> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194 >> STACK: trial2.pl >> >> >> >> Just a thought. This has probably been discussed before. 
>> Dave >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From michael.watson at bbsrc.ac.uk Fri Nov 20 10:07:10 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Fri, 20 Nov 2009 10:07:10 +0000 Subject: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output In-Reply-To: <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC501487319AE@iahcexch1.iah.bbsrc.ac.uk> <9994F70B-AE92-4425-9AAC-E9A2DC26964E@bioperl.org> <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> Hello I was just wondering if anyone had had time to look into this? I posted a bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2937 Thanks Mick -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) Sent: 27 October 2009 09:01 To: 'Jason Stajich' Cc: bioperl-l Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output Hi Jason They both print 0 also. A bug report it is Mick -----Original Message----- From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich Sent: 26 October 2009 18:46 To: michael watson (IAH-C) Cc: bioperl-l Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output Is this -m9 -d 0 output or standard default? I think the strand is parsed in the HSP parsing. Can you double check what $hsp->query->strand and $hsp->hit->strand prints? A full example report as a bug request will be next step if that doesn't resolve. 
-jason On Oct 26, 2009, at 10:04 AM, michael watson (IAH-C) wrote: > Dear all > > Where does this go? Perhaps I am doing something wrong. > > Fasta35 output puts the strand in the hit list at the top: > > cluster_99033:3 ( 23) [r] 115 37.9 > 0.0011 > cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 > 0.963 27 > > The [r] stands for reverse and the [f] stands for forward. > > There is also the text "rev-comp" after the hit line further down. > > However, when I parse fasta35 output using SearchIO and output the > strand of the HSP: > > print $hsp->strand('hit'), ","; > print $hsp->strand('query'), "\n"; > > This simply prints out 0, 0 (I assume 0 is the default in BioPerl > for "I don't know which strand it's on"). > > So the information is there, but it's not getting parsed. > Alternatively, I've missed something and will feel a bit foolish. > > Currently using BioPerl 1.6.0 > > Thanks > Mick > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri Nov 20 10:15:11 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Nov 2009 11:15:11 +0100 Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database In-Reply-To: References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> Message-ID: <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> Chris, I took a look at how you implemented this in Biome -- very nice! > I like this verbose/strict separability a lot. Should we go for it? Me too. So yes, I think so. > We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness. 
Perhaps this is a job for Log::Log4perl or Log::Dispatch? http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl.pm http://search.cpan.org/~drolsky/Log-Dispatch-2.26/lib/Log/Dispatch.pm That might be overkill, though. Dave From roychu at gmail.com Fri Nov 20 10:21:54 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 20 Nov 2009 02:21:54 -0800 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN Message-ID: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Hi, Does anyone use dreamhost as a web hosting service? I'm just curious if anyone has had any luck installing the module as their daemon seems to kill my process whenever I try to install it. Dreamhost tech support attributes it to either exceeding the allocated memory cache or exceeding the processing time. I tried to nice the process, but that didn't help for me. Any luck or experience in resolving this would be much appreciated. I suppose my next attempt would be to try installing it directly and hope I don't need root... Thanks, Roy From s.denaxas at gmail.com Fri Nov 20 10:27:42 2009 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Fri, 20 Nov 2009 11:27:42 +0100 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Message-ID: Hello, normally you don't need to be root - http://sial.org/howto/perl/life-with-cpan/non-root/ Kind of disturbing that their tech support cannot give you a straight answer on why they are killing the process. Good luck Spiros On Fri, Nov 20, 2009 at 11:21 AM, Chu, Roy wrote: > I suppose my next attempt would be to try
> > Thanks, > Roy > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From charles-listes+bioperl at plessy.org Fri Nov 20 10:44:45 2009 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Fri, 20 Nov 2009 19:44:45 +0900 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> Message-ID: <20091120104445.GG31318@kunpuu.plessy.org> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: > > Does anyone use dreamhost as a web hosting service? I'm just curious > if anyone has had any luck installing the module as their daemon seems > to kill my process whenever I try to install it. Dreamhost tech > support attributes it to either exceeding the allocated memory cache > or exceeding the processing time. I tried to nice the process, but > that didn't help for me. Any luck or experience in resolving this > would be much appreciated. I suppose my next attempt would be to try > installing it directly and hope I don't need root... Dear Roy, DreamHost uses Debian, so you could suggest that they install the Debian package. If you are in contact with the tech service, do not hesitate to tell them to contact me if they are interested in a backport of the 1.6.0 package. For version 1.6.1, it may be more difficult as it depends on perl 5.10.1.
PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I will vote for it :) Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From cjfields at illinois.edu Fri Nov 20 12:51:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 06:51:39 -0600 Subject: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output In-Reply-To: <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC501487319AE@iahcexch1.iah.bbsrc.ac.uk> <9994F70B-AE92-4425-9AAC-E9A2DC26964E@bioperl.org> <8D08960C647E64438CE5740657CBBDC501487319B6@iahcexch1.iah.bbsrc.ac.uk> <8D08960C647E64438CE5740657CBBDC50148731CEB@iahcexch1.iah.bbsrc.ac.uk> Message-ID: Mick, Short answer, no. It was in the queue to be fixed at some point in 1.6.x, but that queue is quite long. I'm pushing it into the queue specifically for 1.6.2, so it should be addressed soon. chris On Nov 20, 2009, at 4:07 AM, michael watson (IAH-C) wrote: > Hello > > I was just wondering if anyone had had time to look into this? > > I posted a bug: http://bugzilla.open-bio.org/show_bug.cgi?id=2937 > > Thanks > Mick > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of michael watson (IAH-C) > Sent: 27 October 2009 09:01 > To: 'Jason Stajich' > Cc: bioperl-l > Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output > > Hi Jason > > They both print 0 also. > > A bug report it is > > Mick > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich > Sent: 26 October 2009 18:46 > To: michael watson (IAH-C) > Cc: bioperl-l > Subject: Re: [Bioperl-l] strand in Bio::SearchIO when parsing fasta35 output > > > Is this -m9 -d 0 output or standard default? I think the strand is > parsed in the HSP parsing.
> > Can you double check what $hsp->query->strand and $hsp->hit->strand > prints? > > A full example report as a bug request will be next step if that > doesn't resolve. > > -jason > On Oct 26, 2009, at 10:04 AM, michael watson (IAH-C) wrote: > >> Dear all >> >> Where does this go? Perhaps I am doing something wrong. >> >> Fasta35 output puts the strand in the hit list at the top: >> >> cluster_99033:3 ( 23) [r] 115 37.9 >> 0.0011 >> cluster_79238:1 ( 27) [f] 126 38.0 0.00097 0.963 >> 0.963 27 >> >> The [r] stands for reverse and the [f] stands for forward. >> >> There is also the text "rev-comp" after the hit line further down. >> >> However, when I parse fasta35 output using SearchIO and output the >> strand of the HSP: >> >> print $hsp->strand('hit'), ","; >> print $hsp->strand('query'), "\n"; >> >> This simply prints out 0, 0 (I assume 0 is the default in BioPerl >> for "I don't know which strand it's on"). >> >> So the information is there, but it's not getting parsed. >> Alternatively, I've missed something and will feel a bit foolish. 
>> >> Currently using BioPerl 1.6.0 >> >> Thanks >> Mick >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Nov 20 13:00:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 07:00:45 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <20091120104445.GG31318@kunpuu.plessy.org> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >> >> Does anyone use dreamhost as a web hosting service? I'm just curious >> if anyone has had any luck installing the module as their daemon seems >> to kill my process whenever I try to install it. Dreamhost tech >> support attributes it to either exceeding the allocated memory cache >> or exceeding the processing time. I tried to nice the process, but >> that didn't help for me. Any luck or experience in resolving this >> would be much appreciated. I suppose my next attempt would be to try >> installing it directly and hope I don't need root... > > Dear Roy, > > DreamHost uses Debian, so you could suggest that they install the Debian package. > If you are in contact with the tech service, do not hesitate to tell them to > contact me if they are interested in a backport of the 1.6.0 package.
For > version 1.6.1, it may be more difficult as it depends on perl 5.10.1. Any reason why this is so? We specify compatibility back to 5.6.1. Alex mentioned the reliance on the specific ExtUtils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post. > PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I > will vote for it :) > > Have a nice day, > > -- > Charles Plessy > Debian Med packaging team, > http://www.debian.org/devel/debian-med > Tsurumi, Kanagawa, Japan chris From rtbio.2009 at gmail.com Fri Nov 20 15:52:09 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 20 Nov 2009 16:52:09 +0100 Subject: [Bioperl-l] Remote blast In-Reply-To: References: <4B056DF4.2030502@gmail.com> Message-ID: Hello everybody, I have tried to use Remote blast on Trypanosoma brucei sequences and got some hits. But I am unable to retrieve the complete sequences of the hits, i.e., I am unable to parse the blast output file to get the complete sequences of the hits. Here is my code.

    #!/usr/bin/perl -w
    use strict;
    use Bio::SearchIO;

    my $blast_report = Bio::SearchIO->new('-format' => 'blast', '-file' => $ARGV[0]);
    my $result = $blast_report->next_result;
    my $level  = $ARGV[1];
    my (@arr, @arr1, @arr2);

    while (my $hit = $result->next_hit) {
        print $hit->name, "\n";
        push(@arr1, $hit->name);
        while (my $hsp = $hit->next_hsp()) {
            if ($hsp->frac_identical() >= $level) {
                #print $hsp->hit_string, "\n";
                push(@arr, $hsp->hit_string);
            }
        }
    }

    # '|' must be escaped: split(/|/, ...) splits between every character
    for my $name (@arr1) {
        push(@arr2, split(/\|/, $name));
    }

Here, I am trying to use the blast output file to get the complete sequence of each hit, but I could not get the complete sequence.
i/p:- Last login: Mon Nov 16 11:57:22 on console Welcome to Darwin! lmbicip-mac1:~ cip$ ssh admin at 141.84.66.66 The authenticity of host '141.84.66.66 (141.84.66.66)' can't be established. RSA key fingerprint is 2d:4a:09:1d:2e:f3:51:c7:ba:8b:29:37:36:f6:44:db. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '141.84.66.66' (RSA) to the list of known hosts. Password: Last login: Fri Nov 20 13:52:57 2009 from 10.153.189.239 Have a lot of fun... admin at BosLinux:~> clear admin at BosLinux:~> cd Documents/ admin at BosLinux:~/Documents> clear admin at BosLinux:~/Documents> vim blast.pl admin at BosLinux:~/Documents> clear admin at BosLinux:~/Documents> vim nnn.pl admin at BosLinux:~/Documents> vim other.pl admin at BosLinux:~/Documents> vim amino.fa admin at BosLinux:~/Documents> vim Tb09.211.2410.out admin at BosLinux:~/Documents> vim Tb09.211.2410.out |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 661 TTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCC 720 Query 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG 780 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 721 AATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACG 780 Query 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT 840 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 781 AAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGT 840 Query 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG 900 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 841 GGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTG 900 Query 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT 960 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 901 AAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCT 960 Query 961 CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005 ||||||||||||||||||||||||||||||||||||||||||||| Sbjct 961 
CCTCCACTAACCCCTTCGCAACAGGTTGCATTCCGTGGTTTTTAG 1005 >ref|XM_822286.1| Trypanosoma brucei TREU927 protein kinase A catalytic subunit isoform 2 (Tb09.211.2360) partial mRNA Length=1011 Score = 1622 bits (1798), Expect = 0.0 Identities = 944/974 (96%), Gaps = 0/974 (0%) Strand=Plus/Plus Query 32 TGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC 91 |||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 38 TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGC 97 Query 92 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT 151 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 98 TAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATT 157 Query 152 ATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGA 211 |||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||| Sbjct 158 ATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGA 217 Query 212 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT 271 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 218 ACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTT 277 uery 272 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT 331 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 278 CCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTAT 337 Query 332 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG 391 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 338 TTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGG 397 Query 392 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC 451 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 398 AGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAAC 457 Query 452 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA 511 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 458 CTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTA 517 Query 512 
AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG 571 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 518 AGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGG 577 Query 572 TAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGT 631 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| It follows like this. The output I got is ATGACGACAACTCCCACTGGTGATGGCCAACTGTTTACCAAGCCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCATGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGCTTAAATTCCCCAATTGGTTTGATGAGCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATAACGCCCCCATTGCCGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGAGATAAGGGTTCTCCTCCACTAACCCCTTCGCAACAGG TTGCATTCCGTGGTTTTTAG 
TGTTTACCAAACCTGACACATCGGGATGGAAGCTGAGTGACTTTGAAATGGGTGACACGCTAGGGACCGGCTCGTTTGGTCGCGTGCGCATTGCAAAACTGAAGAGCAGGGGGGAGTATTATGCAATAAAATGTCTAAAGAAGCGTGAGATACTAAAGATGAAGCAGGTACAACACCTGAACCAAGAGAAGCAAATTCTAATGGAGTTGTCACACCCCTTCATTGTGAACATGATGTGTTCCTTCCAGGATGAGAACCGCGTCTACTTTGTTCTAGAATTTGTGGTAGGTGGTGAGGTATTTACTCACCTTCGTTCCGCAGGCCGTTTCCCGAATGACGTAGCGAAGTTCTATCATGCGGAGCTTGTGTTGGCCTTTGAATATTTACACTCGAAGGACATTATCTACCGTGACTTGAAACCTGAGAATCTGCTACTTGATGGGAAGGGACACGTCAAGGTGACTGATTTTGGTTTTGCTAAGAAGGTGACGGATCGTACCTATACGTTATGTGGGACACCTGAGTATCTTGCACCTGAGGTAATTCAGAGCAAAGGACATGGGAAGGCTGTGGATTGGTGGACGATGGGTGTTTTGCTGTATGAATTCATAGCTGGCCATCCTCCCTTTTTTGATGAAACCCCAATTCGGACGTATGAAAAGATTCTTGCGGGCCGGTTCAAATTCCCCAATTGGTTTGACTCCCGTGCGCGGGATCTCGTAAAGGGTTTATTGCAAACGGATCACACGAAACGGTTGGGCACGCTGAAGGATGGCGTAGCTGATGTGAAGAATCACCCATTCTTCCGTGGTGCGAATTGGGAGAAACTCTATGGACGTCATTATCACGCTCCCATTCCTGTAAAAGTGAAGAGCCCCGGCGACACAAGTAACTTTGAGTCGTATCCCGAGAGTGGGGATAAGCGGTTGCCCCCGTTAGCACCATCACAACAATTGGAGTTCCGTGGGTTTTAG GGATGATGACCGATTGTACCTCCTCCTCGAGTATGTGGTGGGTGGCGAGCTGT TCTCCCACCTCCGGAAGGCGGGAAAATTCCCTAATGATGTAGCCAAGTTCTACTCCGCAGAAGTGGTTTTGGCGTTTGAATATATTCATGAGTGCGGCATCGTATACCGTGACTTGAAGCCAGAAAATGTGCTTTTGGACAAGCAGGGAAACATTAAGATTACGGACTTTGGGTTCGCGAAACGCGTTAGGGACAGAACGTACACGCTATGTGGGACTCCAGAGTATCTTGCGCCGGAGATAATCCAAAGTAAAGGTCACGATCGGGCTGTGGATTGGTGGACACTCGGAATTCTTCTCTATGAGATGCTTGTCGGTTATCCTCCTTTTTTCGACGAGAGTCCTTTTAGAACATACGAAAAAATTTTAGAGGGGAAACTTCAGTTTCCAAAGTGGGTGGAGATGCGGGCGAAGGACCTCATAAAGAGTTTTTTAACAATTGAACCAACGAAACG i.e.,It is only giving the region where it could find the best alignment i.e., the best hit ones. I want the complete sequence i.e., sequences corresponding to the accession numbers XM_822292.1 XM_822286.1 XM_822694.1 Database used in Remote blast was RefSeq i.e.,(refseq_rna),organism used :Trypanasoma brucei. Can any one please help me in solving this problem Regards, Roopa. 
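As Dave Messina's reply later in the thread explains, a BLAST report contains only the aligned portion of each hit. The full sequences therefore have to be fetched separately by accession. The sketch below assumes BioPerl 1.6.x, network access to NCBI, and hit names of the form ref|XM_822286.1| as in the report above. It reads the report with Bio::SearchIO, extracts each accession with a hypothetical helper (accession_from_hit_name), and fetches the complete record with Bio::DB::GenBank's get_Seq_by_acc, writing the results as FASTA. It has not been verified that get_Seq_by_acc accepts versioned accessions such as XM_822286.1; check this against the Bio::DB::GenBank documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a BLAST hit name such as "ref|XM_822286.1|"
# (or "gi|...|ref|XM_822286.1|") into the bare accession "XM_822286.1".
sub accession_from_hit_name {
    my ($name) = @_;
    my @fields = split /\|/, $name;    # the pipe must be escaped
    for my $i (0 .. $#fields - 1) {
        return $fields[$i + 1] if $fields[$i] =~ /^(?:ref|gb|emb|dbj)$/;
    }
    return $fields[-1];
}

# Fetch the complete sequence of every hit in $report and write FASTA.
sub fetch_full_sequences {
    my ($report, $outfile) = @_;

    # Loaded at run time so the helper above works without BioPerl installed.
    require Bio::SearchIO;
    require Bio::SeqIO;
    require Bio::DB::GenBank;

    my $in  = Bio::SearchIO->new(-format => 'blast', -file => $report);
    my $out = Bio::SeqIO->new(-format => 'fasta', -file => ">$outfile");
    my $gb  = Bio::DB::GenBank->new;

    my %seen;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            my $acc = accession_from_hit_name($hit->name);
            next if $seen{$acc}++;                  # fetch each record once
            my $seq = $gb->get_Seq_by_acc($acc);    # network request to NCBI
            $out->write_seq($seq) if $seq;
            sleep 1;    # avoid sending NCBI rapid-fire requests
        }
    }
    return;
}

# Usage: perl fetch_hits.pl Tb09.211.2410.out full_hits.fa
fetch_full_sequences(@ARGV) if @ARGV == 2;
```

The frac_identical() filter from the original script could be reapplied inside the hit loop if only close matches are wanted.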
On Fri, Nov 20, 2009 at 12:30 PM, Roopa Raghuveer wrote: > > Hello Roy, > > Thanks a lot for your reply. My code is working for my sequence now. > > Thanks a lot. > > Regards, > Roopa. > > On Thu, Nov 19, 2009 at 5:10 PM, Roy Chaudhuri wrote: > >> Hi Roopa, >> >> I think that the -Organism parameter that you specify for >> Bio::Tools::Run::RemoteBlast is ignored - I can't find any reference to it >> in the documentation: >> >> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm >> >> You have the correct approach in your code - limiting the search to the >> Entrez query "Trypanosoma brucei[ORGN]", but the line is commented out. If >> you uncomment the line (and add a semicolon afterwards), the program runs >> correctly, but no hits are reported below your threshold e-value. If you >> change the value of $e_val to 10 then some T.brucei hits are reported. >> >> Roy. >> >> Roopa Raghuveer wrote: >> >>> Hello everybody, >>> >>> I have a problem. I would like to use remote blast to find sequences >>> matching an input sequence. >>> >>> Ex: I would like to search for sequences which match a Trypanosoma Brucei >>> sequence. >>> >>> I want the output to be only Trypanosoma Brucei sequences matching >>> my >>> query. When I tried to use remoteblast against the nr database, I got sequences from >>> different organisms like E. coli, Pseudomonas etc. >>> >>> Could you please tell me how this can be solved? >>> >>> My code is as follows.
>>> >>> use Bio::Tools::Run::RemoteBlast; >>> use strict; >>> my $prog = 'blastn'; >>> my $db = 'nr'; >>> my $e_val= '1e-10'; >>> my $organism= 'Trypanosoma Brucei'; >>> >>> my @params = ( '-prog' => $prog, >>> '-data' => $db, >>> '-expect' => $e_val, >>> '-readmethod' => 'SearchIO', >>> '-Organism' => $organism ); >>> >>> my $factory = Bio::Tools::Run::RemoteBlast-> >>> new(@params); >>> >>> #change a paramter >>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>> brucei[ORGN]' >>> >>> #remove a parameter >>> #delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; >>> >>> my $v = 1; >>> #$v is just to turn on and off the messages >>> >>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , '-format' => 'fasta' , >>> '-organism' => 'Trypanosoma Brucei' ); >>> >>> while (my $input = $str->next_seq()){ >>> #Blast a sequence against a database: >>> my $r = $factory->submit_blast($input); >>> #my $r = $factory->submit_blast('amino.fa'); >>> >>> print STDERR "waiting..." if( $v > 0 ); >>> while ( my @rids = $factory->each_rid ) { >>> foreach my $rid ( @rids ) { >>> my $rc = $factory->retrieve_blast($rid); >>> if( !ref($rc) ) { >>> if( $rc < 0 ) { >>> $factory->remove_rid($rid); >>> } >>> print STDERR "." 
if ( $v > 0 ); >>> sleep 5; >>> } >>> else { >>> my $result = $rc->next_result(); >>> #save the output >>> my $filename = $result->query_name()."\.out"; >>> $factory->save_output($filename); >>> $factory->remove_rid($rid); >>> print "\nQuery Name: ", $result->query_name(), "\n"; >>> while ( my $hit = $result->next_hit ) { >>> next unless ( $v > 0); >>> print "\thit name is ", $hit->name, "\n"; >>> while( my $hsp = $hit->next_hsp ) { >>> print "\t\tscore is ", $hsp->score, "\n"; >>> } >>> } >>> } >>> } >>> } >>> } >>> >>> My input sequence is >>> >>> ref|NC_009512.1|:385-1902 >>>> >>> GTGTCAGTGGAACTTTGGCAGCAGTGCGTGGAGCTTCTGCGCGATGAACTGCCTGCCCAGCAATTCAACA >>> CCTGGATCCGTCCGCTACAGGTCGAAGCCGAAGGCGACGAGTTGCGCGTCTATGCGCCTAACCGTTTCGT >>> TCTCGATTGGGTCAATGAAAAGTACCTGGGTCGTTTGCTCGAGCTGTTGGGTGAGAACGGTAGCGGCATT >>> GCACCAGCCCTTTCCTTATTAATAGGTAGCCGCCGCAGCTCGGCCCCAAGGGCTGCACCCAACGCGCCGG >>> TCAGCGCTGCCGTTGCGGCTTCGCTGGCGCAGACTCAGGCGCACAAGACGGCCCCGGCAGCAGCGGTTGA >>> ACCCGTTGCCGTGGCCGCGGCCGAGCCTGTATTGGTCGAGACGTCTTCGCGTGACAGCTTTGATGCCATG >>> GCCGAGCCTGCTGCTGCGCCGCCCAGTGGTGGCCGGGCTGAACAGCGCACCGTGCAGGTTGAAGGTGCGC >>> TCAAGCACACCAGTTACCTGAACCGGACCTTTACCTTTGACACCTTCGTCGAAGGTAAGTCGAACCAGCT >>> CGCCCGCGCGGCTGCCTGGCAGGTTGCGGACAACCCTAAGCATGGCTACAACCCACTGTTCCTTTATGGC >>> GGTGTGGGTTTGGGTAAAACCCACCTTATGCATGCTGTGGGTAACCATCTGCTGAAGAAGAATCCGAACG >>> CCAAGGTGGTGTACCTGCATTCGGAGCGCTTCGTCGCGGACATGGTCAAAGCGTTGCAACTCAACGCCAT >>> CAACGAATTCAAGCGCTTCTACCGCTCGGTGGACGCGTTGCTGATCGACGATATCCAGTTCTTCGCTCGC >>> AAAGAGCGCTCGCAAGAAGAGTTTTTCCACACCTTCAACGCCTTGCTTGAGGGTGGCCAGCAGGTAATCC >>> TTACCTCTGACCGCTATCCCAAGGAAATCGAAGGCCTGGAAGAGCGTCTGAAGTCGCGCTTTGGTTGGGG >>> CCTGACGGTGGCTGTCGAGCCGCCAGAGCTGGAGACCCGCGTAGCGATCCTGATGAAGAAGGCCGACCAG >>> GCCAAAGTCGAGCTCCCGCATGACGCAGCCTTTTTCATCGCTCAGCGCATCCGGTCCAACGTCCGTGAGC >>> TGGAAGGTGCACTGAAGCGAGTTATTGCTCACTCGCACTTCATGGGGCGTGACATCACCATCGAGCTGAT >>> TCGTGAATCGCTCAAGGATCTGTTGGCGCTGCAAGACAAACTGGTCAGTGTGGATAACATTCAGCGTACC >>> 
GTCGCTGAGTACTACAAGATCAAGATCTCCGATCTGTTGTCCAAGCGTCGTTCGCGTTCTGTCGCGCGCC >>> CGCGTCAGGTAGCCATGGCCCTGTCCAAGGAGTTGACCAACCACAGTCTGCCGGAAATCGGCGACATGTT >>> CGGTGGTCGCGACCATACGACCGTGCTGCACGCCTGCCGCAAAATCAATGAACTGAAGGAATCCGACGCG >>> GACATCCGCGAGGACTACAAGAACCTGCTGCGGACGCTGACGACCTGA >>> >>> Please mail me regarding any queries. >>> >>> Regards, >>> Roopa. >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From mauricio at open-bio.org Fri Nov 20 16:15:22 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Fri, 20 Nov 2009 10:15:22 -0600 Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? In-Reply-To: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> References: <7761C2223DB54DE6B836F302D2FF6AC0@NewLife> Message-ID: <4B06C09A.8060708@open-bio.org> All OBF wikis and blogs have been upgraded and cleaned from the hack. Thanks for the heads up! Mauricio. Mark A. Jensen wrote: > Andrew-- thanks!! We're on it. > MAJ > ----- Original Message ----- From: "Andrew Grimm" > > To: > Sent: Wednesday, November 18, 2009 9:52 PM > Subject: [Bioperl-l] DANGER: hacking of bioperl wiki? > > >> Caution: read the whole email before visiting the bioperl wiki >> >> I was doing some bioinformatics-related searching using google, and >> one of the hits was to the bio dot perl dot org wiki (the FAQ in >> particular). >> >> When I did that, I was redirected to a ferdax dot com web site (a >> typo-squatting of fedex?). >> >> Some people reckon that ferdax hacks web sites and redirects google >> hits from the victim web site to their own web site. For example, this >> thread at google's webmaster central >> http://www.google.com/support/forum/p/Webmasters/thread?tid=37a36c0d1ea99819&hl=en#all >> >> (it's talking about zencart, but presumably they've since found other >> victims) >> >> Just going to the website without using google may not trigger the >> redirect. 
>> >> Apologies if this is a false alarm, but I don't think it is. >> >> I won't be in contact between Friday and Monday Australian time (I'll >> be at railscamp 6 in Melbourne), so I won't be able to answer any >> replies. >> >> Thanks, >> >> Andrew Grimm >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Nov 20 16:39:53 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 20 Nov 2009 17:39:53 +0100 Subject: [Bioperl-l] Remote blast In-Reply-To: References: <4B056DF4.2030502@gmail.com> Message-ID: <7ECF627D-3DBF-4575-89CF-FA6348C88E8E@sbc.su.se> Hi Roopa, As far as I know, a BLAST report never contains the complete sequences of the hits. If it includes any part of the hit's sequence, it will be the part that matches the query. You'll have to use the hit's ID or accession to get its complete sequence from somewhere else. You can use Bio::DB::GenBank to do that, for example. See http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database Dave From alessandra.bilardi at gmail.com Fri Nov 20 17:44:18 2009 From: alessandra.bilardi at gmail.com (Alessandra) Date: Fri, 20 Nov 2009 18:44:18 +0100 Subject: [Bioperl-l] Bio::DB::EUtilities question Message-ID: Hi all, I'm testing Bio::DB::EUtilities, a web agent that interacts with and retrieves data from NCBI's eUtils. My Perl script works, but only if I call the get_Response function fewer than about 450 times;
otherwise I get this error message: ------------- EXCEPTION ------------- MSG: Response Error Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) STACK Bio::DB::GenericWebAgent::get_Response /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 STACK toplevel ./wget4gbk.pl:77 ------------------------------------- wget4gbk.pl lines 76-77 are: my $req = Bio::DB::EUtilities->new(-db => 'genome', -eutil => 'esummary', -retmode => $mode, -rettype => $type, -id => $id); my $entry = $req->get_Response; I have run the Perl script more than ten times, and the error appears at a random point, somewhere between 300 and 600 requests. If I use another system to request the data, I can make ~10000 requests without errors. Do I have to set particular parameters on the EUtilities object? Can you help me with this random exception? Best, -- Alessandra Bilardi, Ph. D. ---- CRIBI, University of Padova, Italy http://www.linkedin.com/in/bilardi ---- From maj at fortinbras.us Fri Nov 20 18:42:38 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 13:42:38 -0500 Subject: [Bioperl-l] gravatars on the wiki Message-ID: <94431678F3764E8C9A49EA4D2FCD0DBD@NewLife> Hi all, You can now reveal your Gravatar (http://www.gravatar.com) on the wiki, by including the following markup on the page: {{#gravatar|youremail -at- yourplace -dot- tld}} You can do the antispam measure above, or use a regular email. Invalid emails throw an error. http://bioperl.org/wiki/Gravatars Happy coding, MAJ From roychu at gmail.com Fri Nov 20 20:23:21 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 20 Nov 2009 12:23:21 -0800 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> "sounds very much like your process was killed for prolonged execution time, or memory usage.
We have a daemon in place that monitors for processes that take up too much of a shared web server's resources, and this may have kicked in (and often does when trying to install packages on a shared server)." This was the explanation they had. Regarding asking their admins to install, it seems to be a "they'll try to get to it but don't hold your breath situation." Hmm, I tried some other attempts; installing 1.4.0 posed no problems. I'm not a perl guru, so I tried to increase the build cache size from the default, 10 MB, hoping that might be the problem--I can't see how, though, since I doubt the package versions differ much in size (though honestly, I haven't checked). Whenever I try to install 1.6.1, it runs into a problem, I guess after the 'make' step, and lists the modules--BioPerl-1.6.0/t/Variation/SeqDiff.t BioPerl-1.6.0/t/Variation/SNP.t BioPerl-1.6.0/t/Variation/Variation_IO.t --and typically gets killed here '> Killed' Next, I tried 1.6.0, and I get this: "(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok (v2.12) Going to read '/home/$username/.cpan/Metadata' Killed" (everything prior works and it seems to get further along than when I try to install 1.6.1) Any insight into why this may be happening would be appreciated. Something EQUALLY appreciated would be a recommendation of a decent enough hosting service where someone has had success installing Bio-Perl. I'd try to set up my Mac web sharing feature and then try to set up the stuff locally, but I haven't yet been able to successfully get the port forwarding feature working properly on the apple airport extreme--perplexing. Next, I might just try to install via the Build.PL script. Hmm, checking the wiki, it seems I'll still be able to run remote blast and use the basic seq modules, although some discrepancies and idiosyncrasies may be expected? Any heads-up about any false assumptions on my part would be greatly appreciated.
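Chris Fields recommends a no-build route in his reply: unpack the distribution and point Perl at it with PERL5LIB or a 'use lib' line. To confirm which copy of each module Perl actually loads, you can check %INC. In the minimal sketch below, /home/USER/bioperl/bioperl-live is the placeholder path from Chris's reply, and the module list is only an example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder path: wherever the BioPerl tarball was unpacked
# (for bioperl-db and similar packages, append /lib).
use lib '/home/USER/bioperl/bioperl-live';

# Try to load a module; return the file it was loaded from, or undef.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module (qw(Bio::SeqIO Bio::SearchIO Bio::Tools::Run::RemoteBlast)) {
    my $where = module_location($module);
    printf "%-30s %s\n", $module, defined $where ? $where : 'NOT FOUND';
}
```

If a module shows NOT FOUND, or loads from an unexpected directory such as an older system-wide 1.4.0 install, the library path is not set up as intended.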
Thanks in advance, Roy On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: > > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > >> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>> >>> Does anyone use dreamhost as a web hosting service? I'm just curious >>> if anyone has had any luck installing the module as their daemon seems >>> to kill my process whenever I try to install it. Dreamhost tech >>> support attributes it to either exceeding the allocated memory cache >>> or exceeding the processing time. I tried to nice the process, but >>> that didn't help for me. Any luck or experience in resolving this >>> would be much appreciated. I suppose my next attempt would be to try >>> installing it directly and hope I don't need root... >> >> Dear Roy, >> >> DreamHost uses Debian, so you could suggest that they install the Debian package. >> If you are in contact with the tech service, do not hesitate to tell them to >> contact me if they are interested in a backport of the 1.6.0 package. For >> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. > > Alex mentioned the reliance on the specific ExtUtils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. > > A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post.
> >> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >> will vote for it :) >> >> Have a nice day, >> >> -- >> Charles Plessy >> Debian Med packaging team, >> http://www.debian.org/devel/debian-med >> Tsurumi, Kanagawa, Japan > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Nov 20 20:40:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 14:40:24 -0600 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> Message-ID: <1D1B0987-3309-4281-BCE0-2737E4F0D0B1@illinois.edu> BioPerl is pure Perl. If you believe all dependencies are installed, just unpack the dist to a specific directory and point PERL5LIB at it (for bash):

    export PERL5LIB=/home/USER/bioperl/bioperl-live

Note that if you plan on doing the same for other bioperl-related modules (ex: bioperl-db), you'll need to add 'lib' to the path, as they use a generic Module::Build now:

    export PERL5LIB=/home/USER/bioperl/bioperl-db/lib

You can also add a 'use lib' directive to your scripts. More at the following link: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#USING_MODULES_NOT_INSTALLED_IN_THE_STANDARD_LOCATION chris On Nov 20, 2009, at 2:23 PM, Chu, Roy wrote: > "sounds very much like your process was killed for prolonged execution > time, or memory usage. We have a daemon in place that monitors for > processes that take up too much of a shared web server's resources, and > this may have kicked in (and often does when trying to install packages > on a shared server)." > > This was the explanation they had.
Regarding asking their admins to > install, it seems to be a "they'll try to get to it but don't hold your > breath" situation. > > Hmmm, I made some other attempts: installing 1.4.0 posed no problems. > I'm not a perl guru, so I tried to increase the build cache size from > the default, 10 MB, hoping that might be the problem--though I can't see > how, since I can't imagine the package versions differ much in size > (honestly, I haven't checked). > Whenever I try to install 1.6.1, it runs into a problem, I guess after > the 'make' step, and lists the > modules--BioPerl-1.6.0/t/Variation/SeqDiff.t > BioPerl-1.6.0/t/Variation/SNP.t > BioPerl-1.6.0/t/Variation/Variation_IO.t > --and typically gets killed here '> Killed' > > Next, I tried 1.6.0, and I get this: > "(I think you ran Build.PL directly, so will use CPAN to install > prerequisites on demand) > CPAN: Storable loaded ok (v2.12) > Going to read '/home/$username/.cpan/Metadata' > Killed" (everything prior works and it seems to get further along than > when I try to install 1.6.1) > > Any insight into why this may be happening would be appreciated. > Something EQUALLY appreciated would be a recommendation of a decent > enough hosting service where someone has had success installing > BioPerl. I'd try to set up my Mac web sharing feature and then try > to set up the stuff locally, but I haven't yet been able to > successfully get the port forwarding feature working properly on the > Apple AirPort Extreme--perplexing. Next, I might just try to install > via the Build.PL script. > > Hmm, checking the wiki, it seems I'll still be able to run remote > BLAST and use the basic seq modules, although some discrepancies and > idiosyncrasies may be expected? Any heads-up about any false > assumptions by me would be greatly appreciated.
> > Thanks in advance, > Roy > > On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: >> >> On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: >> >>> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>>> >>>> Does anyone use dreamhost as a web hosting service? I'm just curious >>>> if anyone has had any luck installing the module, as their daemon seems >>>> to kill my process whenever I try to install it. Dreamhost tech >>>> support attributes it to either exceeding the allocated memory cache >>>> or exceeding the processing time. I tried to nice the process, but >>>> that didn't help for me. Any luck or experience in resolving this >>>> would be much appreciated. I suppose my next attempt would be to try >>>> installing it directly and hope I don't need root... >>> >>> Dear Roy, >>> >>> DreamHost uses Debian, so you can suggest that they install the Debian package. >>> If you are in contact with the tech service, do not hesitate to tell them to >>> contact me if they are interested in a backport of the 1.6.0 package. For >>> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. >> >> Any reason why this is so? We specify compatibility back to 5.6.1. >> >> Alex mentioned the reliance on the specific ExtUtils::Manifest version. The version requested has an important bug fix, is present on CPAN, and is backwards-compatible to 5.6.1. It should be fairly easy to request that as a separate package. >> >> A strict requirement for perl 5.10.1 doesn't make sense in that light, unless said perl maintainer can enlighten us as to why this is an issue? This one may require a ranty blog post.
> >> >>> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >>> will vote for it :) >>> >>> Have a nice day, >>> >>> -- >>> Charles Plessy >>> Debian Med packaging team, >>> http://www.debian.org/devel/debian-med >>> Tsurumi, Kanagawa, Japan >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From charles-listes+bioperl at plessy.org Sat Nov 21 01:07:23 2009 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Sat, 21 Nov 2009 10:07:23 +0900 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com> <20091120104445.GG31318@kunpuu.plessy.org> Message-ID: <20091121010723.GA7786@kunpuu.plessy.org> On Fri, Nov 20, 2009 at 07:00:45AM -0600, Chris Fields wrote: > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > > > > DreamHost uses Debian, so you can suggest that they install the Debian > > package. If you are in contact with the tech service, do not hesitate to > > tell them to contact me if they are interested in a backport of the 1.6.0 > > package. For version 1.6.1, it may be more difficult as it depends on perl > > 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. Dear Chris, you make a good point: although for building we need to either depend on perl 5.10.1 or package ExtUtils::Manifest separately, the resulting bioperl package does not depend on such a high version. Therefore, there is no need for a backport, and the latest Debian package can be installed on a Debian stable (5.0/Lenny) system.
I just checked the Dreamhost machine to which I happen to have access, "waratahs", and it seems to be older, but it may be worth asking the admins anyway (with the big drawback that they would have to be asked for each update). Have a nice weekend, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From robert.bradbury at gmail.com Sat Nov 21 01:40:14 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 20 Nov 2009 20:40:14 -0500 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites Message-ID: I run a Linux system which is in a gradual process of evolution from the default Linux browsers (Galeon, Epiphany, etc.) through Firefox (better) to Google's Chromium (IMO, perhaps the best so far). Chromium allows one to create a process per tab/URL so one can effectively track what it is doing. It also allows one to track the machine usage of these processes (through the Developer > Task manager [shift-escape keyboard] option), which, though expensive in terms of overhead, allows one to track offending windows (in terms of memory or CPU use). My processor recently jumped from a typical 700 MHz to 1.4 GHz speed (using the Linux Ondemand scheduler - which saves ~20 W at the wall outlet -- I've measured it) to the full tilt 2.8 GHz the CPU is capable of. Looking at the Chrome task manager I was not surprised to find the NY Times high on the list (they are pushing content, esp. using JavaScript) but much to my dismay the Jalview and Howto:Trees:Bioperl appeared to be high on the list. Now I am forced to ask myself *why* sites which are simply distributing static information are eating up CPU on my machine! This is a fundamental flaw in the architecture of the sites -- wherein there should be conscious efforts to minimize user-CPU use (or avoid JavaScript entirely).
This would not be a problem if I were using Firefox, as I can easily use NoScript to block JavaScript from non-approved sites. But it raises the question of when one should allow JavaScript to run (one would "normally" approve academic sites by default) when even the academic sites are abusing my CPU. There needs to be much greater awareness both on the part of software distributors and software consumers that it is *MY* CPU and *MY* Electricity and *MY* contribution to global warming. And the developers/distributors should not be sucking down those resources without first asking "May I?" and giving me the option of saying "No, you may not." There is enough we can do productively (running low homology BLAST searches) without engaging in endless wheel spinning of JavaScript or looped GIFs. Robert From maj at fortinbras.us Sat Nov 21 04:17:12 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 23:17:12 -0500 Subject: [Bioperl-l] ohlohers Message-ID: You can now add your Ohloh widgets and increase your carbon footprint with the less crufty: {{#ohloh|acct_id|TYPE}} where TYPE is [Detailed|Rank|Tiny]. Taint checks aplenty. MAJ From maj at fortinbras.us Sat Nov 21 04:33:02 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 20 Nov 2009 23:33:02 -0500 Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN In-Reply-To: <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> References: <4d7f3e450911200221m5f39ace2hb979712115fb9d78@mail.gmail.com><20091120104445.GG31318@kunpuu.plessy.org> <4d7f3e450911201223w59cb308q5af7690a28697966@mail.gmail.com> Message-ID: <9ECC66C2F23F47469AF0F07E3F9307FC@NewLife> Maybe 'nightmarehost' is more appropriate. I've had no problems on AWS, but this may not be exactly what you need.
MAJ ----- Original Message ----- From: "Chu, Roy" To: Sent: Friday, November 20, 2009 3:23 PM Subject: Re: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN "sounds very much like your process was killed for prolonged execution time, or memory usage. We have a daemon in place that monitors for processes that take up too much of a shared web server's resources, and this may have kicked in (and often does when trying to install packages on a shared server)." This was the explanation they had. Regarding asking their admins to install, it seems to be a "they'll try to get to it but don't hold your breath" situation. Hmmm, I made some other attempts: installing 1.4.0 posed no problems. I'm not a perl guru, so I tried to increase the build cache size from the default, 10 MB, hoping that might be the problem--though I can't see how, since I can't imagine the package versions differ much in size (honestly, I haven't checked). Whenever I try to install 1.6.1, it runs into a problem, I guess after the 'make' step, and lists the modules--BioPerl-1.6.0/t/Variation/SeqDiff.t BioPerl-1.6.0/t/Variation/SNP.t BioPerl-1.6.0/t/Variation/Variation_IO.t --and typically gets killed here '> Killed' Next, I tried 1.6.0, and I get this: "(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand) CPAN: Storable loaded ok (v2.12) Going to read '/home/$username/.cpan/Metadata' Killed" (everything prior works and it seems to get further along than when I try to install 1.6.1) Any insight into why this may be happening would be appreciated. Something EQUALLY appreciated would be a recommendation of a decent enough hosting service where someone has had success installing BioPerl. I'd try to set up my Mac web sharing feature and then try to set up the stuff locally, but I haven't yet been able to successfully get the port forwarding feature working properly on the Apple AirPort Extreme--perplexing.
Next, I might just try to install via the Build.PL script. Hmm, checking the wiki, it seems I'll still be able to run remote BLAST and use the basic seq modules, although some discrepancies and idiosyncrasies may be expected? Any heads-up about any false assumptions by me would be greatly appreciated. Thanks in advance, Roy On Fri, Nov 20, 2009 at 5:00 AM, Chris Fields wrote: > > On Nov 20, 2009, at 4:44 AM, Charles Plessy wrote: > >> On Fri, Nov 20, 2009 at 02:21:54AM -0800, Chu, Roy wrote: >>> >>> Does anyone use dreamhost as a web hosting service? I'm just curious >>> if anyone has had any luck installing the module, as their daemon seems >>> to kill my process whenever I try to install it. Dreamhost tech >>> support attributes it to either exceeding the allocated memory cache >>> or exceeding the processing time. I tried to nice the process, but >>> that didn't help for me. Any luck or experience in resolving this >>> would be much appreciated. I suppose my next attempt would be to try >>> installing it directly and hope I don't need root... >> >> Dear Roy, >> >> DreamHost uses Debian, so you can suggest that they install the Debian package. >> If you are in contact with the tech service, do not hesitate to tell them to >> contact me if they are interested in a backport of the 1.6.0 package. For >> version 1.6.1, it may be more difficult as it depends on perl 5.10.1. > > Any reason why this is so? We specify compatibility back to 5.6.1. > > Alex mentioned the reliance on the specific ExtUtils::Manifest version. The > version requested has an important bug fix, is present on CPAN, and is > backwards-compatible to 5.6.1. It should be fairly easy to request that as a > separate package. > > A strict requirement for perl 5.10.1 doesn't make sense in that light, unless > said perl maintainer can enlighten us as to why this is an issue? This one may > require a ranty blog post.
> >> PS: if you propose to install BioPerl as a feature in the Dreamhost panel, I >> will vote for it :) >> >> Have a nice day, >> >> -- >> Charles Plessy >> Debian Med packaging team, >> http://www.debian.org/devel/debian-med >> Tsurumi, Kanagawa, Japan > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Nov 21 04:38:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 22:38:23 -0600 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: References: Message-ID: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> Robert, Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in general) does not use JS, unless there is a specific addition I'm unaware of. Now, the site wiki was recently 'parasited' for redirects, which may be the culprit, but this is now fixed. Can you at least retest to see if this persists? Anyone else know about this? chris On Nov 20, 2009, at 7:40 PM, Robert Bradbury wrote: > I run a Linux system which is in a gradual process of evolution from the > default Linux browsers (Galeon, Epiphany, etc.) through Firefox (better) to > Google's Chromium (IMO, perhaps the best so far). Chromium allows one to > create a process per tab/URL so one can effectively track what it is doing. > It also allows one to track the machine usage of these processes (through > the Developer > Task manager [shift-escape keyboard] option), which, though > expensive in terms of overhead, allows one to track offending windows (in > terms of memory or CPU use).
My processor recently jumped from a typical > 700 MHz to 1.4 GHz speed (using the Linux Ondemand scheduler - which saves > ~20 W at the wall outlet -- I've measured it) to the full tilt 2.8 GHz the > CPU is capable of. Looking at the Chrome task manager I was not surprised > to find the NY Times high on the list (they are pushing content, esp. using > JavaScript) but much to my dismay the Jalview and Howto:Trees:Bioperl > appeared to be high on the list. Now I am forced to ask myself *why* sites > which are simply distributing static information are eating up CPU on my > machine! This is a fundamental flaw in the architecture of the sites -- > wherein there should be conscious efforts to minimize user-CPU use (or avoid > JavaScript entirely). This would not be a problem if I were using Firefox, > as I can easily use NoScript to block JavaScript from non-approved sites. > But it raises the question of when one should allow JavaScript to run (one > would "normally" approve academic sites by default) when even the academic > sites are abusing my CPU. There needs to be much greater awareness both on > the part of software distributors and software consumers that it is *MY* CPU > and *MY* Electricity and *MY* contribution to global warming. And the > developers/distributors should not be sucking down those resources without > first asking "May I?" and giving me the option of saying "No, you may not." > There is enough we can do productively (running low homology BLAST > searches) without engaging in endless wheel spinning of JavaScript or > looped GIFs.
> > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Sat Nov 21 05:11:34 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 20 Nov 2009 21:11:34 -0800 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> References: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> Message-ID: <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> On Fri, Nov 20, 2009 at 8:38 PM, Chris Fields wrote: > Robert, > > Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in > general) do not use JS, unless there is a specific addition I'm unaware of. > Now, the site wiki was recently 'parasited' for redirects, which may be the > culprit, but this is now fixed. Can you at least retest to see if this > persists? > > Anyone else know about this? > > The page in question does include javascript, it appears from the source. This is a function of using mediawiki, though, I believe and not something specific to that page. Sean From cjfields at illinois.edu Sat Nov 21 05:20:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 Nov 2009 23:20:37 -0600 Subject: [Bioperl-l] Excessive CPU use by various Bioperl sites In-Reply-To: <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> References: <8163BC62-3F3E-4936-AAA9-61A4FB307C99@illinois.edu> <264855a00911202111u4b1f1020r4aa6e0e9b0ea61@mail.gmail.com> Message-ID: On Nov 20, 2009, at 11:11 PM, Sean Davis wrote: > On Fri, Nov 20, 2009 at 8:38 PM, Chris Fields wrote: > >> Robert, >> >> Not sure why you're seeing that, but the HOWTO (and, AFAIK, the wiki in >> general) do not use JS, unless there is a specific addition I'm unaware of. >> Now, the site wiki was recently 'parasited' for redirects, which may be the >> culprit, but this is now fixed. 
Can you at least retest to see if this >> persists? >> >> Anyone else know about this? >> >> > The page in question does include javascript, it appears from the source. > This is a function of using mediawiki, though, I believe and not something > specific to that page. > > Sean Sean, thanks for pointing that out. chris From robert.bradbury at gmail.com Sat Nov 21 18:26:05 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sat, 21 Nov 2009 13:26:05 -0500 Subject: [Bioperl-l] Bio::DB::EUtilities question In-Reply-To: References: Message-ID: It sounds like NCBI may be counting frequency of requests, how much data they send or something similar. Are you delaying the time between fetches? The code I've seen typically sleeps for a few seconds each time around a loop. You might try longer delays between fetches and see if that gets you any more data. Alternatively perhaps the libraries aren't reusing the TCP/IP connection properly. Is there a difference between the amount of memory on the machines? Have you watched the size of the process to see if it grows over time? I think the bug which prevented me from fetching a not-so-large genome from a few months ago (eating up 3GB of memory in the process) has not been resolved. If so that could be your problem. Robert On Fri, Nov 20, 2009 at 12:44 PM, Alessandra wrote: > > > I'm testing Bio::DB::EUtilities - webagent which interacts with and > retrieves data from NCBI's eUtils. My perl script works but it works > only if I request less than ~450 times get_Response function.. 
else I > have got this error message: > > ------------- EXCEPTION ------------- > MSG: Response Error > Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) > STACK Bio::DB::GenericWebAgent::get_Response > /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 > STACK toplevel ./wget4gbk.pl:77 > From cjfields at illinois.edu Sat Nov 21 19:19:24 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Nov 2009 13:19:24 -0600 Subject: [Bioperl-l] Bio::DB::EUtilities question In-Reply-To: References: Message-ID: <837CE7E7-E625-4285-AD54-06FD168C0DF3@illinois.edu> NCBI has specific rules about the repeated queries to its servers: http://eutils.ncbi.nlm.nih.gov/#UserSystemRequirements Acc. to that, if you are making over 100 requests at peak times you will run into problems (they'll probably temp-block your IP), even if the timeout is much shorter now (it's 3 requests/second, whereas a year or two ago it was once every 3 sec). In general it's best to run something like this during off-hours. The actual limit on number of server requests is one specific part of Bio::DB::EUtilities that hasn't been added yet, but is tentatively planned. chris On Nov 21, 2009, at 12:26 PM, Robert Bradbury wrote: > It sounds like NCBI may be counting frequency of requests, how much data > they send or something similar. Are you delaying the time between fetches? > The code I've seen typically sleeps for a few seconds each time around a > loop. You might try longer delays between fetches and see if that gets you > any more data. > > Alternatively perhaps the libraries aren't reusing the TCP/IP connection > properly. Is there a difference between the amount of memory on the > machines? Have you watched the size of the process to see if it grows over > time? I think the bug which prevented me from fetching a not-so-large > genome from a few months ago (eating up 3GB of memory in the process) has > not been resolved. If so that could be your problem. 
> > Robert > > On Fri, Nov 20, 2009 at 12:44 PM, Alessandra > wrote: >> >> >> I'm testing Bio::DB::EUtilities - webagent which interacts with and >> retrieves data from NCBI's eUtils. My perl script works but it works >> only if I request less than ~450 times get_Response function.. else I >> have got this error message: >> >> ------------- EXCEPTION ------------- >> MSG: Response Error >> Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: No route to host) >> STACK Bio::DB::GenericWebAgent::get_Response >> /usr/local/share/perl/5.10.0/Bio/DB/GenericWebAgent.pm:215 >> STACK toplevel ./wget4gbk.pl:77 >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Nov 22 02:58:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 21 Nov 2009 20:58:37 -0600 Subject: [Bioperl-l] BioPerl on FLOSS Weekly Message-ID: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> Jason and I were recently interviewed (Wednesday!) about BioPerl for FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and Kirsten Sanford. The interview is now available online, so get your favorite flavor (MP3, podcast) here: http://twit.tv/floss96 Enjoy! chris and jason From adsj at novozymes.com Sun Nov 22 12:37:40 2009 From: adsj at novozymes.com (Adam Sjøgren) Date: Sun, 22 Nov 2009 13:37:40 +0100 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> (Chris Fields's message of "Sat, 21 Nov 2009 20:58:37 -0600") References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> Message-ID: <87aaye91m3.fsf@topper.koldfront.dk> On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > Jason and I were recently interviewed (Wednesday!) about BioPerl for > FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and > Kirsten Sanford. Great!
How about linking to it on bioperl.org? :-), Adam -- Adam Sjøgren adsj at novozymes.com From cjfields at illinois.edu Sun Nov 22 20:30:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 22 Nov 2009 14:30:01 -0600 Subject: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <87aaye91m3.fsf@topper.koldfront.dk> References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu> <87aaye91m3.fsf@topper.koldfront.dk> Message-ID: <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> On Nov 22, 2009, at 6:37 AM, Adam Sjøgren wrote: > On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > >> Jason and I were recently interviewed (Wednesday!) about BioPerl for >> FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and >> Kirsten Sanford. > > Great! > > How about linking to it on bioperl.org? > > > :-), > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Now posted via O|B|F News; I'll try to make that feed more prominent on the main page. Since this is the second such interview (Jason did one a few years back for PerlCast), I'm thinking we need a media page of some sort. chris From maj at fortinbras.us Sun Nov 22 20:48:39 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 22 Nov 2009 15:48:39 -0500 Subject: Re: [Bioperl-l] BioPerl on FLOSS Weekly In-Reply-To: <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> References: <05EB7AF4-8A20-4046-A585-FBF41EA8350A@illinois.edu><87aaye91m3.fsf@topper.koldfront.dk> <2F050081-8B44-4B4C-82D2-7AC71F156588@illinois.edu> Message-ID: <247658CC6D9A4529B281F4482BD3E4BD@NewLife> We do have http://www.bioperl.org/wiki/Category:BioPerl_Media -- ----- Original Message ----- From: "Chris Fields" To: "Adam Sjøgren" Cc: Sent: Sunday, November 22, 2009 3:30 PM Subject: Re: [Bioperl-l] BioPerl on FLOSS Weekly On Nov 22, 2009, at 6:37 AM, Adam Sjøgren wrote: > On Sat, 21 Nov 2009 20:58:37 -0600, Chris wrote: > >> Jason and I were recently interviewed (Wednesday!)
about BioPerl for >> FLOSS Weekly by Randal Schwartz, Leo Laporte, Marc Pelletier, and >> Kirsten Sanford. > > Great! > > How about linking to it on bioperl.org? > > > :-), > > Adam > > -- > Adam Sjøgren > adsj at novozymes.com Now posted via O|B|F News; I'll try to make that feed more prominent on the main page. Since this is the second such interview (Jason did one a few years back for PerlCast), I'm thinking we need a media page of some sort. chris _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From jardim.rodrigo at gmail.com Sun Nov 22 16:06:40 2009 From: jardim.rodrigo at gmail.com (Rodrigo Jardim) Date: Sun, 22 Nov 2009 14:06:40 -0200 Subject: [Bioperl-l] Problems with Genbank Proteins File Message-ID: I have been having problems parsing a GenBank protein file. I think this is because the file has a different order of fields. For example:

In most GenBank files:
========================
LOCUS       AA399704      183 bp    mRNA    linear   EST 03-MAR-2000
ACCESSION   AA399704
VERSION     AA399704.1  GI:2053305
DEFINITION  TEUF0001 T.cruzi epimastigote non-normalized cDNA Library
            Trypanosoma cruzi cDNA clone 1 5' similar to T. cruzi gene for
            histone H2b (X60982), mRNA sequence.
KEYWORDS    EST.
SOURCE      Trypanosoma cruzi

In GenBank protein files:
===================
LOCUS       XP_628849     510 aa            linear   INV 31-OCT-2008
DEFINITION  hypothetical protein [Dictyostelium discoideum AX4].
ACCESSION   XP_628849
VERSION     XP_628849.1  GI:66799847
DBSOURCE    REFSEQ: accession XM_628847.1
KEYWORDS    .
SOURCE      Dictyostelium discoideum AX4.

When I try to parse it, BioPerl aborts with an error message. Any ideas?
Thanks all, -- Regards, Rodrigo Jardim jardim.rodrigo at gmail.com From biopython at maubp.freeserve.co.uk Mon Nov 23 17:36:36 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Nov 2009 17:36:36 +0000 Subject: [Bioperl-l] Problems with Genbank Proteins File In-Reply-To: References: Message-ID: <320fb6e00911230936ofb9d897rbd45abb73a361250@mail.gmail.com> On Sun, Nov 22, 2009 at 4:06 PM, Rodrigo Jardim wrote: > I have been having problems parsing a GenBank protein file. I think this is because > the file has a different order of fields. For example: > > ... > > When I try to parse it, BioPerl aborts with an error message. > > Any ideas? There are some important bits of information missing - what is the error message, and what version of BioPerl are you using? Peter From maj at fortinbras.us Mon Nov 23 17:58:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 23 Nov 2009 12:58:46 -0500 Subject: [Bioperl-l] building samtools/Bio::DB::Sam on cygwin Message-ID: Hi All-- I've had some hard-won success installing samtools and Lincoln's Bio::DB::Sam under cygwin; thought some on the list would be able to use my notes. (Yes, Jason, I'm working on Bio::Tools::Run::BWA...) (To get the current samtools, ping http://sourceforge.net/projects/samtools/files/samtools/0.1.7/samtools-0.1.7a.tar.bz2/download )

* Getting samtools to make from scratch in cygwin

The following diff details the changes I made by hand to the samtools Makefile. The key points are -D_WIN32 and the additional variable LFLAGS and its interpolations. To get the linker to see libgcc and libstdc++, I needed to add symlinks from /lib to the correct files in /lib/gcc/i386-pc-cygwin/4.3.2/. Your gcc version may differ.
--- ../old/samtools-0.1.7a/Makefile	2009-11-16 10:13:43.000000000 -0500
+++ Makefile	2009-11-23 12:14:18.529000000 -0500
@@ -1,16 +1,18 @@
 CC=		gcc
 CFLAGS=		-g -Wall -O2 #-m64 #-arch ppc
-DFLAGS=		-D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -D_CURSES_LIB=1
+LFLAGS=		-lws2_32 -lgcc -lcygwin -lbz2 -lz -lstdc++
+DFLAGS=		-D_WIN32 -D_FILE_OFFSET_BITS=64 -D_CURSES_LIB=1
 LOBJS=		bgzf.o kstring.o bam_aux.o bam.o bam_import.o sam.o bam_index.o \
 			bam_pileup.o bam_lpileup.o bam_md.o glf.o razf.o faidx.o knetfile.o \
 			bam_sort.o sam_header.o
 AOBJS=		bam_tview.o bam_maqcns.o bam_plcmd.o sam_view.o \
 			bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o \
 			bamtk.o kaln.o
@@ -36,13 +38,13 @@
 		$(AR) -cru $@ $(LOBJS)
 
 samtools:lib $(AOBJS)
-		$(CC) $(CFLAGS) -o $@ $(AOBJS) -lm $(LIBPATH) $(LIBCURSES) -lz -L. -lbam
+		$(CC) $(CFLAGS) -o $@ $(AOBJS) -Xlinker --enable-auto-import -lm $(LIBPATH) $(LIBCURSES) -lz -L. -lbam $(LFLAGS)
 
 razip:razip.o razf.o knetfile.o
-		$(CC) $(CFLAGS) -o $@ razf.o razip.o knetfile.o -lz
+		$(CC) $(CFLAGS) -o $@ razf.o razip.o knetfile.o -lz -lm -lws2_32
 
 bgzip:bgzip.o bgzf.o
-		$(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz
+		$(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz -lm -lws2_32
 
 razip.o:razf.h
 bam.o:bam.h razf.h bam_endian.h kstring.h sam_header.h

* Getting Bio::DB::Sam to compile and install

Bio::DB::Sam requires not samtools.exe but the bam library created during the samtools build, as well as all the samtools header files. Create a symlink in /lib to libbam.a in the build directory (or copy libbam.a up to /lib), and create symlinks to (or copies of) *.h in /usr/include. Then, in a cygwin bash shell,

  $ cpan
  cpan> install Bio::DB::Sam

should fly.

Hope someone finds this useful. These mods led me to a successful Bio::DB::Sam install--I have not yet checked original code based on Bio::DB::Sam. If they don't work for you, reply to the list.
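The two symlink steps described above (gcc runtime libraries for the linker, then libbam.a and the samtools headers for Bio::DB::Sam) could be scripted roughly as follows. This is a sketch, not part of the original post: the helper name `link_into`, the variables `GCC_LIBDIR` and `SAMTOOLS_DIR`, and the library file names `libgcc.a` and `libstdc++.a` are all assumptions, so check what actually lives in your gcc directory before running it.

```shell
# Sketch only: every path and file name below is an assumption; adjust to your install.
# GCC_LIBDIR defaults to the directory named in the post; SAMTOOLS_DIR is a
# hypothetical location of the unpacked samtools build directory.
GCC_LIBDIR=${GCC_LIBDIR:-/lib/gcc/i386-pc-cygwin/4.3.2}
SAMTOOLS_DIR=${SAMTOOLS_DIR:-$HOME/samtools-0.1.7a}

# link_into DEST FILE...
# Symlink each FILE into directory DEST under its own basename.
# Files that do not exist are reported on stderr and skipped.
link_into() {
    dest=$1
    shift
    for f in "$@"; do
        if [ -e "$f" ]; then
            ln -sf "$f" "$dest/$(basename "$f")"
        else
            echo "skipping missing file: $f" >&2
        fi
    done
}

# Step 1: let the linker resolve -lgcc and -lstdc++ (assumed file names).
link_into /lib "$GCC_LIBDIR/libgcc.a" "$GCC_LIBDIR/libstdc++.a"

# Step 2: expose the bam library and the samtools headers to Bio::DB::Sam.
link_into /lib "$SAMTOOLS_DIR/libbam.a"
link_into /usr/include "$SAMTOOLS_DIR"/*.h
```

Copying instead of linking, which the post also allows, would work the same way with `cp` in place of `ln -sf`.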
cheers,
MAJ

From jcline at ieee.org  Mon Nov 23 19:13:26 2009
From: jcline at ieee.org (Jonathan Cline)
Date: Mon, 23 Nov 2009 13:13:26 -0600
Subject: [Bioperl-l] Installing Bio-perl on dreamhost via CPAN
In-Reply-To:
References:
Message-ID: <4B0ADED6.8040901@ieee.org>

Dreamhost has terrible reliability. I have stats going back years on a standard Dreamhost hosting account (non-dedicated server), and on some days the web server doesn't respond. Dreamhost service is OK for a hobby blog, but it is definitely *not* suitable for anything real. Add in latency, arbitrary account limits and restrictions, etc., and it is a bad idea to host a project there. Some users apparently get lucky with server allocation and end up on a "good server", but the provider can change this at any time. More typically, I think, account holders simply don't notice, since most are simple bloggers.

Here's a data snippet that illustrates the problem with a typical Dreamhost account:

----------------------------------------------------------------------
date        uptime    dns  connect  request   ttfb   ttlb
2008-08-05   91.40  0.000    0.528    0.528  2.257  1.619
2008-08-04   89.13  0.002    0.301    0.301  1.302  0.971
2008-08-03   94.62  0.000    0.567    0.567  1.506  0.913
2008-08-02  100.00  0.000    0.335    0.335  1.475  1.079
2008-08-01  100.00  0.000    0.310    0.310  1.587  0.825
2008-07-31   93.55  0.023    0.386    0.386  1.280  0.759
2008-07-30  100.00  0.000    0.345    0.345  1.373  0.860
2008-07-29  100.00  0.000    0.358    0.358  1.335  0.757
2008-07-28  100.00  0.000    0.327    0.327  1.462  0.896
2008-07-27  100.00  0.000    0.292    0.292  1.410  0.966
2008-07-26  100.00  0.000    0.283    0.283  1.280  0.815
2008-07-25  100.00  0.000    0.297    0.297  1.231  0.853
2008-07-24  100.00  0.000    0.362    0.362  1.258  0.699
2008-07-23  100.00  0.000    0.339    0.339  1.270  0.785
----------------------------------------------------------------------
minimum      89.13  0.000    0.283    0.283  1.231  0.699
maximum     100.00  0.023    0.567    0.567  2.257  1.619
average      97.76  0.002    0.359    0.359  1.430  0.914
----------------------------------------------------------------------

Or this month:

----------------------------------------------------------------------
date        uptime    dns  connect  request   ttfb   ttlb
2009-11-11  100.00  0.011    0.097    0.097  1.260  1.638
2009-11-10  100.00  0.008    0.094    0.094  1.285  1.647
2009-11-09  100.00  0.008    0.094    0.094  1.494  1.872
2009-11-08  100.00  0.015    0.101    0.101  1.509  1.894
2009-11-07  100.00  0.006    0.092    0.092  1.453  1.831
2009-11-06  100.00  0.011    0.097    0.097  1.500  1.882
2009-11-05   97.80  0.012    0.097    0.097  1.445  1.806
2009-11-04  100.00  0.010    0.096    0.096  1.235  1.605
2009-11-03   95.65  0.007    0.093    0.093  1.266  1.612
2009-11-02  100.00  0.010    0.096    0.096  1.267  1.637
2009-11-01  100.00  0.007    0.093    0.093  1.311  1.692
2009-10-31  100.00  0.009    0.095    0.095  1.225  1.594
2009-10-30  100.00  0.009    0.095    0.095  1.364  1.739
2009-10-29  100.00  0.017    0.103    0.103  1.121  1.505
----------------------------------------------------------------------
minimum      95.65  0.006    0.092    0.092  1.121  1.505
maximum     100.00  0.017    0.103    0.103  1.509  1.894
average      99.53  0.010    0.096    0.096  1.338  1.711
----------------------------------------------------------------------

## Jonathan Cline
## jcline at ieee.org
## Mobile: +1-805-617-0223
########################

From cjfields at illinois.edu  Tue Nov 24 03:19:02 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 23 Nov 2009 21:19:02 -0600
Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database
In-Reply-To: <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se>
References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se>
Message-ID: <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu>

Okay, so I think it's feasible to add this into trunk. I like the idea of optionally having a log class; if someone comes up with a nice way of adding it in, I would be for it.
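The thread discusses separating verbosity from strictness, as Chris did in Biome. The sketch below illustrates that separation using a hypothetical class, My::Reporter. It is not BioPerl's Bio::Root::Root verbose() API, nor Biome's actual implementation.

- Verbosity decides only how loudly a warning is reported.
- Strictness independently decides whether the warning is fatal.
- Every message is also kept on a simple "error stack".

A real implementation might hand the reporting side to Log::Log4perl or Log::Dispatch, as suggested later in the thread.

```perl
use strict;
use warnings;
use Carp ();

package My::Reporter;    # hypothetical class, for illustration only

# verbose: 0 = silent, 1 = warn, 2 = warn with a stack trace
# strict:  true = every warning is raised as a fatal error
sub new {
    my ( $class, %args ) = @_;
    return bless {
        verbose  => $args{verbose} // 0,
        strict   => $args{strict}  // 0,
        messages => [],    # simple "error stack" of everything reported
    }, $class;
}

sub report {
    my ( $self, $msg ) = @_;
    push @{ $self->{messages} }, $msg;

    # strictness decides fatality, independently of the verbosity level
    Carp::croak($msg) if $self->{strict};
    if    ( $self->{verbose} >= 2 ) { Carp::cluck($msg) }
    elsif ( $self->{verbose} == 1 ) { Carp::carp($msg) }
    return;
}

# return every message reported so far, oldest first
sub messages { return @{ $_[0]{messages} } }

package main;

# Usage: a verbose but non-strict object warns and carries on.
#   my $r = My::Reporter->new( verbose => 1, strict => 0 );
#   $r->report("odd input, continuing");
```

Because the two settings are independent, all four combinations are available: quiet or loud, and fatal or not.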
chris

On Nov 20, 2009, at 4:15 AM, Dave Messina wrote:

> Chris, I took a look at how you implemented this in Biome -- very nice!
>
>> I like this verbose/strict separability a lot. Should we go for it?
>
> Me too. So yes, I think so.
>
>> We could even allow finer-grained control of verbosity (states which cover all combinations) w/o affecting strictness.
>
> Perhaps this is a job for Log::Log4perl or Log::Dispatch?
> http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl.pm
> http://search.cpan.org/~drolsky/Log-Dispatch-2.26/lib/Log/Dispatch.pm
>
> That might be overkill, though.
>
> Dave

From David.Messina at sbc.su.se  Tue Nov 24 16:18:22 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 24 Nov 2009 17:18:22 +0100
Subject: [Bioperl-l] verbosity and error stack, was accessing EMBL database
In-Reply-To: <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu>
References: <475F74057A618245A773CD325E105D1E033334AC@phy-srv01.physiol.physiology.wisc.edu> <3277368F-615A-4DD3-B9B3-5D32A5EEEE98@sbc.su.se> <167D2408-653E-4DF5-BCD7-134CE2549E44@illinois.edu>
Message-ID: <3FD2086D-062F-4706-9DC8-2A53224C4913@sbc.su.se>

> I like the idea of optionally having a log class, if someone comes up with a nice way of adding it in I would be for it.

My suggestion of the logging modules was actually meant to handle the various levels of verbose output; I think both of the ones I mentioned "log" to STDERR by default. But of course, a nice side effect of using such a logging module is that it would also allow optional logging to a file.

Dave

From paolo.pavan at gmail.com  Tue Nov 24 19:28:09 2009
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 24 Nov 2009 20:28:09 +0100
Subject: [Bioperl-l] Bio::Tools::Run::Cap3 usage question
Message-ID: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com>

Dear all,

I'm confused about the proper usage of the module Bio::Tools::Run::Cap3.
As documented in the POD, the run(@seqs) method returns the cap3 report file. I expected it to return a Bio::Assembly object, consistent with other Bio::Tools::Run classes. I worked around this by getting the location and names of the temporary output files from the factory object (although this means accessing a private property) and reading them with the Bio::Assembly::IO system. I was just wondering what the intended way to do this is.

Thank you for enlightening me!
Paolo

From Russell.Smithies at agresearch.co.nz  Tue Nov 24 22:04:31 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 25 Nov 2009 11:04:31 +1300
Subject: [Bioperl-l] Bio::DB::Fasta
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz>

Is there any way to tell Bio::DB::Fasta where to write its directory.index file? It's writing it in the same directory as the FASTA file, but I'd rather have it write to /tmp, as it's part of a web app.

Thanks,

Russell


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From Russell.Smithies at agresearch.co.nz  Tue Nov 24 22:21:52 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 25 Nov 2009 11:21:52 +1300
Subject: [Bioperl-l] Bio::DB::Fasta
In-Reply-To: <4296CD1039FC44B89034A1FD3E6721F3@NewLife>
References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> <4296CD1039FC44B89034A1FD3E6721F3@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz>

That's what I ended up doing. Also, there's no "obvious" way to index a single file, so I ended up putting the filename in the glob parameter:

    my $db = Bio::DB::Fasta->new( "$tmp", -glob => "test.faa", -reindex => 1 );

--Russell

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
From maj at fortinbras.us  Tue Nov 24 22:18:51 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 24 Nov 2009 17:18:51 -0500
Subject: [Bioperl-l] Bio::DB::Fasta
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz>
Message-ID: <4296CD1039FC44B89034A1FD3E6721F3@NewLife>

The code (method index_dir()) seems to expect all the FASTA files to be contained in that directory. Looks hairy; what about creating symlinks to your FASTA files in a /tmp subdirectory and calling new() with that subdirectory?
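Mark's symlink workaround can be sketched in core Perl. This sketch rests on two assumptions:

- As reported in this thread, Bio::DB::Fasta writes its directory.index file into the directory it is given.
- The helper name stage_fasta_in_tmp is hypothetical.

Note that CLEANUP => 1 deletes the staging directory, and any index written there, when the program exits. A long-running web app would probably want to manage that directory itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use File::Basename qw(basename);

# Hypothetical helper: symlink each FASTA file into a fresh directory under
# the system temp dir and return that directory. An indexer that writes its
# index next to its input then writes there, not beside the original files.
sub stage_fasta_in_tmp {
    my @fasta = @_;
    my $dir = tempdir( 'fasta_idx_XXXXXX', TMPDIR => 1, CLEANUP => 1 );
    for my $file (@fasta) {
        -e $file or die "No such file: $file\n";
        my $target = File::Spec->rel2abs($file);    # absolute, so the link resolves from anywhere
        my $link   = File::Spec->catfile( $dir, basename($file) );
        symlink( $target, $link ) or die "Cannot create symlink $link: $!\n";
    }
    return $dir;
}

# Usage, assuming BioPerl is installed (the path is a placeholder):
#   my $dir = stage_fasta_in_tmp('/data/seqs/test.faa');
#   my $db  = Bio::DB::Fasta->new($dir);
```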
From florent.angly at gmail.com  Tue Nov 24 22:54:48 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 24 Nov 2009 14:54:48 -0800
Subject: [Bioperl-l] Bio::Tools::Run::Cap3 usage question
In-Reply-To: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com>
References: <56be91b60911241128s52613a56u99e5b1cb3ba8d19a@mail.gmail.com>
Message-ID: <4B0C6438.8070405@gmail.com>

Hi Paolo,

It turns out that there is no standard for what is passed to the Bio::Tools::Run wrappers or returned by them. I noticed the inconsistency between the assembly wrappers recently while implementing support for a new wrapper. A couple of weeks ago I implemented initial support for additional de novo assembly programs in BioPerl (454 Newbler and Minimo). Mark Jensen added support for Maq, a program that assembles reads against a reference. In the process, all the assembly wrappers were changed to take the same type of input data (a FASTA sequence or an array reference of sequence objects). They now return one of the following:

 * a Bio::Assembly::Scaffold object (the default), or
 * a Bio::Assembly::IO object, or
 * the name of a file for the output of the assembler

Use the out_type method to choose which output you want, e.g.:

    $factory->out_type('Bio::Assembly::IO');

or

    $factory->out_type('cap3_results.ace');

You'll have to use the code in the bioperl-run Subversion repository if you want to use these new features.
Cheers,
Florent

From roychu at gmail.com  Tue Nov 24 23:00:58 2009
From: roychu at gmail.com (Roy)
Date: Tue, 24 Nov 2009 15:00:58 -0800
Subject: [Bioperl-l] Remote Blast - same script but different results
Message-ID: <4d7f3e450911241500y7df305acq1d03819ea1ec7d3e@mail.gmail.com>

Hi bioperl community,

I've searched the old lists to see whether this topic has been covered, and perhaps this question arises from my own unfamiliarity with BLAST. The problem: I get different results from remote BLAST each time I run the same script (listed below). Sometimes I get hits, sometimes none at all. I'll run the script once and get no hits. Then I run it again with the same parameters and get the same several hits I had seen on earlier runs. I use a subroutine to parse the BLAST report, and a boolean to indicate whether results were returned. Any insight into what I may have missed would be appreciated.

Short question: is this behavior typical? My understanding of how BLAST works is that it shouldn't be...
Thanks in advance,
Roy

#!/usr/bin/perl
use strict;
use warnings;
use Carp;
use Bio::Perl;
use Bio::Seq;
use CGI;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::SeqFeature::Generic;
use Bio::Restriction::Analysis;
use Bio::Tools::Run::RemoteBlast;
use Bio::SimpleAlign;
use Bio::AlignIO;
use Bio::LocatableSeq;

my $five_seqobj = Bio::Seq->new(
    -seq        => 'ATTCCCACCGGGACCTGCGGGGCTGAGTGCCCTTCTCGGTTGCTGCCGCTGAGGAGCCCGCCCAGCCAGCCAGGGCCGCGAGGCCGAGGCCAGGCCGCAGCCCAGGAGCCGCCCCACCGCAGCTGGCGATGGACCCGCCGAGGCCCGCGCTGCTGGCGCTGCTGGCGCTGCCTGCGCTGCTGCTGCTGCTGCTGGCGGGCGCCAGGGCCG',
    -display_id => 'genomic_a',
    -alphabet   => 'dna',
);
my $three_seqobj = Bio::Seq->new(
    -seq        => 'GTGAGTGCGCGGCCGCTCTGCGGGCGCAGAGGGAGCGGGAGGGAGCCGGCGGCACGAGGTTGGCCGGGGCAGCCTGGGCCTAGGCCAGAGGGAGGGCAGCCACAGGGTCCAGGGCGAGTGGGGGGATTGGACCAGCTGGCGGCCCCTGCAGGCTCAGGATGGGGGGCGCGGGATGGAGGGGCTGAGGAGGGGGTCTCCGGAGCCTGCCTC',
    -display_id => 'genomic_b',
    -alphabet   => 'dna',
);

my @params = (
    '-program'    => 'blastn',
    '-database'   => 'refseq_genomic',
    '-expect'     => '10',
    '-readmethod' => 'blastxml'
);

$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'}    = 'YES';
$Bio::Tools::Run::RemoteBlast::HEADER{'PERC_IDENT'}   = 75;
$Bio::Tools::Run::RemoteBlast::HEADER{'FORMAT_TYPE'}  = 'XML';
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
$Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} = 100;    # Put: limit number of hits

my $factory_a = Bio::Tools::Run::RemoteBlast->new(@params);
$factory_a->retrieve_parameter( 'FORMAT_TYPE', 'XML' );

my $hits_a = '';
my $hits_b = '';
my $r;
my $bool_hit;

print "Submitting BLAST query - 5' end (MEGABLAST = YES)\n";
$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES';
$r = $factory_a->submit_blast($five_seqobj);
# fetch_blast_report() returns a list, so assign it in list context
( $bool_hit, $hits_a ) = fetch_blast_report($factory_a);
unless ($bool_hit) {
    print "\nNo hits\n";
    print "Re-submitting BLAST query - 5' end (MEGABLAST = NO)\n";
    sleep 5;
    $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'NO';
    $r = $factory_a->submit_blast($five_seqobj);
    ( $bool_hit, $hits_a ) = fetch_blast_report($factory_a);
    if ( $bool_hit == 0 ) { print "No hits\n"; }
    sleep 5;
}

my $factory_b = Bio::Tools::Run::RemoteBlast->new(@params);
print "\n--------------------------------------------------\n\n";
print "Submitting BLAST query - 3' end (MEGABLAST = YES)\n";
$Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'YES';
$r = $factory_b->submit_blast($three_seqobj);
( $bool_hit, $hits_b ) = fetch_blast_report($factory_b);
unless ($bool_hit) {
    print " No hits\n";
    print "Re-submitting BLAST query - 3' end (MEGABLAST = NO)\n";
    sleep 5;
    $Bio::Tools::Run::RemoteBlast::HEADER{'MEGABLAST'} = 'NO';
    $r = $factory_b->submit_blast($three_seqobj);
    ( $bool_hit, $hits_b ) = fetch_blast_report($factory_b);
    if ( $bool_hit == 0 ) { print " No hits\n"; }
    sleep 5;
}

print "\nbye\n\n";
print "$hits_a\n$hits_b\n";
exit;

sub fetch_blast_report {
    my ($factory) = @_;
    my $v        = 1;
    my $bool_hit = 0;
    my $hits     = '';
    print STDERR "waiting...";
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid (@rids) {
            print STDERR ".";
            # retrieve_blast() fetches the report from the remote BLAST queue:
            # returns -1 on error, 0 if the job is not finished yet,
            # or a Bio::SearchIO object once the report is ready
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref($rc) ) {
                # ref() returns a non-empty string only if $rc is a reference
                if ( $rc < 0 ) {
                    # server-side error: the RID is dropped and $bool_hit stays 0,
                    # so an error is reported exactly like "no hits"
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ( $v > 0 );    # is this printing out as multiple dots? when and why?
                sleep 5;
            }
            else {
                $bool_hit = 1;
                # returns: Bio::Search::Result::ResultI object
                my $result = $rc->next_result();
                unless ( $result->num_hits > 0 ) { $bool_hit = 0; }
                $factory->remove_rid($rid);
                print "\ndatabase:\t",  $result->database_name, "\n";
                print "query name:\t",  $result->query_name,    "\n";
                print "query length\t", $result->query_length,  "\n";
                print "num hits\t",     $result->num_hits,      "\n";
                if ( $result->num_hits ) {
                    # $result->hits returns an array of hits
                    # $result->no_hits_found, boolean vs $#{@hits} ie. filtering
                    while ( my $hit = $result->next_hit ) {
                        print "\nhit name:\t",  $hit->name,        "\n";
                        print "description:\t", $hit->description, "\n";
                        print "locus:\t",       $hit->locus,       "\n";
                        print "algorithm: ", $hit->algorithm, "\thit length: ", $hit->length,
                          "\thit ranking: ", $hit->rank, "\n";
                        while ( my $hsp = $hit->next_hsp ) {
                            print "evalue: ", $hsp->evalue, "\tscore: ", $hsp->score,
                              "\tpercent_id: ", $hsp->percent_identity, "\n";
                            print "query_start: ", $hsp->query->start, "\tquery_end: ", $hsp->query->end;
                            print "\tquery_length: ", $hsp->query->length,
                              "\tquery_strand: ", $hsp->strand('query'), "\n";
                            print "subject_start: ", $hsp->subject->start, "\tsubject_end: ", $hsp->subject->end;
                            print "\tsubject_length: ", $hsp->subject->length,
                              "\tsubject_strand: ", $hsp->strand('subject'), "\n\n";
                            my $aln = $hsp->get_aln;
                            if ( $aln->is_flush ) {
                                foreach my $seq ( $aln->each_seq ) { print $seq->seq, "\n"; }
                                print $aln->gap_line, "\n";
                                print $aln->consensus_string(95), "\n\n";
                            }
                            $hits .= $hit->name . "\t" . $hsp->subject->start . "\t"
                              . $hsp->subject->end . "\t" . $hsp->strand('subject') . "\n";
                        }
                    }
                }
            }
        }
    }
    return ( $bool_hit, $hits );
}

From maj at fortinbras.us  Wed Nov 25 04:12:13 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 24 Nov 2009 23:12:13 -0500
Subject: [Bioperl-l] Bio::DB::Fasta
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32B63085409@exchsth.agresearch.co.nz> <4296CD1039FC44B89034A1FD3E6721F3@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B6308542C@exchsth.agresearch.co.nz>
Message-ID: <3ECFA0236D1B467181EE63C8C6BE7E1F@NewLife>

I seem to be able to do

    $db = Bio::DB::Fasta->new("$tmp/test.faa");

without a problem. Maybe it's something in the mixing of named and unnamed parameters?
From maj at fortinbras.us  Wed Nov 25 17:25:30 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 25 Nov 2009 12:25:30 -0500
Subject: [Bioperl-l] question for all regarding a sam-based Bio::Assembly::IO
Message-ID: <1E72D5B0A190448FA27545DB5B68638D@NewLife>

Short-readers,

I'm working on a Bio::Assembly::IO class for SAM alignments, and I need to decide how to handle multiple reference sequences. Would you prefer that next_assembly() return one assembly covering all reference sequences? Or should next_assembly() iterate over the reference sequences, returning one assembly per reference? (Or both?)

Thanks for your input,
MAJ

From timbourine81 at gmail.com  Wed Nov 25 17:40:52 2009
From: timbourine81 at gmail.com (Tim)
Date: Wed, 25 Nov 2009 18:40:52 +0100
Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in new file
Message-ID: <4B0D6C24.2080308@gmail.com>

Dear BioPerl users,

I am a real newbie and have a (maybe very trivial) question.

I searched the mailing list archive and many HOWTOs but have not found a concrete answer to my problem.
So hopefully you can help me :)

Background: I use the latest BioPerl version (installed two weeks ago). I use Bio::Tools::Run::StandAloneBlast to BLAST one FASTA file containing several sequences. The BLAST output contains many queries, each with several hits (subjects).

My problem is how to write *all* hits of *one* query into a single new file, and to do this for every query in my BLAST output file.

Or is it better the other way round: first make FASTA files that each contain a single sequence, and BLAST each file? But how can I automate that using BioPerl?

I tried Bio::SearchIO, but I can only write all queries and their respective hits into one file... I think iteration is also necessary here, but I do not really know how to combine that with Bio::SearchIO. Or do I have to use Bio::Index::Blast?

I can index a file (see below), but I have no idea what comes next...

### How I index a file...

#!/usr/bin/perl
use strict;
use warnings;

$ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";

use Bio::Index::Fasta;

my $file_name = "8_to_BLAST_two_seq_index.fasta";
my $id        = "48882";
my $inx       = Bio::Index::Fasta->new(
    -filename   => $file_name . ".idx",
    -write_flag => 1
);
$inx->make_index($file_name);

Hopefully you can at least give me hints on what to look for.

A big THANKS in advance!

Cheers,

Tim

From timbourine81 at gmail.com  Wed Nov 25 17:53:34 2009
From: timbourine81 at gmail.com (Tim)
Date: Wed, 25 Nov 2009 18:53:34 +0100
Subject: [Bioperl-l] How to parse different (fasta) files
Message-ID: <4B0D6F1E.8@gmail.com>

Hey everybody,

another question from me... if you do not mind :)

My situation is this: I have parsed standalone BLAST output with SearchIO, keeping only the hit names. I also have a second FASTA file with the same sequences as in the BLAST database, but aligned (i.e. containing "." and "-" characters). (Unfortunately, there is no way to make a BLAST database from FASTA files that include the alignment...)
What I want to do now is take the names of the hit sequences (from the BLAST output), get the corresponding aligned sequences (from the aligned FASTA file), and put them in a new file. Has anybody out there tried this before? Again, I am an absolute greenhorn at using (Bio)Perl. Maybe it is very simple :D

Looking forward to your answer.

All the best,

Tim

-- 
Tim Köhler
MPI for Terrestrial Microbiology
Karl-von-Frisch-Straße
D-35043 Marburg / Germany
Email: koehlerd at mpi-marburg.mpg.de
Phone: +49 6421 178-740
Fax: +49 6421 178-999

From maj at fortinbras.us  Wed Nov 25 18:20:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 25 Nov 2009 13:20:03 -0500
Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each query innew file
In-Reply-To: <4B0D6C24.2080308@gmail.com>
References: <4B0D6C24.2080308@gmail.com>
Message-ID: <53DE480F205E42CE8D2B9421592AAF0E@NewLife>

hey Tim,

Sounds like you need to go about collecting your queries inside out:

    use Bio::SearchIO;

    my %hits_by_query;
    # 'blast_report.txt' stands for your BLAST report file
    my $in = Bio::SearchIO->new( -file => 'blast_report.txt', -format => 'blast' );
    while ( my $result = $in->next_result ) {
        push @{ $hits_by_query{ $result->query_name } }, $result->hits;
    }

I believe each hash element, keyed by the query name, will now contain an arrayref to the set of hits associated with that query. From here, I believe

    use Bio::Search::Result::BlastResult;
    use Bio::SearchIO;
    use Bio::SearchIO::Writer::TextResultWriter;

    foreach my $qid ( keys %hits_by_query ) {
        my $result = Bio::Search::Result::BlastResult->new( -query_name => $qid );
        $result->add_hit($_) for ( @{ $hits_by_query{$qid} } );
        # the 'blast' format only reads reports; writing needs a writer object
        my $writer = Bio::SearchIO::Writer::TextResultWriter->new();
        my $blio   = Bio::SearchIO->new( -writer => $writer, -file => ">$qid.bls" );
        $blio->write_result($result);
    }

will do what you want.

hope this helps -
Mark
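The same per-query split can be done without BioPerl if the report is in tabular form. This is a minimal sketch under these assumptions:

- The report is tabular BLAST output (for example from blastall -m 8), one HSP per line, tab-separated, with the query ID in the first column.
- The function name split_tabular_blast is hypothetical.
- The whole report is held in memory, which may not suit very large reports.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: split a tabular BLAST report into one file per query,
# named <query_id>.tsv in $out_dir. Returns a hash ref of query ID => line count.
sub split_tabular_blast {
    my ( $report, $out_dir ) = @_;
    my %lines_by_query;
    open my $in, '<', $report or die "Cannot read $report: $!\n";
    while ( my $line = <$in> ) {
        next if $line =~ /^#/ || $line !~ /\S/;    # skip comment and blank lines
        my ($qid) = split /\t/, $line, 2;          # query ID is the first column
        push @{ $lines_by_query{$qid} }, $line;
    }
    close $in;

    my %count;
    for my $qid ( sort keys %lines_by_query ) {
        ( my $safe = $qid ) =~ s/[^\w.-]/_/g;      # make the ID safe as a file name
        my $path = File::Spec->catfile( $out_dir, "$safe.tsv" );
        open my $out, '>', $path or die "Cannot write $path: $!\n";
        print {$out} @{ $lines_by_query{$qid} };
        close $out;
        $count{$qid} = scalar @{ $lines_by_query{$qid} };
    }
    return \%count;
}

# Usage: my $counts = split_tabular_blast( 'blast.tsv', '.' );
```

Collecting lines in a hash first, rather than keeping one open filehandle per query, avoids running into the open-file limit when a report contains thousands of queries.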
From Russell.Smithies at agresearch.co.nz  Wed Nov 25 19:07:26 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 26 Nov 2009 08:07:26 +1300
Subject: RE: [Bioperl-l] How to parse BLAST output - all hits of each query in new file
In-Reply-To: <4B0D6C24.2080308@gmail.com>
References: <4B0D6C24.2080308@gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32B63085701@exchsth.agresearch.co.nz>

Hi Tim,

Here's some code from a job I'm working on at the moment that contains all the bits you'll probably need. It extracts two species-specific databases from nr (based on taxonomy IDs) and runs a BLAST search. It then parses the results and builds a substitution matrix. I was initially using Bio::DB::EUtilities to query and retrieve sequences, but I kept getting errors and time-outs from NCBI when pulling back large numbers of sequences. It should give you a rough idea of how to run Bio::Tools::Run::StandAloneBlast, Bio::DB::Fasta and Bio::SearchIO.
Email me directly if you want further explanation, as it's not well commented ;-)

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================
#!/usr/local/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::Fasta;
use Storable;

# Parameters:
# Percentage can be specified as either 20p, 20P or 20%
# So for 20% of rice sequences blasted against oil palm:
#   4530 51953 20p   (4530=rice, 51953=oil_palm, 20p=20%)
# Or for 20 searches:
#   4530 51953 20
#
my ( $q, $s, $c ) = @ARGV;
die "usage: $0 query_taxid subject_taxid [count|percent]\n" unless $q && $s;

my $nr       = "/data/databases/flatfile/illuminati_blastdata/nr";
my $tax_file = "/data/anonftp/pub/mirror/taxonomy/gi_taxid_prot.dmp.gz";
my $tmp      = "/tmp/tax";
mkdir $tmp unless -d $tmp;

my %stats            = ();
my $total_subs       = 0;
my $min_hsp_len      = 0;
my $min_hsp_identity = 0;
my $num_searches     = $c || 10;
my $blast_e          = '1e-6';
my $count            = 0;

# check if all the fasta and blast files exist;
# if not, extract new fasta and re-formatdb the database
foreach my $t ( $q, $s ) {
    foreach ( map { "$tmp/$t.$_" } qw(faa list phr pin psq) ) {
        unless ( -e $_ ) {
            print "Creating database for $t\n";
            create_database($t);
            last;
        }
    }
}

# queries are drawn from taxid $q and searched against the taxid $s
# database, as described in the usage comment above
my @params = (
    -database => "$tmp/$s",
    -program  => 'blastp',
    -e        => $blast_e,
    -outfile  => "$tmp/blast.out",
    -v        => '1',
    -b        => '1'
);
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params) or die $!;

# load the query sequences into a db;
# makes it easier to randomly access them
my $db = Bio::DB::Fasta->new( "$tmp", -glob => "$q.faa", -reindex => 1 );
my @ids      = $db->ids;
my $id_count = scalar @ids;
die "No sequences\n" unless $id_count;

# if a percentage is requested, calculate
# the required number of searches
if ( $num_searches =~ m/(\d+)[pP%]/ ) {
    $num_searches = int( ( $1 / 100 ) * $id_count );
    warn "Searching random $1 percent ($num_searches) of $id_count sequences from taxid $q\n";
}

my $summary_file = "$tmp/" . $$ . "_summary.txt";
open( OUT, ">", $summary_file ) or die $!;
print OUT "#Summary of $num_searches random blast searches from taxid $q against taxid $s.\n";
print OUT "#Parameters used were:\n";
print OUT "#blast_e: $blast_e\n";
print OUT "#min_hsp_len: $min_hsp_len\n";
print OUT "#min_hsp_identity: $min_hsp_identity\n";
print OUT "\n";

# rand(@ids) can pick any index, including the last one
while ( my $seq = $db->get_Seq_by_id( $ids[ rand @ids ] ) ) {
    next unless $seq;
    warn "Processing ", $seq->id, "\n";
    eval {
        my $blast_report = $factory->blastall($seq);
        sleep 5;
    };
    my $blast_in = Bio::SearchIO->new(
        -format => "blast",
        -file   => "$tmp/blast.out"
    );
    while ( my $result = $blast_in->next_result ) {
        if ( $result->num_hits <= 0 ) {
            warn "No hits for ", $result->query_accession, "\n";
            print OUT "No hits for ", $result->query_accession, "\n";
            next;
        }
        $count++;
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                my $msg = sprintf( "%s had %s hsp%s\n",
                    $result->query_accession, $hit->num_hsps,
                    $hit->num_hsps > 1 ? "s" : "" );
                warn $msg;
                print OUT $msg;

                # http://www.bioperl.org/wiki/HOWTO:SearchIO#Table_of_Methods
                if ( $hsp->length('total') > $min_hsp_len ) {
                    if ( $hsp->percent_identity >= $min_hsp_identity ) {
                        my @query_string = split '', $hsp->query_string;
                        my @homol_string = split '', $hsp->homology_string;
                        my @hit_string   = split '', $hsp->hit_string;
                        # count only positions marked '+' (positive, non-identical)
                        for ( my $i = 0; $i <= $#query_string; $i++ ) {
                            next unless defined $homol_string[$i] && $homol_string[$i] eq '+';
                            $stats{ $query_string[$i] }{ $hit_string[$i] }++;
                            $total_subs++;
                        }
                    }
                }
            }
        }
    }
    unlink "$tmp/blast.out" if -e "$tmp/blast.out";
    last if $count >= $num_searches;
}
die "No substitutions found; nothing to summarize\n" unless $total_subs;

# create summary frequency list
my %summary = ();
for my $query ( keys %stats ) {
    for my $hit ( keys %{ $stats{$query} } ) {
        $summary{ $query . '->' . $hit } =
          sprintf( "%6f", $stats{$query}{$hit} / $total_subs );
    }
}
print OUT "\n";

# sort by descending frequencies and print to summary file
foreach my $k ( sort { $summary{$b} <=> $summary{$a} } keys %summary ) {
    print OUT "$k\t", $summary{$k}, "\n" unless $k =~ /TOTAL/;
}
print OUT "\n\n";

# print substitution matrix
my $i     = 0;
my @prots = qw(A R N D C Q E G H I L K M F P S T W Y V);
my $sep   = "\t";
print OUT sprintf( "%7s %s", $_, $sep ) foreach ( " ", @prots );
print OUT "\n";
foreach my $x (@prots) {
    print OUT sprintf( "%7s|%s", $prots[ $i++ ], $sep );
    foreach my $y (@prots) {
        my $val = defined( $stats{$x}{$y} )
          ? sprintf( "%0.6f", $stats{$x}{$y} / $total_subs )
          : "--------";
        print OUT sprintf( "%s%s", $val, $sep );
    }
    print OUT "\n";
}
close OUT;

open( IN, '<', $summary_file ) or die $!;
print while <IN>;
close IN;

# extract sequences from nr database based on taxid.
sub create_database {
    my $txid = shift;
    my %hash = ();
    my $gi_stored = "/tmp/gi.dat";
    if ( -e $gi_stored ) {
        %hash = %{ retrieve($gi_stored) };
    }
    else {
        open( TXID, "zcat $tax_file | " ) or die $!;
        while (<TXID>) {
            chomp;
            my ( $gi, $tx ) = split( "\t", $_ );
            push( @{ $hash{$tx} }, $gi );
        }
        close TXID;
        store( \%hash, $gi_stored );
    }
    my $txlist = "$tmp/$txid.list";
    my $txseq  = "$tmp/$txid.faa";
    # exists() instead of defined(@array), which is deprecated (fatal in newer Perls)
    die "No sequences found for taxid $txid\n" unless exists $hash{$txid};
    my $num_seqs = scalar( @{ $hash{$txid} } );
    warn "Found $num_seqs sequences for taxid $txid in $tax_file\n";
    open OUT, ">", $txlist or die $!;
    print OUT "$_\n" foreach ( @{ $hash{$txid} } );
    close OUT;
    my $cmd = "fastacmd -d $nr -i $txlist -t T -o $txseq 2>/dev/null";
    system $cmd;
    my $count = `grep -c '>' $txseq`;
    $count =~ s/\n//;
    warn "Extracted $count of $num_seqs sequences from $nr\n";
    $cmd = "formatdb -p T -i $tmp/$txid.faa -n $tmp/$txid -l $tmp/formatdb.log";
    system $cmd;
    $cmd = "fastacmd -d $tmp/$txid -I";
    system $cmd;
    warn "Check the formatdb.log for any errors\n";
}
=======================================
> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Tim > Sent: Thursday, 26 November 2009 6:41 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] How to parse BLAST output - all hits of each query in > new file > > Dear bioperl users, > > I am a real newbie and have - maybe a very trivial - question. > > I searched the mailing list archive and many howtos but I have not found > a concrete answer to my problem. So hopefully you can help me :) > > Background: I use the latest Bioperl version (installed it two weeks > before). > When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > including different sequences, I get a BLAST output with many queries > each having several hits / sbjcts. > > My problem is how to parse *all* hits of *one* query into a single new > file.
And this for all the queries I have in my BLAST output file. > > Or is it better the other way round; first to make fasta files with only > single sequences inside and BLAST each file? But how can I automize that > using Bioperl? > > I tried Bio::SearchIO but can only parse all queries and their > respective hits in only one file... > I think iteration is also necessary here, but I do not really know how > to include that into Bio::SearchIO. > Or do I have to use Module:Bio::Index::Blast? > > I can index a file (see below), but I have no idea what comes next... > > ###How I index a file... > > #!/usr/bin/perl -w > > $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > > use Bio::Index::Fasta; > > > $file_name = "8_to_BLAST_two_seq_index.fasta"; > $id = "48882"; > $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > -write_flag => 1); > $inx->make_index($file_name); > > > Hopefully, you can give me at least hints what to look for. > > A big THANKS in advance! > > Cheers, > > Tim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From maj at fortinbras.us Wed Nov 25 19:21:27 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 25 Nov 2009 14:21:27 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <53DE480F205E42CE8D2B9421592AAF0E@NewLife> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> Message-ID: <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> whoops: change the following line: my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); to my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); (I always forget that...) MAJ ----- Original Message ----- From: "Mark A. Jensen" To: "Tim" ; Sent: Wednesday, November 25, 2009 1:20 PM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file > hey Tim-- > > Sound like you need to go about collecting your queries inside out: > > my %hits_by_query; > for ($result->hits) { > push @{$hits_by_query{$hit->name}} $hit; > } > > I believe now each hash element, keyed by the query name, will contain > an arrayref to the set of hits assoc with that query. >>From here, I believe > > use Bio::Search::Result::BlastResult; > use Bio::SearchIO; > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > $blio->write_result($result); > } > > will do what you want. > > hope this helps - > Mark > > ----- Original Message ----- > From: "Tim" > To: > Sent: Wednesday, November 25, 2009 12:40 PM > Subject: [Bioperl-l] How to parse BLAST output - all hits of each query innew > file > > >> Dear bioperl users, >> >> I am a real newbie and have - maybe a very trivial - question. >> >> I searched the mailing list archive and many howtos but I have not found >> a concrete answer to my problem. So hopefully you can help me :) >> >> Background: I use the latest Bioperl version (installed it two weeks >> before). 
>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file >> including different sequences, I get a BLAST output with many queries >> each having several hits / sbjcts. >> >> My problem is how to parse *all* hits of *one* query into a single new >> file. And this for all the queries I have in my BLAST output file. >> >> Or is it better the other way round; first to make fasta files with only >> single sequences inside and BLAST each file? But how can I automize that >> using Bioperl? >> >> I tried Bio::SearchIO but can only parse all queries and their >> respective hits in only one file... >> I think iteration is also necessary here, but I do not really know how >> to include that into Bio::SearchIO. >> Or do I have to use Module:Bio::Index::Blast? >> >> I can index a file (see below), but I have no idea what comes next... >> >> ###How I index a file... >> >> #!/usr/bin/perl -w >> >> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >> >> use Bio::Index::Fasta; >> >> >> $file_name = "8_to_BLAST_two_seq_index.fasta"; >> $id = "48882"; >> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >> -write_flag => 1); >> $inx->make_index($file_name); >> >> >> Hopefully, you can give me at least hints what to look for. >> >> A big THANKS in advance! 
>> >> Cheers, >> >> Tim >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From alden.huang at gmail.com Thu Nov 26 10:54:30 2009 From: alden.huang at gmail.com (Alden Huang) Date: Thu, 26 Nov 2009 02:54:30 -0800 Subject: [Bioperl-l] Function that determines serious mutations In-Reply-To: References: Message-ID: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> Hey rob, Sorting Intolerant from Tolerant http://sift.jcvi.org/ ~alden ...a bit late, i kno; I just read your post now while cleaning the inbox On Fri, Nov 6, 2009 at 9:35 AM, Robert Bradbury wrote: > Is there a function in the library (or has someone written one) that can > take a genbank entry and determine which mutations are harmful? > > It would be used to produce a table summary of: > GENE     # SNP     # BadSNP > > One kind of gets this from NCBI if you look up a gene name in the "GENE" db > and then go to the "GeneView" on dbSNP page; it has the information I want > but largely in a graphical format, while I simply want numbers I can dump > into a spreadsheet. > > I don't think it would be hard: fetch the gene, run through the features for > the SNP database, figure out whether they are good or bad SNPs, accumulate > the statistics and dump it. I think the functions available are flexible > enough to do it but I can't believe nobody has already done it. It could be > a bit more complex in that one could do an analysis to see if the mutations > are in a conserved domain or mutations that code for Cysteine or Methionine > (or other potentially "critical" amino acids) but since "critical" is in the > eye of the beholder there would have to be some kind of callback to a > scoring function.
> > Thanks, > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From robert.bradbury at gmail.com Thu Nov 26 11:27:50 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 26 Nov 2009 06:27:50 -0500 Subject: [Bioperl-l] Function that determines serious mutations In-Reply-To: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> References: <9e408d720911260254r1e85169lb92d944d88a1880c@mail.gmail.com> Message-ID: On Thu, Nov 26, 2009 at 5:54 AM, Alden Huang wrote: > > Sorting Intolerant from Tolerant > http://sift.jcvi.org/ > > Ah yes, thank you very much. This looks very much like a tool that can be adapted for various uses. Robert From jason at bioperl.org Thu Nov 26 17:16:17 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 26 Nov 2009 09:16:17 -0800 Subject: [Bioperl-l] question about a Bio::Tree::Tree method In-Reply-To: <30960443.966281259248778372.JavaMail.defaultUser@defaultHost> References: <30960443.966281259248778372.JavaMail.defaultUser@defaultHost> Message-ID: <14F4B8C9-A1F4-436B-813F-50E139932D3D@bioperl.org> Emilio - please ask your questions on the list - many people there can help answer questions. get_nodes returns all the nodes in the tree; the options specify the order they are returned in. Depending on your question the order probably won't matter, so you can just call it without any arguments like in the examples and the HOWTO. The documentation for the method says:

 Title   : get_nodes
 Usage   : my @nodes = $tree->get_nodes()
 Function: Return list of Bio::Tree::NodeI objects
 Returns : array of Bio::Tree::NodeI objects
 Args    : (named values) hash with one value
           order => 'b|breadth' first order or 'd|depth' first order

So you can provide no arguments and get the default (breadth-first I believe) or you can specify -order => 'd' or -order => 'depth' to get the nodes in depth-first order.
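For illustration, here is a minimal sketch of both orderings. It assumes BioPerl with Bio::TreeIO is installed. The toy Newick tree and the variable names are made up for this example. Both calls return every node of the tree, and only the order differs. The sketch does not rely on the default used when no argument is passed, since that may vary between BioPerl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

# Toy tree for illustration only: root -> (internal -> A, B), C
my $newick = '((A,B),C);';

# Bio::TreeIO reads from any filehandle, including an in-memory one
open my $fh, '<', \$newick or die "Cannot open in-memory string: $!";
my $treeio = Bio::TreeIO->new( -format => 'newick', -fh => $fh );
my $tree   = $treeio->next_tree or die "No tree parsed";

# Named argument: 'b'/'breadth' or 'd'/'depth'
my @breadth = $tree->get_nodes( -order => 'breadth' );
my @depth   = $tree->get_nodes( -order => 'depth' );

# Leaves carry the Newick labels; unlabeled internal nodes get a placeholder
my $label = sub { my $n = shift; $n->is_Leaf ? $n->id : '(internal)' };
print "breadth-first: ", join( ' ', map { $label->($_) } @breadth ), "\n";
print "depth-first:   ", join( ' ', map { $label->($_) } @depth ),   "\n";
```

Comparing the two printed lines shows whether siblings are visited level by level (breadth) or each subtree is finished before moving on (depth).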
-jason On Nov 26, 2009, at 7:19 AM, miglio83 at libero.it wrote: > Hi Jason, > I'm Emilio Siena, a PhD student of the University of Perugia. > I have > a question about the method "get_nodes" of the "Bio::Tree::Tree" > class. > In > particular I didn't understand which type of arguments it accepts > and in which > format an argument should be given. > > Thank you in advance! > > Emilio -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From maj at fortinbras.us Thu Nov 26 17:40:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 26 Nov 2009 12:40:45 -0500 Subject: [Bioperl-l] Bio::Assembly::IO::sam is alpha Message-ID: <599F8BABCD2848EFA98FB24A4419674E@NewLife> in bioperl-live/trunk with plenty pod; bravehearts can (please!) test on .bam files cheers, MAJ From mauricio at open-bio.org Thu Nov 26 21:45:43 2009 From: mauricio at open-bio.org (Mauricio Herrera Cuadra) Date: Thu, 26 Nov 2009 15:45:43 -0600 Subject: [Bioperl-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <4B0EF707.6080202@open-bio.org> Hi Jonathan, Any chance it can be webcasted? I'm sure it would attract a lot of remote attendees ;) Regards, Mauricio. Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here > at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010. If > you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last years (1st > day for beginners, 2nd for both beginners and advanced users, 3rd day > for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what > you would like to talk about. > > Thanks > > Jonathan. 
> > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > > > > > > > > From robert.bradbury at gmail.com Fri Nov 27 02:06:40 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Thu, 26 Nov 2009 21:06:40 -0500 Subject: [Bioperl-l] BioPerl "guts" question regarding forked processes Message-ID: I'm currently running near my process limit and running sequence fetches from swissprot (I've also had this happen with getting gi's from NCBI) and am running out of processes about halfway through the set I'm trying to fetch [1]. Now, is there someplace in the bioperl documentation that documents where one is supposed to wait() for defunct processes after each sequence fetch. I'm encountering the problem both when the sequence fetches succeed as well as when they fail. Thanks in advance. Robert 1. This is due to a bug in chromium's use of flash that involves it leaving many defunct processes that are uncollected and therefore counting towards ones "process limit". From kanzure at gmail.com Fri Nov 27 02:12:46 2009 From: kanzure at gmail.com (Bryan Bishop) Date: Thu, 26 Nov 2009 20:12:46 -0600 Subject: [Bioperl-l] BioPerl "guts" question regarding forked processes In-Reply-To: References: Message-ID: <55ad6af70911261812q583277d5l71df0d66e756f617@mail.gmail.com> On Thu, Nov 26, 2009 at 8:06 PM, Robert Bradbury wrote: > I'm currently running near my process limit and running sequence fetches > from swissprot (I've also had this happen with getting gi's from NCBI) and > am running out of processes about halfway through the set I'm trying to > fetch [1]. Hey Robert, sorry for the off-topic question, but I was wondering if you're the same Robert Bradbury from the extropy-chat list. Hi? 
- Bryan http://heybryan.org/ 1 512 203 0507 From paolo.pavan at gmail.com Fri Nov 27 11:35:03 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 27 Nov 2009 12:35:03 +0100 Subject: [Bioperl-l] More general Bio::Assembly::Contig question (was Bio::Tools::Run::Cap3 usage question) Message-ID: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> Dear Florent, Thank you for your kind answer and for the effort you have put into this module. Since you are working on these topics, I would like to take the opportunity to ask you about some doubts I have, if you agree, of course :-) Some time ago I tried to work with BioPerl, loading the data from an ACE file produced by Newbler. I needed to extract part of a contig as an alignment of reads, and I thought I could do it with a slice() method, since I saw that Bio::Assembly::Contig implements the Bio::AlignI interface. Unfortunately I realized that this interface is inherited but not implemented. I tried to work around it by adding a slice method that acts on a Bio::Alignment created from the array of LocatableSeqs representing the reads. This is the question: if I'm not wrong (please correct me if I am), the Bio::Assembly::Contig class stores read information in:

Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{_align_clipping:READ_NAME}
Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{_aligned_coord:READ_NAME}
Bio::Assembly::Contig->{_elem}{READ_NAME}{_feat}{_quality_clipping:READ_NAME}

Each of these three features (_align_clipping, _aligned_coord, _quality_clipping) contains a Bio::SeqFeature::Generic. Which of them is most suitable for the purpose described above, the slice method? And, if you will forgive me for going on, a follow-up to the previous question: I don't fully understand the purpose of these three features per read. Can you explain it? Many thanks for your time. Bye bye, Paolo 2009/11/24 Florent Angly > Hi Paolo, > > It turns out that there is no standard for what is to be passed to the > Bio::Tools::Run wrappers and returned by them.
I noticed the inconsistency > between the assembly wrappers recently while implementing support for new > wrappers. I implemented initial support for additional de novo assembly > programs in BioPerl (454 Newbler and Minimo) a couple of weeks ago and Mark > Jensen added support for Maq, a program that assembles reads against a > reference. In the process, all the assembly wrappers were changed to take > the same type of input data (a FASTA sequence or an array reference of > sequence objects) and return one of the following: > * a Bio::Assembly::Scaffold object (the default), or > * a Bio::Assembly::IO object, or > * the name of a file for the output of the assembler > Use the out_type method to set up which output you want, e.g.: > $factory->out_type('Bio::Assembly::IO'); > or > $factory->out_type('cap3_results.ace'); > You'll have to use the code in the bioperl-run subversion if you want to > use these new features. > > Cheers, > > Florent > > > > > Paolo Pavan wrote: > >> Dear, >> I'm confused about the proper usage of the module Bio::Tools::Run::Cap3. >> As documented in the pod, the run(@seqs) method returns the cap3 report >> file >> while I expected it to return a Bio::Assembly object, consistently with other >> Bio::Tools::Run classes. >> However, I went around this by getting from the factory object the >> location >> and the names of the temp output files (actually accessing a private >> property, although) and reading them via the Assembly::IO system. >> I was just wondering what is the properly designed way to do this job. >> >> Thank you for enlightening the way!
>> Paolo >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > From jw12 at sanger.ac.uk Thu Nov 26 14:57:35 2009 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Thu, 26 Nov 2009 14:57:35 +0000 Subject: [Bioperl-l] DAS workshop 7th-9th April 2010 Message-ID: We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK, subject to decent demand. The workshop will be held from Wednesday 7th-Friday 9th April 2010. If you would be interested in attending, either to present or just take part, then please email me jw12 at sanger.ac.uk The format of the workshop is likely to be similar to last year's (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: http://www.dasregistry.org/course.jsp If you would like to present then please send a short summary of what you would like to talk about. Thanks Jonathan. Jonathan Warren Senior Developer and DAS coordinator jw12 at sanger.ac.uk -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From timbourine81 at googlemail.com Thu Nov 26 16:02:30 2009 From: timbourine81 at googlemail.com (Tim Koehler) Date: Thu, 26 Nov 2009 17:02:30 +0100 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: <4B0EA44D.2050507@gmail.com> References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> Message-ID: Oops, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially with where to put your code.
Here ist the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this terror appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5. } use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position? 
errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name. Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > Hey Mark, > > thanks for the answer > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. 
Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sound like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query. > >>> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. 
> >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automize > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... > >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 From rtbio.2009 at gmail.com Sat Nov 28 07:53:43 2009 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Sat, 28 Nov 2009 08:53:43 +0100 Subject: [Bioperl-l] Linking of two cgi scripts Message-ID: hello everyone, I have a small question.
I would like to link two cgi scripts, i.e., I have an input sequence being entered in a text area ex:->gi|at442323|... ATGCCCCCTTGGAACCAAAAAAA.... So I would like to compare this with the query sequences. These query sequences would come from a BLAST script in the module blast.pm. So once I enter the input sequence and request BLAST using the submit button, my request should go to a program which performs the BLAST search. After this, the sequences obtained from BLAST have to be returned to a program Roopa.pm, which compares the input sequence with the sequences obtained from BLAST. But I am unable to provide this link between the cgi scripts (i.e., one script to run BLAST, the other script to compare the sequences and send the results to the browser). Could anyone help me in this regard? Regards, Roopa. From s.denaxas at gmail.com Sat Nov 28 10:56:15 2009 From: s.denaxas at gmail.com (Spiros Denaxas) Date: Sat, 28 Nov 2009 10:56:15 +0000 Subject: [Bioperl-l] Linking of two cgi scripts In-Reply-To: References: Message-ID: Hello, Why do they both have to be CGI scripts? Can't all the processing happen server side, i.e. both BLAST and comparison of returned results? If that is strictly a requirement, you could:
a) get input from user on script A, i.e. the input sequence
b) do an HTTP request from the CGI to the other script B using LWP::UserAgent
c) get results from script B, pass them on to the comparison module
d) return results to user
As I said, this will be clunky, so either do everything in one go or consider AJAX. hope this helps Spiros On Sat, Nov 28, 2009 at 7:53 AM, Roopa Raghuveer wrote: > hello everyone, > > I have a small question. > > I would like to link two cgi scripts i.e., > > I have an input sequence being entered in a text area > > ex:->gi|at442323|... > ATGCCCCCTTGGAACCAAAAAAA....
> > So I would like to compare this with the query sequences.These query > sequences would be from a BLAST script in the module blast.pm > So once I enter the input sequence and request for BLAST using submit > button,my request should go to a program which performs BLAST search.After > this, the sequences obtained from BLAST have to be returned to a program > Roopa.pm which compares the input sequence and the sequences obtained from > blast. > > But I am unable to provide this link between the cgi scripts.(i.e.,one > script to use BLAST,the other script to compare the sequences and send the > results to the browser) > > Could any one help me in this regard? > > Regards, > Roopa. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Sat Nov 28 16:23:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 28 Nov 2009 11:23:53 -0500 Subject: [Bioperl-l] Run wrappers for BWA and Samtools Message-ID: <7F56A6EEEB0E4EE291D5340F27DF7D3A@NewLife> Hi All, Run wrappers for the bwa assembler and the samtools suite are now available as beta in the bioperl-run/trunk. The bwa wrapper allows you to run a canned assembly pipeline, or to execute individual bwa components. The assembly pipeline can return a Bio::Assembly::Scaffold object via the new Bio::Assembly::IO::sam module in bioperl-live/trunk (this requires lstein's Bio::DB::Sam, from CPAN). Details at http://www.bioperl.org/wiki/HOWTO:Short-read_assemblies_with_BWA and, of course, in the pod. Cheers, MAJ From maj at fortinbras.us Sun Nov 29 02:55:42 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 28 Nov 2009 21:55:42 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file In-Reply-To: References: <4B0D6C24.2080308@gmail.com><53DE480F205E42CE8D2B9421592AAF0E@NewLife><815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife><4B0EA44D.2050507@gmail.com> Message-ID: <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> Hi Tim-- There's a bug in my code; should be for my $hit ($result->hits) { ... } and you're right about the comma. My bad. But I don't think you need this-- you're already looping over your query sequences and doing blastn on each one. So in the middle of your loop, you can simply write the blast result that you got: my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format=>"blast" ); $blio->write_result($result); and forget about the foreach my $qid loop entirely. The files should show up in the directory from which you're running the script. cheers, MAJ ----- Original Message ----- From: "Tim Koehler" To: Sent: Thursday, November 26, 2009 11:02 AM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file ups, sent too early... Hey Mark, thanks for the answer. But I am still struggling, especially where to put in your code. Here ist the code I have, so far: #!/usr/bin/perl -w ### should I put your code here as push is a perl command? my %hits_by_query; for ($result->hits) { ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" ### (Missing operator before $hit?) ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7. ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit" ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13. 
push @{$hits_by_query{$hit->name}}, $hit; ###here, every time this error appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5. } use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; use Bio::Search::Result::BlastResult; my $Seq_in = Bio::SeqIO->new ( -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta", -format => 'fasta' ); while (my $query = $Seq_in->next_seq()) { my $factory = Bio::Tools::Run::StandAloneBlast->new( 'program' => 'blastn', 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db', _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); ### Should I need to use a module? are the commands here at the right position? errors, e.g., Global symbol "$hit" requires explicit package name #my %hits_by_query; #for ($result->hits) { ### inserted comma after name}} # push @{$hits_by_query{$hit->name}}, $hit; #} foreach my $qid ( keys %hits_by_query ) { my $result = Bio::Search::Result::BlastResult->new(); $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); $blio->write_result($result); } ###where are the files stored? what is their name.
Sorry, but I cannot get behind that :( while( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string,"\n"; } } } } } } Again, a big thanks in advance :) All the best, Tim On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > Hey Mark, > > thanks for the answer > > On 25.11.2009 20:21, Mark A. Jensen wrote: > > whoops: change the following line: > > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' ); > > > > to > > > > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' ); > > > > (I always forget that...) > > MAJ > > > > ----- Original Message ----- From: "Mark A. Jensen" > > To: "Tim" ; > > Sent: Wednesday, November 25, 2009 1:20 PM > > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each > > queryinnew file > > > > > >> hey Tim-- > >> > >> Sound like you need to go about collecting your queries inside out: > >> > >> my %hits_by_query; > >> for ($result->hits) { > >> push @{$hits_by_query{$hit->name}} $hit; > >> } > >> > >> I believe now each hash element, keyed by the query name, will contain > >> an arrayref to the set of hits assoc with that query. 
> >>> From here, I believe > >> > >> use Bio::Search::Result::BlastResult; > >> use Bio::SearchIO; > >> > >> foreach my $qid ( keys %hits_by_query ) { > >> my $result = Bio::Search::Result::BlastResult->new(); > >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' > ); > >> $blio->write_result($result); > >> } > >> > >> will do what you want. > >> > >> hope this helps - > >> Mark > >> > >> ----- Original Message ----- From: "Tim" > >> To: > >> Sent: Wednesday, November 25, 2009 12:40 PM > >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each > >> query innew file > >> > >> > >>> Dear bioperl users, > >>> > >>> I am a real newbie and have - maybe a very trivial - question. > >>> > >>> I searched the mailing list archive and many howtos but I have not > found > >>> a concrete answer to my problem. So hopefully you can help me :) > >>> > >>> Background: I use the latest Bioperl version (installed it two weeks > >>> before). > >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file > >>> including different sequences, I get a BLAST output with many queries > >>> each having several hits / sbjcts. > >>> > >>> My problem is how to parse *all* hits of *one* query into a single new > >>> file. And this for all the queries I have in my BLAST output file. > >>> > >>> Or is it better the other way round; first to make fasta files with > only > >>> single sequences inside and BLAST each file? But how can I automize > that > >>> using Bioperl? > >>> > >>> I tried Bio::SearchIO but can only parse all queries and their > >>> respective hits in only one file... > >>> I think iteration is also necessary here, but I do not really know how > >>> to include that into Bio::SearchIO. > >>> Or do I have to use Module:Bio::Index::Blast? > >>> > >>> I can index a file (see below), but I have no idea what comes next... > >>> > >>> ###How I index a file... 
> >>> > >>> #!/usr/bin/perl -w > >>> > >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; > >>> > >>> use Bio::Index::Fasta; > >>> > >>> > >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; > >>> $id = "48882"; > >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", > >>> -write_flag => 1); > >>> $inx->make_index($file_name); > >>> > >>> > >>> Hopefully, you can give me at least hints what to look for. > >>> > >>> A big THANKS in advance! > >>> > >>> Cheers, > >>> > >>> Tim > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > Tim Köhler MPI for Terrestrial Microbiology Karl-von-Frisch-Straße D-35043 Marburg / Germany Email: koehlerd at mpi-marburg.mpg.de Phone: +49 6421 178-740 Fax: +49 6421 178-999 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sun Nov 29 03:32:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 28 Nov 2009 22:32:42 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki Message-ID: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> The HOWTOs appear to have a more restrictive copyright than FDL-- in particular, the blurb at the bottom of the HOWTO page asks users to use the documents for personal use only. I'm for this; I think we should therefore have some explicit license for these that specifies this kind of restriction, and then express that on each howto and in BioPerl:Copyright. Any thoughts on the right license and whether this is a good plan?
MAJ From florent.angly at gmail.com Sun Nov 29 03:47:45 2009 From: florent.angly at gmail.com (Florent Angly) Date: Sat, 28 Nov 2009 19:47:45 -0800 Subject: [Bioperl-l] More general Bio::Assembly::Contig question (was Bio::Tools::Run::Cap3 usage question) In-Reply-To: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> References: <56be91b60911270335s3a50ab0cpb03aabb6660f81dc@mail.gmail.com> Message-ID: <4B11EEE1.8070907@gmail.com> Hi Paolo, The aligned reads of a contig are stored in Bio::Assembly::Contigs->{_elem}{READ_NAME}{_seq}. To implement a slice() method, you could retrieve the reads using get_seq_ids(), get_seq_by_name() or get_seq_by_pos(). To retrieve the position of an aligned read in the contig, use get_seq_coord() which returns a Bio::SeqFeature::Generic object (from Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{_aligned_coord:READ_NAME}) on which you can call the start() and end() methods. I'm not entirely sure what Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{_align_clipping:READ_NAME} and {_quality_clipping:READ_NAME} are. I believe that they represent the clear range of the read/contig. Hope it helps, Florent Paolo Pavan wrote: > Dear Florent, > Thank you for your kind answer and for your efforts spent in this module. > Since you are working on these topics I would like to seize the day > and put you some questions about some doubts I have in mind, if you > agree, of course :-) > Some times ago I tried to work with bioperl, loading the data from an > ACE file originated by Newbler; my need was to extract part of the > contig like an alignment of reads and I tought to do it with a slice() > method, since I saw Bio::Assembly::Contig implements Bio::AlignI > interface. Unfortunately I realize that this interface is inherited > but not implemented. > I tried to hack it by adding a slice method which would act on a > Bio::Alignment created from the array of LocatableSeqs representing > the reads. 
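Following Florent's pointers, a crude slice could start by collecting the reads whose aligned coordinates overlap a window of the contig. The sketch below is an illustration only, not a `slice()` method from BioPerl: it assumes the contig object answers `get_seq_ids()`, `get_seq_by_name($id)` and `get_seq_coord($read)` as described above, with `get_seq_coord` taking the read object returned by `get_seq_by_name` and returning a feature with `start()`/`end()` in contig coordinates. The helper name `reads_in_window` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the IDs of reads whose aligned coordinates
# overlap the closed window [$win_start, $win_end] of a contig. Assumes the
# contig behaves like Bio::Assembly::Contig: get_seq_ids() lists read IDs,
# get_seq_by_name($id) returns the aligned read, and get_seq_coord($read)
# returns a feature with start()/end() in contig coordinates (or undef).
sub reads_in_window {
    my ($contig, $win_start, $win_end) = @_;
    my @ids;
    for my $id ($contig->get_seq_ids) {
        my $read  = $contig->get_seq_by_name($id) or next;
        my $coord = $contig->get_seq_coord($read) or next;
        # Two closed intervals overlap when each one starts before the other ends.
        push @ids, $id
            if $coord->start <= $win_end && $coord->end >= $win_start;
    }
    my @sorted = sort @ids;
    return @sorted;
}
```

Cutting the selected reads down to the window and packing them into an alignment object would be a separate step; the sketch only answers "which reads fall in this region".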
> > This is the question: > If I'm not wrong (please correct me if yes), Bio::Assembly::Contig > class stores reads informations in: > Bio::Assembly::Contigs->{_elem}{READ_NAME}{_feat}{ > _align_clipping:READ_NAME} > _aligned_coord:READ_NAME} > _quality_clipping:READ_NAME} > > Anyone of these 3 features _align_clipping, _aligned_coord, > _quality_clipping, contains a Bio::SeqFeature::Generic, which of them > is more suitable to the purpose expressed before, the slice method? > And more, If you apologize me for being too long, is consequently to > the previous: I don't have perfectly clear the purpose of this 3 > feature per read, can you explain it? > > Really thanks you for the time you would spend. > Bye bye, > Paolo From bimber at wisc.edu Sun Nov 29 05:31:25 2009 From: bimber at wisc.edu (Ben Bimber) Date: Sat, 28 Nov 2009 23:31:25 -0600 Subject: [Bioperl-l] using bioperl to compare sequences Message-ID: <9f985cdc0911282131l350bc525gd9ad4717c101ac63@mail.gmail.com> Hello, I have a couple years programming experience, but am reasonably new to perl and extremely new to bioperl. I have been reading through the bioperl documentation and am trying to understand the best way to approach a particular problem. I'm hoping someone could offer some tips and point me in the right direction. If someone has solved this sort of problem before, i'd prefer not to reinvent things. Here's what I'm trying to do: Our lab generates mRNA sequence data, consisting of alleles of a given gene or genes I want to compare each of these sequences against a reference using BLAST or clustalw (will need the ability to choose at run time) Take the result of this alignment, then record positions of difference between the experimental sequence and reference sequence (SNPs) Translate the corresponding AA change(s) associated with each SNP. There can be overlapping ORFs. I see that bioperl has modules for BLAST and clustal. I've also been looking at the modules under variation. 
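For the comparison step described here, once an alignment (from BLAST or clustalw) is available as two gapped strings of equal length, for example via an HSP's `query_string` and `hit_string`, finding substitutions is a column-by-column walk. The pure-Perl sketch below is illustrative only and does not use the Bio::Variation modules. It assumes the reference reading frame starts at a given 1-based alignment column and that there are no gaps inside the coding region; the names `find_snps` and `%CODON` are hypothetical. Inside BioPerl one would more likely translate codons with `Bio::Seq`'s `translate`.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), built from the usual TCAG ordering:
# the first base varies slowest and the third base fastest.
my %CODON;
{
    my $aa    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my @bases = split //, 'TCAG';
    my $i     = 0;
    for my $x (@bases) {
        for my $y (@bases) {
            for my $z (@bases) {
                $CODON{"$x$y$z"} = substr($aa, $i++, 1);
            }
        }
    }
}

# Hypothetical helper: compare two aligned, equal-length sequences ('-' = gap)
# and report each substitution. If $cds_start (1-based column where the
# reading frame begins) is given, also report the reference and variant amino
# acids of the affected codon. Assumes no gaps inside the coding region.
sub find_snps {
    my ($ref, $qry, $cds_start) = @_;
    die "aligned sequences differ in length\n" unless length($ref) == length($qry);
    ($ref, $qry) = (uc($ref), uc($qry));
    my @snps;
    for my $pos (1 .. length $ref) {
        my $r = substr($ref, $pos - 1, 1);
        my $q = substr($qry, $pos - 1, 1);
        next if $r eq $q || $r eq '-' || $q eq '-';    # identical column or indel
        my %snp = (pos => $pos, ref => $r, alt => $q);
        if (defined $cds_start && $pos >= $cds_start) {
            my $codon_start = $pos - ($pos - $cds_start) % 3;    # 1-based column
            my $ref_codon   = substr($ref, $codon_start - 1, 3);
            my $alt_codon   = substr($qry, $codon_start - 1, 3);
            if (length($ref_codon) == 3 && length($alt_codon) == 3) {
                $snp{ref_aa} = $CODON{$ref_codon} // 'X';    # 'X' = unknown codon
                $snp{alt_aa} = $CODON{$alt_codon} // 'X';
            }
        }
        push @snps, \%snp;
    }
    return @snps;
}
```

For overlapping ORFs on the same strand, the same alignment can simply be scanned once per ORF, passing each ORF's start column as `$cds_start`.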
I haven't fully wrapped my head around them, but they look to be what I'd use for SNP detection. Has anyone written code to perform similar things and if so, would you be willing to share specific examples? Anything concrete to see exactly how these modules operate would be extremely helpful. Thanks in advance for any tips or help. From jason at bioperl.org Sun Nov 29 15:54:53 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 29 Nov 2009 07:54:53 -0800 Subject: [Bioperl-l] How to parse BLAST output - all hits of eachqueryinnew file In-Reply-To: <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> References: <4B0D6C24.2080308@gmail.com><53DE480F205E42CE8D2B9421592AAF0E@NewLife><815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife><4B0EA44D.2050507@gmail.com> <21BFD947CEEF43CCAC8AFFDB7A064A49@NewLife> Message-ID: <897A8DB4-AF29-4601-A1E5-9A04D9D8C151@bioperl.org> or while( my $hit = $result->next_hit ) { } On Nov 28, 2009, at 6:55 PM, Mark A. Jensen wrote: > Hi Tim-- > There's a bug in my code; should be > for my $hit ($result->hits) { > ... > } > and you're right about the comma. My bad. > > But I don't think you need this-- you're already looping over your > query sequences and doing blastn on each one. So in the middle of > your loop, you can simply write the blast result that you got: > > my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", - > format=>"blast" ); > $blio->write_result($result); > > and forget about the foreach my $qid loop entirely. > > The files should show up in the directory from which you're > running the script. > cheers, MAJ > > > > ----- Original Message ----- From: "Tim Koehler" > > To: > Sent: Thursday, November 26, 2009 11:02 AM > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of > eachqueryinnew file > > > Oops, sent too early... > > Hey Mark, > > thanks for the answer. But I am still struggling, especially where > to put in > your code.
> > Here is the code I have, so far: > > #!/usr/bin/perl -w > > ### should I put your code here as push is a perl command? > my %hits_by_query; > for ($result->hits) { > ### I inserted a comma after name}}; if there is no comma, there was > the > error: Scalar found where operator expected at > 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit" > ### (Missing operator before $hit?) > ###Useless use of push with no values at > 12_BLAST_two_sequence_each_query_one_file.PL line 7. > ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line > 7, near > "} $hit" > ###BEGIN not safe after errors--compilation aborted at > 12_BLAST_two_sequence_each_query_one_file.PL line 13. > push @{$hits_by_query{$hit->name}}, $hit; > ###here, every time this error appears: Name "main::result" used > only once: > possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5. > ###error: Can't call method "hits" on an undefined value at > 12_BLAST_two_sequence_each_query_one_file.PL line 5. > } > > > use strict; > use Bio::Tools::Run::StandAloneBlast; > use Bio::SeqIO; > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > > my $Seq_in = Bio::SeqIO->new ( > -file => > "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/ > 1_to_BLAST_two_seq.fasta", > -format => 'fasta' > ); > while (my $query = $Seq_in->next_seq()) { > my $factory = Bio::Tools::Run::StandAloneBlast->new( > 'program' => 'blastn', > 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/ > 3_BLAST_db', > _READMETHOD => "Blast" > ); > > my $blast_report = $factory->blastall($query); > > ### Should I need to use a module? are the commands here at the right > position? 
errors, e.g., Global symbol "$hit" requires explicit > package name > #my %hits_by_query; > #for ($result->hits) { > ### inserted comma after name}} > # push @{$hits_by_query{$hit->name}}, $hit; > #} > > foreach my $qid ( keys %hits_by_query ) { > my $result = Bio::Search::Result::BlastResult->new(); > $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - > format=>'blast' ); > $blio->write_result($result); > } > > ###where are the files stored? what is their name. Sorry, but I > cannot get > behind that :( > > while( my $result = $blast_report->next_result ) { > ## $result is a Bio::Search::Result::ResultI compliant object > while( my $hit = $result->next_hit ) { > ## $hit is a Bio::Search::Hit::HitI compliant object > while( my $hsp = $hit->next_hsp ) { > ## $hsp is a Bio::Search::HSP::HSPI compliant object > if( $hsp->length('total') > 50 ) { > if ( $hsp->percent_identity >= 75 ) { > print "Query= ", $result->query_name, > "Hit= ", $hit->name, > "Length= ", $hsp->length('total'), > "Percent_id= ", $hsp->percent_identity, > "Subject=", $hsp->hit_string,"\n"; > } > } > } > } > } > } > > Again, a big thanks in advance :) > > All the best, > > Tim > > > On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote: > >> Hey Mark, >> >> thanks for the answer >> >> On 25.11.2009 20:21, Mark A. Jensen wrote: >> > whoops: change the following line: >> > my $blio = Bio::SearchIO->new( -file => $qid.".bls", - >> format=>'blast' ); >> > >> > to >> > >> > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", - >> format=>'blast' ); >> > >> > (I always forget that...) >> > MAJ >> > >> > ----- Original Message ----- From: "Mark A. 
Jensen" > > >> > To: "Tim" ; >> > Sent: Wednesday, November 25, 2009 1:20 PM >> > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of >> each >> > queryinnew file >> > >> > >> >> hey Tim-- >> >> >> >> Sound like you need to go about collecting your queries inside >> out: >> >> >> >> my %hits_by_query; >> >> for ($result->hits) { >> >> push @{$hits_by_query{$hit->name}} $hit; >> >> } >> >> >> >> I believe now each hash element, keyed by the query name, will >> contain >> >> an arrayref to the set of hits assoc with that query. >> >>> From here, I believe >> >> >> >> use Bio::Search::Result::BlastResult; >> >> use Bio::SearchIO; >> >> >> >> foreach my $qid ( keys %hits_by_query ) { >> >> my $result = Bio::Search::Result::BlastResult->new(); >> >> $result->add_hit($_) for ( @{$hits_by_query{$qid}} ); >> >> my $blio = Bio::SearchIO->new( -file => $qid.".bls", - >> format=>'blast' >> ); >> >> $blio->write_result($result); >> >> } >> >> >> >> will do what you want. >> >> >> >> hope this helps - >> >> Mark >> >> >> >> ----- Original Message ----- From: "Tim" >> >> To: >> >> Sent: Wednesday, November 25, 2009 12:40 PM >> >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each >> >> query innew file >> >> >> >> >> >>> Dear bioperl users, >> >>> >> >>> I am a real newbie and have - maybe a very trivial - question. >> >>> >> >>> I searched the mailing list archive and many howtos but I have >> not >> found >> >>> a concrete answer to my problem. So hopefully you can help me :) >> >>> >> >>> Background: I use the latest Bioperl version (installed it two >> weeks >> >>> before). >> >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta >> file >> >>> including different sequences, I get a BLAST output with many >> queries >> >>> each having several hits / sbjcts. >> >>> >> >>> My problem is how to parse *all* hits of *one* query into a >> single new >> >>> file. And this for all the queries I have in my BLAST output >> file. 
>> >>> >> >>> Or is it better the other way round; first to make fasta files >> with >> only >> >>> single sequences inside and BLAST each file? But how can I >> automize >> that >> >>> using Bioperl? >> >>> >> >>> I tried Bio::SearchIO but can only parse all queries and their >> >>> respective hits in only one file... >> >>> I think iteration is also necessary here, but I do not really >> know how >> >>> to include that into Bio::SearchIO. >> >>> Or do I have to use Module:Bio::Index::Blast? >> >>> >> >>> I can index a file (see below), but I have no idea what comes >> next... >> >>> >> >>> ###How I index a file... >> >>> >> >>> #!/usr/bin/perl -w >> >>> >> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File"; >> >>> >> >>> use Bio::Index::Fasta; >> >>> >> >>> >> >>> $file_name = "8_to_BLAST_two_seq_index.fasta"; >> >>> $id = "48882"; >> >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx", >> >>> -write_flag => 1); >> >>> $inx->make_index($file_name); >> >>> >> >>> >> >>> Hopefully, you can give me at least hints what to look for. >> >>> >> >>> A big THANKS in advance! 
>> >>> >> >>> Cheers, >> >>> >> >>> Tim >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> >> >>> >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> > >> >> Tim Köhler > MPI for Terrestrial Microbiology > Karl-von-Frisch-Straße > D-35043 Marburg / Germany > > Email: koehlerd at mpi-marburg.mpg.de > Phone: +49 6421 178-740 > Fax: +49 6421 178-999 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From suzi at berkeleybop.org Mon Nov 30 04:03:09 2009 From: suzi at berkeleybop.org (Suzanna Lewis) Date: Sun, 29 Nov 2009 20:03:09 -0800 Subject: [Bioperl-l] [DAS] DAS workshop 7th-9th April 2010 In-Reply-To: References: Message-ID: <3AD3C819-4BAA-4D90-B141-9611F48C5CAD@berkeleybop.org> I/we (Gregg) would be interested in attending. We'd present an update on the collaborative, web-based version of Apollo. We will be working with Ian Holmes and Mitch Skinner using JBrowse for basic display. -S On Nov 26, 2009, at 6:57 AM, Jonathan Warren wrote: > We are considering running a Distributed Annotation System workshop here at the Sanger/EBI in the UK subject to decent demand. > The workshop will be held from Wednesday 7th-Friday 9th April 2010.
If you would be interested in attending either to present or just take part > then please email me jw12 at sanger.ac.uk > > The format of the workshop is likely to be similar to last year's (1st day for beginners, 2nd for both beginners and advanced users, 3rd day for advanced), information for which can be found here: > http://www.dasregistry.org/course.jsp > > If you would like to present then please send a short summary of what you would like to talk about. > > Thanks > > Jonathan. > > Jonathan Warren > Senior Developer and DAS coordinator > jw12 at sanger.ac.uk > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > From maj at fortinbras.us Mon Nov 30 14:31:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 30 Nov 2009 09:31:27 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> Message-ID: <513F1C824EF84974993A76F0CC719CDF@NewLife> Well, it has a history, Jason's point. So the question could be: "is this still a valid issue"? A while back, a user on the wiki, with natural and good intentions, removed the authorship and revision info from a couple of the HOWTOs; it is more wiki-like, after all. But Chris had some objections to that, which I seconded, mainly on the basis of the special status that seems implied by the copyright note on the HOWTO page.
I also think that the nature of the howto is somewhat different from other info on the site -- that developers themselves put a lot of time in to explaining how to use their modules, and that in this world where devs get paid by recognition, it is a reasonable thing to allow this extra horn-tooting. Now, that is a policy that could be completely separable from the issue of copyright. However, devs may also get paid by using their materials in teaching seminars. The dilemma would be that people who like to use the wiki are people who like to share, and so it feels unnatural to withhold from the community the materials they develop, but people who like to share also like to eat and wear shoes... so I'm interested in everyone's thoughts about it. ----- Original Message ----- From: "Brian Osborne" To: "Mark A. Jensen" Cc: "Chris Fields" ; "Jason Stajich" ; "bioperl List" Sent: Monday, November 30, 2009 9:16 AM Subject: Re: [Bioperl-l] HOWTO copyright policy vs FDL on wiki > Mark, > > Let me ask you a question, and don't take this question as an implicit > criticism of your suggestion, it is not. Why would you want this more > restrictive copyright? > > Brian O. > > On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: > >> The HOWTOs appear to have a more restrictive copyright >> than FDL-- in particular, the blurb at the bottom of the >> HOWTO page asks users to use the documents for personal >> use only. I'm for this; I think we should therefore have some >> explicit license for these that specifies this kind of restriction, >> and then express that on each howto and in BioPerl:Copyright. >> Any thoughts on the right license and whether this is a good plan? 
>> MAJ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From bosborne11 at verizon.net Mon Nov 30 15:15:32 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 30 Nov 2009 10:15:32 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <513F1C824EF84974993A76F0CC719CDF@NewLife> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> <513F1C824EF84974993A76F0CC719CDF@NewLife> Message-ID: <54671455-A02C-4139-8C39-AC17B50D5CE6@verizon.net> Mark, I have no objection to a more restrictive copyright, and I also have no objection to using FDL, or things like it. Brian O. On Nov 30, 2009, at 9:31 AM, Mark A. Jensen wrote: > Well, it has a history, Jason's point. So the question could > be: "is this still a valid issue"? A while back, a user on the wiki, > with natural and good intentions, removed the authorship and revision > info from a couple of the HOWTOs; it is more wiki-like, > after all. But Chris had some objections to that, which I > seconded, mainly on the basis of the special status that > seems implied by the copyright note on the HOWTO > page. I also think that the nature of the howto is somewhat > different from other info on the site -- that developers themselves > put a lot of time in to explaining how to use their modules, and > that in this world where devs get paid by recognition, it is a > reasonable > thing to allow this extra horn-tooting. Now, that is a policy > that could be completely separable from the issue of copyright. > However, devs may also get paid by using their materials in teaching > seminars. The dilemma would be that people who like to use the > wiki are people who like to share, and so it feels unnatural to > withhold from the community the materials they develop, but > people who like to share also like to eat and wear shoes... 
> so I'm interested in everyone's thoughts about it. > ----- Original Message ----- From: "Brian Osborne" > > To: "Mark A. Jensen" > Cc: "Chris Fields" ; "Jason Stajich" >; "bioperl List" > Sent: Monday, November 30, 2009 9:16 AM > Subject: Re: [Bioperl-l] HOWTO copyright policy vs FDL on wiki > > >> Mark, >> >> Let me ask you a question, and don't take this question as an >> implicit criticism of your suggestion, it is not. Why would you >> want this more restrictive copyright? >> >> Brian O. >> >> On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: >> >>> The HOWTOs appear to have a more restrictive copyright >>> than FDL-- in particular, the blurb at the bottom of the >>> HOWTO page asks users to use the documents for personal >>> use only. I'm for this; I think we should therefore have some >>> explicit license for these that specifies this kind of restriction, >>> and then express that on each howto and in BioPerl:Copyright. >>> Any thoughts on the right license and whether this is a good plan? >>> MAJ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From bosborne11 at verizon.net Mon Nov 30 14:16:07 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Mon, 30 Nov 2009 09:16:07 -0500 Subject: [Bioperl-l] HOWTO copyright policy vs FDL on wiki In-Reply-To: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> References: <9EC73CA501BD45BA912F2D77954D6CD7@NewLife> Message-ID: <81B3C4A1-9F14-4FF9-A4AF-F7E90817A2F1@verizon.net> Mark, Let me ask you a question, and don't take this question as an implicit criticism of your suggestion, it is not. Why would you want this more restrictive copyright? Brian O. On Nov 28, 2009, at 10:32 PM, Mark A. Jensen wrote: > The HOWTOs appear to have a more restrictive copyright > than FDL-- in particular, the blurb at the bottom of the > HOWTO page asks users to use the documents for personal > use only. 
I'm for this; I think we should therefore have some > explicit license for these that specifies this kind of restriction, > and then express that on each howto and in BioPerl:Copyright. > Any thoughts on the right license and whether this is a good plan? > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Mon Nov 30 17:41:44 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 30 Nov 2009 12:41:44 -0500 Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file In-Reply-To: References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32B630E6C53@exchsth.agresearch.co.nz> <52D67F20A9CB4953B86FF794ADE0BE96@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz> Message-ID: <8C288FEF9CEB4055B0CDD19267FBA26C@NewLife> thanks Tim! corrected (I hope) in r16432... MAJ ----- Original Message ----- From: Tim Koehler To: Smithies, Russell Cc: Mark A. Jensen ; bioperl-l at lists.open-bio.org Sent: Monday, November 30, 2009 12:23 PM Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file Hello everybody, thanks a lot for the overwhelming answers! All these codes are different flavors and worked all. For me the added code works the best. But I think I found a bug in ...Bio/SearchIO/blast.pm. There the DEFAULT_BLAST_... variable is set to Bio::Search::Writer::HitTableWriter instead of Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to HTMLResultWriter and others. So again: THANKS for the support! 
Cheers, Tim #!/usr/bin/perl -w use strict; use Bio::Tools::Run::StandAloneBlast; use Bio::SeqIO; use Bio::SearchIO; ### add here the writer you want use Bio::SearchIO::Writer::HitTableWriter; use Bio::Search::Result::BlastResult; use Data::Dumper; my $Seq_in = Bio::SeqIO->new( -file => "/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta", -format => "fasta" ); while ( my $query = $Seq_in->next_seq() ) { warn "Processing ",$query->id, "\n"; my $factory = Bio::Tools::Run::StandAloneBlast->new( program => "blastn", database => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db", _READMETHOD => "Blast" ); my $blast_report = $factory->blastall($query); sleep 5; # just write the result we got for this query into a #new blast-formatted file...named after the id of the query seq... my $result = $blast_report->next_result; my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!; $blio->write_result($result); # below, just looking at the current blast result ###this does not appear in the output files while ( my $result = $blast_report->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while ( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while ( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if ( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query= ", $result->query_name, "Hit= ", $hit->name, "Length= ", $hsp->length('total'), "Percent_id= ", $hsp->percent_identity, "Subject=", $hsp->hit_string, "\n"; } } } } } } On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell wrote: Changed it to a generic result and added a writer and it seems tio work: foreach my $qid ( keys %hits_by_query ) { warn "qid = $qid\n"; my $res = Bio::Search::Result::GenericResult->new(-algorithm => "blastn") or die $!; # print Dumper $res; foreach my $h ( @{ $hits_by_query{$qid} } ){ warn "adding hit ", $h->name, "\n"; 
        $res->add_hit($h) if defined($h);
    }
    my $writerhtml = Bio::SearchIO::Writer::HTMLResultWriter->new();
    my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file => ">$qid\.bls\.html", -format => "blast" ) or die $!;
    $blio->write_result($res);
}

From: Mark A. Jensen [mailto:maj at fortinbras.us]
Sent: Monday, 30 November 2009 10:19 a.m.
To: Smithies, Russell; 'Tim Koehler'
Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file

My thought here was that since Tim's already going one at a time thru
his queries, my scrap was not really necessary:

use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::Search::Result::BlastResult;

use Data::Dumper;

my $Seq_in = Bio::SeqIO->new( -file   => "sequences.fasta",
                              -format => "fasta" );

while ( my $query = $Seq_in->next_seq() ) {
    warn "Processing ",$query->id, "\n";
    my $factory =
      Bio::Tools::Run::StandAloneBlast->new(
        program     => "blastn",
        database    => "/data/databases/flatfile/illuminati_blastdata/nt",
        _READMETHOD => "Blast"
      );

    my $blast_report = $factory->blastall($query);
    sleep 5;

    # just write the result we got for this query into a
    #new blast-formatted file...named after the id of the query seq...
    my $result = $blast_report->next_result;
    my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!;
    $blio->write_result($result);

    # below, just looking at the current blast result
    while ( my $result = $blast_report->next_result ) {
        ## $result is a Bio::Search::Result::ResultI compliant object
        while ( my $hit = $result->next_hit ) {
            ## $hit is a Bio::Search::Hit::HitI compliant object
            while ( my $hsp = $hit->next_hsp ) {
                ## $hsp is a Bio::Search::HSP::HSPI compliant object
                if ( $hsp->length('total') > 50 ) {
                    if ( $hsp->percent_identity >= 75 ) {
                        print "Query= ", $result->query_name,
                          "Hit= ", $hit->name,
                          "Length= ", $hsp->length('total'),
                          "Percent_id= ", $hsp->percent_identity,
                          "Subject=", $hsp->hit_string, "\n";
                    }
                }
            }
        }
    }
}

----- Original Message -----
From: Smithies, Russell
To: 'Tim Koehler' ; 'maj at fortinbras.us'
Sent: Sunday, November 29, 2009 3:58 PM
Subject: RE: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file

Hi Tim

With various people writing the 'howtos' and other docs, the examples are bound to have differing names for the variables used but as long as you're consistent, it should all fit together.

I think I've almost got your code working, just getting errors from Bio::Search::Result::BlastResult which I'm not entirely sure how to use. Perhaps Mark can get this bit going?
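Separately from the BlastResult errors: the script that follows pushes each hit under $hit->name, i.e. the subject sequence's name, so each .bls file would collect the hits to one subject rather than all hits of one query. Mark's 25 November scrap uses the same key, although its comment says the hash is "keyed by the query name". A minimal pure-Perl sketch of grouping by query instead; Mock::Result and Mock::Hit are hypothetical stand-ins that only imitate the query_name, hits and name methods used in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for BioPerl result/hit objects, for
# illustration only.
package Mock::Hit;
sub new  { my ($class, %a) = @_; return bless { %a }, $class }
sub name { $_[0]{name} }

package Mock::Result;
sub new        { my ($class, %a) = @_; return bless { %a }, $class }
sub query_name { $_[0]{query_name} }
sub hits       { @{ $_[0]{hits} } }

package main;

# Group hits under the name of the query that produced them.
# Keying on $hit->name instead would group by subject sequence.
sub group_hits_by_query {
    my (@results) = @_;
    my %hits_by_query;
    for my $result (@results) {
        for my $hit ($result->hits) {
            push @{ $hits_by_query{ $result->query_name } }, $hit;
        }
    }
    return \%hits_by_query;
}

my @results = (
    Mock::Result->new(query_name => 'seqA',
                      hits => [ Mock::Hit->new(name => 'gi|1'),
                                Mock::Hit->new(name => 'gi|2') ]),
    Mock::Result->new(query_name => 'seqB',
                      hits => [ Mock::Hit->new(name => 'gi|1') ]),
);
my $grouped = group_hits_by_query(@results);
for my $qid (sort keys %$grouped) {
    print "$qid: ", join(', ', map { $_->name } @{ $grouped->{$qid} }), "\n";
}
```

With real BioPerl result objects the loop body should read the same, since query_name and hits are ResultI methods. Because blastall is called once per query here, each report holds a single result anyway, which is why Mark later suggests the grouping step is not really necessary.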
--Russell
===============================

use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::Search::Result::BlastResult;

use Data::Dumper;

my $Seq_in = Bio::SeqIO->new( -file   => "sequences.fasta",
                              -format => "fasta" );

while ( my $query = $Seq_in->next_seq() ) {
    warn "Processing ",$query->id, "\n";
    my $factory =
      Bio::Tools::Run::StandAloneBlast->new(
        program     => "blastn",
        database    => "/data/databases/flatfile/illuminati_blastdata/nt",
        _READMETHOD => "Blast"
      );

    my $blast_report = $factory->blastall($query);
    sleep 5;

    my %hits_by_query;

    while ( my $result = $blast_report->next_result ) {
        foreach my $hit ( $result->hits ) {
            warn "Pushed a hit for ",$hit->name, "\n";
            push( @{ $hits_by_query{ $hit->name } }, $hit );
        }
    }

    foreach my $qid ( keys %hits_by_query ) {
        warn "qid = $qid\n";
        my $res = Bio::Search::Result::BlastResult->new() or die $!;
        print Dumper $res;
        foreach my $h ( @{ $hits_by_query{$qid} } ){
            warn "adding hit ", $h->name, "\n";
            $res->add_hit($h) if defined($h);
        }
        my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format => "blast" ) or die $!;
        $blio->write_result($res);
    }

    while ( my $result = $blast_report->next_result ) {
        ## $result is a Bio::Search::Result::ResultI compliant object
        while ( my $hit = $result->next_hit ) {
            ## $hit is a Bio::Search::Hit::HitI compliant object
            while ( my $hsp = $hit->next_hsp ) {
                ## $hsp is a Bio::Search::HSP::HSPI compliant object
                if ( $hsp->length('total') > 50 ) {
                    if ( $hsp->percent_identity >= 75 ) {
                        print "Query= ", $result->query_name,
                          "Hit= ", $hit->name,
                          "Length= ", $hsp->length('total'),
                          "Percent_id= ", $hsp->percent_identity,
                          "Subject=", $hsp->hit_string, "\n";
                    }
                }
            }
        }
    }
}

===============================

From: Tim Koehler [mailto:timbourine81 at googlemail.com]
Sent: Friday, 27 November 2009 10:24 p.m.
To: Smithies, Russell; maj at fortinbras.us
Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file

Hey guys,

please don't get me wrong: I didn't want to put the workload on you. So far I had only found the HowTos, but the naming there has changed over time (e.g. $in vs. $Seq_in), and some things I simply could not find. Now I've got a tip on where else to search: the scrapbook and the Deobfuscator. I'll have a look at those right away.

This is the first time I've touched Linux / Perl; that's why, after several days of trial and many errors ;), I thought I'd ask the mailing list.

I was very happy about your fast answers!

Cheers and have a nice weekend,
Tim

On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler wrote:
oops, sent too early...

Hey Mark,

thanks for the answer. But I am still struggling, especially with where to put your code.

Here is the code I have so far:

#!/usr/bin/perl -w

### should I put your code here as push is a perl command?

my %hits_by_query;
for ($result->hits) {

### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit"
### (Missing operator before $hit?)
###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7.
###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit"
###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13.

push @{$hits_by_query{$hit->name}}, $hit;

###here, every time this error appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
}


use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SeqIO;
use Bio::SearchIO;

use Bio::Search::Result::BlastResult;

my $Seq_in = Bio::SeqIO->new (
    -file   => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta",
    -format => 'fasta'
);
while (my $query = $Seq_in->next_seq()) {

    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db',
        _READMETHOD => "Blast"
    );

    my $blast_report = $factory->blastall($query);

    ### Do I need to use a module? Are the commands here in the right position? Errors, e.g., Global symbol "$hit" requires explicit package name
    #my %hits_by_query;
    #for ($result->hits) {
    ### inserted comma after name}}
    #  push @{$hits_by_query{$hit->name}}, $hit;
    #}

    foreach my $qid ( keys %hits_by_query ) {
        my $result = Bio::Search::Result::BlastResult->new();
        $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
        my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
        $blio->write_result($result);
    }

    ###where are the files stored? What are they named? Sorry, but I can't figure that out :(

    while( my $result = $blast_report->next_result ) {
        ## $result is a Bio::Search::Result::ResultI compliant object
        while( my $hit = $result->next_hit ) {
            ## $hit is a Bio::Search::Hit::HitI compliant object
            while( my $hsp = $hit->next_hsp ) {
                ## $hsp is a Bio::Search::HSP::HSPI compliant object
                if( $hsp->length('total') > 50 ) {
                    if ( $hsp->percent_identity >= 75 ) {
                        print "Query= ", $result->query_name,
                          "Hit= ", $hit->name,
                          "Length= ", $hsp->length('total'),
                          "Percent_id= ", $hsp->percent_identity,
                          "Subject=", $hsp->hit_string,"\n";
                    }
                }
            }
        }
    }
}

Again, a big thanks in advance :)

All the best,
Tim

On Thu, Nov 26, 2009 at 4:52 PM, Tim wrote:
Hey Mark,

thanks for the answer

On 25.11.2009 20:21, Mark A. Jensen wrote:
> whoops: change the following line:
> my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' );
>
> to
>
> my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
>
> (I always forget that...)
> MAJ
>
> ----- Original Message ----- From: "Mark A. Jensen"
> To: "Tim" ;
> Sent: Wednesday, November 25, 2009 1:20 PM
> Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each
> queryinnew file
>
>
>> hey Tim--
>>
>> Sounds like you need to go about collecting your queries inside out:
>>
>> my %hits_by_query;
>> for ($result->hits) {
>>   push @{$hits_by_query{$hit->name}} $hit;
>> }
>>
>> I believe now each hash element, keyed by the query name, will contain
>> an arrayref to the set of hits assoc with that query.
>> From here, I believe
>>
>> use Bio::Search::Result::BlastResult;
>> use Bio::SearchIO;
>>
>> foreach my $qid ( keys %hits_by_query ) {
>>   my $result = Bio::Search::Result::BlastResult->new();
>>   $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
>>   my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' );
>>   $blio->write_result($result);
>> }
>>
>> will do what you want.
>>
>> hope this helps -
>> Mark
>>
>> ----- Original Message ----- From: "Tim"
>> To:
>> Sent: Wednesday, November 25, 2009 12:40 PM
>> Subject: [Bioperl-l] How to parse BLAST output - all hits of each
>> query innew file
>>
>>
>>> Dear bioperl users,
>>>
>>> I am a real newbie and have - maybe a very trivial - question.
>>>
>>> I searched the mailing list archive and many howtos but I have not found
>>> a concrete answer to my problem. So hopefully you can help me :)
>>>
>>> Background: I use the latest Bioperl version (installed it two weeks
>>> before).
>>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file
>>> including different sequences, I get a BLAST output with many queries
>>> each having several hits / sbjcts.
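On Mark's "whoops" correction and Tim's question about where the files end up: as I understand it, BioPerl interprets -file like the filename argument of Perl's two-argument open, so a name without the > prefix is opened for reading rather than created, and a relative name such as ">$qid.bls" is created in the current working directory. A sketch of the same idea in plain Perl, using an explicit three-argument open and a chosen output directory; write_per_query_files and the character substitution are hypothetical illustrations, not BioPerl code.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write one file per query; $hits_by_query maps query id => arrayref
# of hit names. The '>' mode creates (or truncates) the file for
# writing, which is what the ">" prefix does for BioPerl's -file.
sub write_per_query_files {
    my ($dir, $hits_by_query) = @_;
    my @written;
    for my $qid (sort keys %$hits_by_query) {
        # Hypothetical safety step: replace characters such as '|',
        # common in FASTA ids, that are awkward in file names.
        (my $safe = $qid) =~ s/[^\w.-]/_/g;
        my $path = File::Spec->catfile($dir, "$safe.bls");
        open my $fh, '>', $path or die "Cannot write $path: $!";
        print {$fh} "$_\n" for @{ $hits_by_query->{$qid} };
        close $fh or die "Cannot close $path: $!";
        push @written, $path;
    }
    return @written;
}

my $dir   = tempdir(CLEANUP => 1);
my @files = write_per_query_files($dir, { 'gi|48882' => ['hitA', 'hitB'],
                                          seqB       => ['hitC'] });
print "$_\n" for @files;
```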
>>>
>>> My problem is how to parse *all* hits of *one* query into a single new
>>> file. And this for all the queries I have in my BLAST output file.
>>>
>>> Or is it better the other way round; first to make fasta files with only
>>> single sequences inside and BLAST each file? But how can I automize that
>>> using Bioperl?
>>>
>>> I tried Bio::SearchIO but can only parse all queries and their
>>> respective hits in only one file...
>>> I think iteration is also necessary here, but I do not really know how
>>> to include that into Bio::SearchIO.
>>> Or do I have to use Module:Bio::Index::Blast?
>>>
>>> I can index a file (see below), but I have no idea what comes next...
>>>
>>> ###How I index a file...
>>>
>>> #!/usr/bin/perl -w
>>>
>>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";
>>>
>>> use Bio::Index::Fasta;
>>>
>>>
>>> $file_name = "8_to_BLAST_two_seq_index.fasta";
>>> $id = "48882";
>>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx",
>>>                                -write_flag => 1);
>>> $inx->make_index($file_name);
>>>
>>>
>>> Hopefully, you can give me at least hints what to look for.
>>>
>>> A big THANKS in advance!
>>>
>>> Cheers,
>>>
>>> Tim
>>
>

Tim Köhler
MPI for Terrestrial Microbiology
Karl-von-Frisch-Straße
D-35043 Marburg / Germany

Email: koehlerd at mpi-marburg.mpg.de
Phone: +49 6421 178-740
Fax: +49 6421 178-999
--------------------------------------------------------------------------
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities to
which it is addressed and may contain confidential and/or privileged
material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
--------------------------------------------------------------------------

From timbourine81 at googlemail.com Mon Nov 30 17:23:58 2009
From: timbourine81 at googlemail.com (Tim Koehler)
Date: Mon, 30 Nov 2009 18:23:58 +0100
Subject: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz>
References: <4B0D6C24.2080308@gmail.com> <53DE480F205E42CE8D2B9421592AAF0E@NewLife> <815D2A47BC9C4D89B8DEF0B10DA9EAF8@NewLife> <4B0EA44D.2050507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32B630E6C53@exchsth.agresearch.co.nz> <52D67F20A9CB4953B86FF794ADE0BE96@NewLife> <18DF7D20DFEC044098A1062202F5FFF32B630E6D05@exchsth.agresearch.co.nz>
Message-ID:

Hello everybody,

thanks a lot for the overwhelming answers! All of these scripts are different flavors, and all of them worked. For me the added code works best.

But I think I found a bug in ...Bio/SearchIO/blast.pm. There the DEFAULT_BLAST_... variable is set to Bio::Search::Writer::HitTableWriter instead of Bio::SearchIO::Writer::HitTableWriter. I also changed this variable to HTMLResultWriter and others.

So again: THANKS for the support!
Cheers,
Tim
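About the "###this does not appear in the output files" comment in Tim's script: blastall($query) returns a report for a single query, and the first $blast_report->next_result call takes its only result, so the later while ( my $result = $blast_report->next_result ) loop gets undef straight away and never runs. Its print statements also go to STDOUT, not to the .bls file. A pure-Perl sketch of the pattern; the closure stands in for the SearchIO stream, and the hash keys length and percent_identity imitate, but are not, the BioPerl HSP methods.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SearchIO stream: each call hands out the
# next result once, then returns undef, like next_result.
sub make_stream {
    my @queue = @_;
    return sub { shift @queue };
}

# Keep HSPs longer than 50 with at least 75% identity, the same
# thresholds as the scripts in this thread.
sub passing_hsps {
    my (@hsps) = @_;
    return grep { $_->{length} > 50 && $_->{percent_identity} >= 75 } @hsps;
}

my $next_result = make_stream({
    query_name => 'seqA',
    hsps => [ { length => 120, percent_identity => 98.3 },
              { length => 40,  percent_identity => 99.0 },
              { length => 80,  percent_identity => 70.0 } ],
});

# Fetch the single result once and do everything with that variable...
my $result = $next_result->();
my @kept   = passing_hsps(@{ $result->{hsps} });
print "$result->{query_name}: ", scalar(@kept), " HSP(s) kept\n";

# ...because asking the stream again yields nothing.
my $again = $next_result->();
print defined $again ? "got another result\n" : "stream exhausted\n";
```

In BioPerl terms this would mean walking $result->next_hit and $hit->next_hsp on the $result already fetched, instead of calling $blast_report->next_result a second time. If write_result has already walked the hits, a $result->rewind before the second pass may be needed; that is an assumption worth checking against the Bio::Search::Result::GenericResult documentation.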