From chiragmatkarbioinfo at gmail.com Mon Nov 1 02:58:55 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Mon, 1 Nov 2010 13:58:55 +0700
Subject: [Bioperl-l] how to download PDB files using Bioperl script
In-Reply-To:
References: <01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie>
Message-ID:

Use the Perl WWW::Mechanize module to fetch PDB data in bulk.

For example, each PDB file is available at a URL of the form
http://www.rcsb.org/pdb/files/1HKB.pdb

    use strict;
    use warnings;
    use WWW::Mechanize;

    # Fetch a single PDB entry and print its contents.
    my $url = 'http://www.rcsb.org/pdb/files/1HKB.pdb';
    my $m   = WWW::Mechanize->new();
    $m->get($url);
    my $c = $m->content;
    print $c;

Just create a filehandle to read the PDB IDs from a text file, then loop
over the IDs, creating a new object for each one to fetch its data.

On Wed, Oct 27, 2010 at 11:53 PM, Christopher Bottoms wrote:
> Ashwani,
>
> Do you need to download the files once or does this need to be automated?
>
> If you just need to do it once, check out
> http://www.rcsb.org/pdb/download/download.do for downloading multiple
> files.
>
> If you need to automate the process, let me know and I'll help you
> figure it out. The easiest way I can think of, which I used to do, is
> downloading them from the ftp site.
>
> Sincerely,
>
> Christopher Bottoms
>
> On Fri, Oct 22, 2010 at 11:12 AM, Jun Yin wrote:
>> Hi, Ashwani,
>>
>> I have not found any module in BioPerl for downloading PDB files, though
>> Bio::Structure::IO::pdb can parse PDB files.
>>
>> However, PDB provides a RESTful service
>> (http://www.rcsb.org/pdb/software/rest.do). You can write Perl scripts to
>> batch download the proteins.
>>
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>>
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>>
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of ashwani sharma
>> Sent: Thursday, October 21, 2010 9:39 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] how to download PDB files using Bioperl script
>>
>> Hi All,
>>
>> I have around 150 pdb file names and I need to download them from Protein
>> Data Bank. I wonder if someone could tell me how to do it by using Bioperl.
>>
>> Thanks in advance.
>>
>> Regards,
>> Ashwani
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Regards,
Chirag Matkar

From Russell.Smithies at agresearch.co.nz Tue Nov 2 22:11:36 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Nov 2010 15:11:36 +1300
Subject: [Bioperl-l] how to download PDB files using Bioperl script
In-Reply-To:
References: <01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie>

Message-ID: <18DF7D20DFEC044098A1062202F5FFF3313A9B7D7C@exchsth.agresearch.co.nz>

Seems a bit like overkill, what's wrong with wget?

    wget -i

--Russell

From miguel.pignatelli at uv.es Wed Nov 3 05:42:49 2010
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 03 Nov 2010 10:42:49 +0100
Subject: [Bioperl-l] Another Taxonomy modules to CPAN
In-Reply-To: <4C8606FA.3000509@fmi.ch>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
    <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
    <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch>
Message-ID: <4CD12E99.7080701@uv.es>

Hi all,

I have written a couple of modules whose functionality overlaps in part
with Bio::DB::Taxonomy and Bio::Taxon. I had to write them because certain
constraints in the environment where I had to run them (GRID) made it
impossible to use a BioPerl-based solution.
The main features of these modules are:

+ No dependencies on non-standard Perl modules
+ NCBI and RDP based taxonomies supported
+ Very fast, with a low memory footprint: orders of magnitude faster than
  the BioPerl modules (for the tasks they are designed for)

Of course, they do not compete with Bio::DB::Taxonomy and Bio::Taxon in
completeness or integration with other tools (e.g. the rest of the BioPerl
suite), but they are handy for mapping very large datasets (for example
BLAST results) to the NCBI or RDP taxonomy.

The modules are:

Taxonomy::Base -- Finds ancestors and ranks, converts between names, ranks
and IDs, etc.
Taxonomy::RDP -- Reads the taxonomic tree from the RDP XML file
Taxonomy::NCBI -- Reads the taxonomic tree from the flat NCBI files
(nodes.dmp and names.dmp); similar to Bio::DB::Taxonomy::flatfile
Taxonomy::NCBI::Gi2taxid -- Converts NCBI GIs to taxids quickly and
efficiently, using a binary lookup table

These modules are now being used by several groups, mainly working with
large metagenomics datasets, and I am considering uploading them to CPAN,
but I am not clear on where they should be placed there.

How do you think I should name these modules (i.e. where should they live
on CPAN)? Their natural place could be under Bio::DB::Taxonomy, maybe
Bio::DB::Taxonomy::Lite / Bio::DB::Taxonomy::Lite::NCBI / etc.? Is this
possible (and convenient) without being part of BioPerl?

Any other suggestions?

Thank you very much in advance,

M;

From gabbyteku at gmail.com Wed Nov 3 07:53:42 2010
From: gabbyteku at gmail.com (gabriel teku)
Date: Wed, 3 Nov 2010 13:53:42 +0200
Subject: [Bioperl-l] Bio::DB::EUtilities esearch PhraseNotFound error
Message-ID:

I can't get my esearch to work. It's as follows:

    my $eut_obj = Bio::DB::EUtilities->new(
        -eutil      => 'esearch',
        -email      => 'myemail at gmail.com',
        -term       => '$genename[SYMB] AND homo_sapiens[ORGN]',
        -db         => 'geo',
        -usehistory => 'y'
    );

When I run this with a literal gene symbol in the term, it works fine.
But when I run it as shown, with the gene symbol replaced by $genename
(which stores the gene's symbol/name), it fails.
How can this be fixed?

Thanks

From miguel.pignatelli at uv.es Wed Nov 3 09:51:18 2010
From: miguel.pignatelli at uv.es (Miguel Pignatelli)
Date: Wed, 03 Nov 2010 14:51:18 +0100
Subject: [Bioperl-l] SFF format support
In-Reply-To: <4CD12E99.7080701@uv.es>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
    <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
    <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch>
    <4CD12E99.7080701@uv.es>
Message-ID: <4CD168D6.8030100@uv.es>

Hi all,

I have seen in the Nextgen section of the bioperl wiki
(http://www.bioperl.org/wiki/Nextgen_in_Bioperl) that SFF support is on
the wish list. I have some code written for parsing SFF files that I can
refactor and contribute.

Has anyone already taken this on?

Best regards,

M;

From biopython at maubp.freeserve.co.uk Wed Nov 3 10:28:56 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 14:28:56 +0000
Subject: [Bioperl-l] SFF format support
In-Reply-To: <4CD168D6.8030100@uv.es>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
    <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
    <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch>
    <4CD12E99.7080701@uv.es> <4CD168D6.8030100@uv.es>
Message-ID:

On 2010/11/3 Miguel Pignatelli :
> Hi all,
>
> I have seen in the Nextgen section of the bioperl wiki
> (http://www.bioperl.org/wiki/Nextgen_in_Bioperl) that SFF support is in the
> wish list. I have some code written for parsing SFF files that I can
> refactor and contribute with it.
>
> Does anyone already took this?
>
> Best regards,
>
> M;

Hi,

This sounds like a good thing for BioPerl. If you hook this up to
Bio::SeqIO then, for consistency with Biopython on the format names, note
that we use "sff" for the full reads and "sff-trim" for the reads with the
quality trimming applied. If BioPerl has some built-in way to hold a
trimmed sequence (with the head/tail still accessible) then this may not
be a good solution.

Peter

From scott at scottcain.net Wed Nov 3 10:59:12 2010
From: scott at scottcain.net (Scott Cain)
Date: Wed, 3 Nov 2010 10:59:12 -0400
Subject: [Bioperl-l] Bio::DB::EUtilities esearch PhraseNotFound error
In-Reply-To:
References:
Message-ID:

Hi Gabriel,

It looks to me like you've got a few things going wrong. First, you are
using single quotes for the right hand side of the -term value. In perl,
single quotes are non-interpolating, which means variables won't be
substituted in. However, if you switch to double quotes, you'll still
have a problem, because this:

    $genename[SYMB]

looks like you are trying to access an element of an array called genename
with a constant called SYMB, which of course you aren't, so the perl
interpreter will die with that. You can rewrite that section like this:

    $genename . '[SYMB] AND homo_sapiens[ORGN]'

so that the variable interpolation is moved out of the single quotes.

Scott

On Wed, Nov 3, 2010 at 7:53 AM, gabriel teku wrote:
> I can't get my esearch to work. It's as follows:
>
> my $eut_obj = Bio::DB::EUtilities->new(
>     -eutil      => 'esearch',
>     -email      => 'myemail at gmail.com',
>     -term       => '$genename[SYMB] AND homo_sapiens[ORGN]',
>     -db         => 'geo',
>     -usehistory => 'y'
> );
>
> When I run this using a gene symbol, it works fine.
> But then when I run it as it is with a gene's symbol changed to
> $genename(stores the gene's symbol/name), it fails.
> How can this be fixed?
>
> Thanks

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                       216-392-3087
Ontario Institute for Cancer Research

From cseligman at earthlink.net Wed Nov 3 12:07:05 2010
From: cseligman at earthlink.net (Chet Seligman)
Date: Wed, 3 Nov 2010 09:07:05 -0700
Subject: [Bioperl-l] BioPerl installation was incomplete
Message-ID: <001001cb7b71$294f12c0$7bed3840$@earthlink.net>

My perl is ActiveState 5.10.1.

I installed with PPM using the following repositories:

BioPerl-Regular Releases
BioPerl-Release Candidates
Kobes
Bribes
Trouchelle

I got the following warnings:

Can't find any package that provides DB_File:: for Bundle-BioPerl-Core
WARNING: Can't find any package that provides IPC::Run for GraphViz
WARNING: Can't find any package that provides Apache:: for SOAP-Lite

I do have DB_File installed, but I do not have IPC::Run or Apache:: for
SOAP-Lite.

I did try to install SOAP-Lite according to the instructions in the Wiki,
but no luck. In PPM-edit repositories I saw that Kobes, Bribes &
Trouchelle all showed "0".

So how do I get SOAP-Lite, Apache and IPC::Run?
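
Before chasing repositories, it can help to confirm which of the modules
named in the warnings this Perl can actually load. The following is a
minimal sketch for illustration only and was not posted in the thread. It
uses nothing beyond core Perl; the helper name module_version is made up
for this example, and the default list (DB_File, IPC::Run, SOAP::Lite) is
an assumption taken from the warnings above. It only reports what is
loadable and says nothing about which PPM repository provides a missing
module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: try to load a module by name.
# Returns its version string, 'unknown version' if it loads but defines
# no $VERSION, or undef if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # IPC::Run -> IPC/Run.pm
    return unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };
    return defined $version ? "$version" : 'unknown version';
}

# Default names are taken from the PPM warnings; pass others on the
# command line to check those instead.
my @wanted = @ARGV ? @ARGV : qw(DB_File IPC::Run SOAP::Lite);

for my $module (@wanted) {
    my $version = module_version($module);
    printf "%-12s %s\n", $module,
        defined $version ? "installed ($version)" : 'NOT installed';
}
```

Run without arguments it checks the three names above; any other module
names given on the command line are checked instead.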
Chet Seligman

From cjfields at illinois.edu Wed Nov 3 13:18:38 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 3 Nov 2010 12:18:38 -0500
Subject: [Bioperl-l] SFF format support
In-Reply-To: <4CD168D6.8030100@uv.es>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
    <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
    <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch>
    <4CD12E99.7080701@uv.es> <4CD168D6.8030100@uv.es>
Message-ID: <46FE17F5-32A8-40F8-AAF7-70010B7A0831@illinois.edu>

Sure, you are more than welcome to add this. Our main source code
repository is now on GitHub (so one can fork the code, hack away, and
submit pull requests), but we also accept patches. I suggest, though,
that for long-term maintenance you could be added as a collaborator.

chris

On Nov 3, 2010, at 8:51 AM, Miguel Pignatelli wrote:

> Hi all,
>
> I have seen in the Nextgen section of the bioperl wiki (http://www.bioperl.org/wiki/Nextgen_in_Bioperl) that SFF support is in the wish list. I have some code written for parsing SFF files that I can refactor and contribute with it.
>
> Does anyone already took this?
>
> Best regards,
>
> M;

From hanbobio at 126.com Tue Nov 2 01:29:50 2010
From: hanbobio at 126.com (hanbobio)
Date: Tue, 2 Nov 2010 13:29:50 +0800 (CST)
Subject: [Bioperl-l] bioperl modules
Message-ID: <6f81b65.7a0e.12c0b100d5b.Coremail.hanbobio@126.com>

Hi, all.

I ran a BioPerl script on SUSE 10; the Perl version was 5.8.7 and the
BioPerl version was 1.6. But there was an error. The following is the
message the Linux system gave back:

liaoy at linux:~/nd-hn> perl nd-hn.pl

Can't locate Bio/Tools/Run/Alignment/Clustalw.pm in @INC (@INC contains: /usr/lib/perl5/5.8.7/i586-linux-thread-multi /usr/lib/perl5/5.8.7 /usr/lib/perl5/site_perl/5.8.7/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.7/i586-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl .) at nd-hn.pl line 26.

BEGIN failed--compilation aborted at nd-hn.pl line 26.

The script is provided in two ways: as the attached file and as the script
below. Would you kindly test my script and give me some modifications or
suggestions?

I tested the script myself some time ago, and my judgement was that the
first part (retrieving sequences from the remote GenBank) and the third
part (getting the conserved sequences from the multiple sequence alignment
output file) were right, and that the second part (using Clustalw to do
the multiple sequence alignment) encountered some problem. What is the
problem? How can I correct it?

Thank you very much for your advice.
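
The "Can't locate ... in @INC" message above means that Perl turned the
module name into a relative file path (Bio/Tools/Run/Alignment/Clustalw.pm)
and did not find that file under any directory listed in @INC. As a rough
sketch, added here for illustration and not part of the original post, the
same lookup can be reproduced with core Perl. The helper names
module_to_path and find_in_dirs are hypothetical, and the default module
name is simply the one from the error message.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: convert a package name into the relative path Perl
# searches for under each @INC directory, e.g. Foo::Bar -> Foo/Bar.pm
sub module_to_path {
    my ($module) = @_;
    return join('/', split /::/, $module) . '.pm';
}

# Hypothetical helper: return every directory in the given search list
# that contains the module file.
sub find_in_dirs {
    my ($module, @dirs) = @_;
    my @parts = split m{/}, module_to_path($module);
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code references (hooks)
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $dir if -e $path;
    }
    return @found;
}

# Default module name is the one from the error message above.
my $module = shift(@ARGV) || 'Bio::Tools::Run::Alignment::Clustalw';
my @hits   = find_in_dirs($module, @INC);

print "Looking for ", module_to_path($module), "\n";
if (@hits) {
    print "Found under: @hits\n";
}
else {
    print "Not found in any \@INC directory; the distribution that ",
          "provides it is probably not installed.\n";
}
```

If the module turns up in a directory that is not in @INC, adding that
directory with "use lib" or the PERL5LIB environment variable is the usual
remedy; if it is not found anywhere, the package that ships it still needs
to be installed.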
Best regards Yusheng Liao 2010-11-02 # retrive sequence from Genbank use strict; use Bio::DB::GenBank; use Bio::SeqIO; my $gb = new Bio::DB::GenBank; open(OUT,">HCV-5UTR-2.txt")||die "Can't open the file!"; my $seqout = new Bio::SeqIO(-fh => \*OUT, -format => 'fasta'); my $query = Bio::DB::Query::GenBank->new (-query =>'Hepatitis C virus[Organism] AND genotype=1a and 5 UTR', -db => 'nucleotide'); my $seqio = $gb->get_Stream_by_query($query); my $i=0; while( defined (my $seq = $seqio->next_seq )) { $seqout->write_seq($seq); $i++; print "."; } print "$i \n"; # run multi sequence alignment use Bio::Tools::Run::Alignment::Clustalw; use Bio::SimpleAlign; use Bio::AlignIO::clustalw; my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM','output'=>'mdf','outfile'=>'HCV-5UTR-2.aln'); my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); my $ktuple = 3; $factory->ktuple($ktuple); my $inputfilename = '/home/Tonny/perl-5.10.0/HCV-5UTR-2.txt'; my $aln = $factory->align($inputfilename); my $alignout->write_aln(my $aln); my $line; my @array=('A','C','G','T','-'); my @arrayX=('a','c','g','t','-'); my @arrayA; my @arrayB; my ($i,$j,$k); my (@a, at b); my %hash; open FILE, "/home/Tonny/perl-5.10.0/HCV-5UTR-2.aln" or die; open OUT, ">getAlignHCV-5 UTR-1a.txt" or die; open OUTA, ">getConsensusHCV-5 UTR-1a.txt" or die; while($line=){ chomp; if($line=~ /CLUSTAL/){ next; }elsif($line=~ /^[A-Za-z0-9]/){ @a=split /\s+/,$line; if(defined($hash{$a[0]})){ push @{$hash{$a[0]}}, $line; }else{ ${$hash{$a[0]}}[0]=$line; push @b,$a[0]; } } } for(my $k=0;$k<=$#{$hash{$a[0]}};$k++){ for(my $l=0;$l<=$#b;$l++){ print OUT ${$hash{$b[$l]}}[$k]; my $seq=(split /\s+/,${$hash{$b[$l]}}[$k])[1]; @arrayA=split //,$seq; for($i=0;$i<=$#arrayA;$i++){ if($arrayA[$i] =~ /A/i){ $arrayB[0][$i]++; $arrayB[5][$i]++; } if($arrayA[$i] =~ /C/i){ $arrayB[1][$i]++; $arrayB[5][$i]++; } if($arrayA[$i] =~ /G/i){ $arrayB[2][$i]++; $arrayB[5][$i]++; } if($arrayA[$i] =~ /T/i){ $arrayB[3][$i]++; $arrayB[5][$i]++; } 
if($arrayA[$i] =~ /\./i){ $arrayB[4][$i]++; $arrayB[5][$i]++; } if($arrayA[$i] =~ /\-/i){ $arrayB[4][$i]++; $arrayB[5][$i]++; } } } for($j=0;$j<=$#arrayA;$j++){ my $large=0;my $pos=0; for($i=0;$i<=4;$i++){ $arrayB[$i][$j]=$arrayB[$i][$j]/$arrayB[5][$j]; if($arrayB[$i][$j]>$large){ $large=$arrayB[$i][$j]; $pos=$i; } } if($large>0.5){ $arrayB[5][$j]=$large; $arrayB[6][$j]=$array[$pos]; }else{ $arrayB[5][$j]=$large; $arrayB[6][$j]='N'; } } for($i=0;$i<=5;$i++){ for($j=0;$j<=15;$j++){ print OUT " "; } for($j=0;$j<=$#arrayA;$j++){ if($arrayB[$i][$j]>0){ printf OUT "%.2f\t",$arrayB[$i][$j]; }else{ print OUT "0\t"; } } print OUT "\n"; } for($j=0;$j<=15;$j++){ print OUT " "; } for($j=0;$j<=$#arrayA;$j++){ print OUT "$arrayB[6][$j]\t"; } print OUT "\n"; for($j=0;$j<=15;$j++){ print OUTA " "; } for($j=0;$j<=$#arrayA;$j++){ print OUTA "$arrayB[6][$j]"; } print OUTA "\n"; @arrayB=undef; } -------------- next part -------------- A non-text attachment was scrubbed... Name: Moduletest.pl.pl Type: application/octet-stream Size: 4525 bytes Desc: not available URL: From daniel.standage at gmail.com Wed Nov 3 14:42:07 2010 From: daniel.standage at gmail.com (Daniel Standage) Date: Wed, 3 Nov 2010 13:42:07 -0500 Subject: [Bioperl-l] Write segments with Bio::Tools::GFF Message-ID: Hi I've got a simple script that is filtering some GFF3 data. The Bio::Tools::GFF class has methods for reading and writing features and reading regions (segments), but I cannot find a method for writing regions. It's not hard to just print out the sequence-region line from the LocatableSeq object, but then these lines are printed out before the gff-version line (which is a no-no). Any suggestions about how to handle this? Thanks. -- Daniel S. 
Standage Graduate Research Assistant Bioinformatics and Computational Biology Program Iowa State University From scott at scottcain.net Wed Nov 3 16:09:45 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 3 Nov 2010 16:09:45 -0400 Subject: [Bioperl-l] Write segments with Bio::Tools::GFF In-Reply-To: References: Message-ID: Hi Daniel, Why do you need the sequence-region line? It is mostly just informational. If you want to define the reference sequence for the features in the GFF file, I would suggest printing out a full GFF line for the reference sequence instead anyway. That way, it can have all the information that a GFF line can encode, like a source and type, as well as information in the ninth column. That said, Bio::Tools::GFF probably does lack that method. The code in it is probably a little outdated. Scott On Wed, Nov 3, 2010 at 2:42 PM, Daniel Standage wrote: > Hi > > I've got a simple script that is filtering some GFF3 data. The > Bio::Tools::GFF class has methods for reading and writing features and > reading regions (segments), but I cannot find a method for writing regions. > It's not hard to just print out the sequence-region line from the > LocatableSeq object, but then these lines are printed out before the > gff-version line (which is a no-no). Any suggestions about how to handle > this? > > Thanks. > > -- > Daniel S. Standage > Graduate Research Assistant > Bioinformatics and Computational Biology Program > Iowa State University > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 
216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Wed Nov 3 16:19:07 2010 From: scott at scottcain.net (Scott Cain) Date: Wed, 3 Nov 2010 16:19:07 -0400 Subject: [Bioperl-l] Write segments with Bio::Tools::GFF In-Reply-To: References: Message-ID: Ah, I see. Well, you should complain to the authors that sequence-region isn't a required directive (though if you made that complaint to me, I'd say "well, it is for this tool" :-) Of course, you could also use Bio::Graphics or GBrowse to do the drawing, then I could do more to help. :-) Scott On Wed, Nov 3, 2010 at 4:14 PM, Daniel Standage wrote: > I'm using a tool called AnnotationSketch (part of the GenomeTools package) > to create graphics from my GFF3 files. It complains when sequence-region > lines are not included, even when the reference is printed as a feature. > That's what got me interested in the problem in the first place. > > Thanks for the info, I'll figure something out. > > Daniel > > On Wed, Nov 3, 2010 at 3:09 PM, Scott Cain wrote: >> >> Hi Daniel, >> >> Why do you need the sequence-region line? ?It is mostly just >> informational. ?If you want to define the reference sequence for the >> features in the GFF file, I would suggest printing out a full GFF line >> for the reference sequence instead anyway. ?That way, it can have all >> the information that a GFF line can encode, like a source and type, as >> well as information in the ninth column. >> >> That said, Bio::Tools::GFF probably does lack that method. ?The code >> in it is probably a little outdated. >> >> Scott >> >> >> On Wed, Nov 3, 2010 at 2:42 PM, Daniel Standage >> wrote: >> > Hi >> > >> > I've got a simple script that is filtering some GFF3 data. The >> > Bio::Tools::GFF class has methods for reading and writing features and >> > reading regions (segments), but I cannot find a method for writing >> > regions. 
>> > It's not hard to just print out the sequence-region line from the >> > LocatableSeq object, but then these lines are printed out before the >> > gff-version line (which is a no-no). Any suggestions about how to handle >> > this? >> > >> > Thanks. >> > >> > -- >> > Daniel S. Standage >> > Graduate Research Assistant >> > Bioinformatics and Computational Biology Program >> > Iowa State University >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >> Ontario Institute for Cancer Research > > > > -- > Daniel S. Standage > Graduate Research Assistant > Bioinformatics and Computational Biology Program > Iowa State University > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 Ontario Institute for Cancer Research From daniel.standage at gmail.com Wed Nov 3 16:14:02 2010 From: daniel.standage at gmail.com (Daniel Standage) Date: Wed, 3 Nov 2010 15:14:02 -0500 Subject: [Bioperl-l] Write segments with Bio::Tools::GFF In-Reply-To: References: Message-ID: I'm using a tool called AnnotationSketch (part of the GenomeTools package) to create graphics from my GFF3 files. It complains when sequence-region lines are not included, even when the reference is printed as a feature. That's what got me interested in the problem in the first place. Thanks for the info, I'll figure something out. Daniel On Wed, Nov 3, 2010 at 3:09 PM, Scott Cain wrote: > Hi Daniel, > > Why do you need the sequence-region line? 
It is mostly just > informational. If you want to define the reference sequence for the > features in the GFF file, I would suggest printing out a full GFF line > for the reference sequence instead anyway. That way, it can have all > the information that a GFF line can encode, like a source and type, as > well as information in the ninth column. > > That said, Bio::Tools::GFF probably does lack that method. The code > in it is probably a little outdated. > > Scott > > > On Wed, Nov 3, 2010 at 2:42 PM, Daniel Standage > wrote: > > Hi > > > > I've got a simple script that is filtering some GFF3 data. The > > Bio::Tools::GFF class has methods for reading and writing features and > > reading regions (segments), but I cannot find a method for writing > regions. > > It's not hard to just print out the sequence-region line from the > > LocatableSeq object, but then these lines are printed out before the > > gff-version line (which is a no-no). Any suggestions about how to handle > > this? > > > > Thanks. > > > > -- > > Daniel S. Standage > > Graduate Research Assistant > > Bioinformatics and Computational Biology Program > > Iowa State University > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- Daniel S. 
Standage Graduate Research Assistant Bioinformatics and Computational Biology Program Iowa State University From daniel.standage at gmail.com Wed Nov 3 16:29:29 2010 From: daniel.standage at gmail.com (Daniel Standage) Date: Wed, 3 Nov 2010 15:29:29 -0500 Subject: [Bioperl-l] Write segments with Bio::Tools::GFF In-Reply-To: References: Message-ID: Yeah, I guess it's not a show stopper, since it still generates the graphics fine. The output that clutters the terminal is still annoying, but I can live with that. Thanks! Daniel On Wed, Nov 3, 2010 at 3:19 PM, Scott Cain wrote: > Ah, I see. Well, you should complain to the authors that > sequence-region isn't a required directive (though if you made that > complaint to me, I'd say "well, it is for this tool" :-) > > Of course, you could also use Bio::Graphics or GBrowse to do the > drawing, then I could do more to help. :-) > > Scott > > > On Wed, Nov 3, 2010 at 4:14 PM, Daniel Standage > wrote: > > I'm using a tool called AnnotationSketch (part of the GenomeTools > package) > > to create graphics from my GFF3 files. It complains when sequence-region > > lines are not included, even when the reference is printed as a feature. > > That's what got me interested in the problem in the first place. > > > > Thanks for the info, I'll figure something out. > > > > Daniel > > > > On Wed, Nov 3, 2010 at 3:09 PM, Scott Cain wrote: > >> > >> Hi Daniel, > >> > >> Why do you need the sequence-region line? It is mostly just > >> informational. If you want to define the reference sequence for the > >> features in the GFF file, I would suggest printing out a full GFF line > >> for the reference sequence instead anyway. That way, it can have all > >> the information that a GFF line can encode, like a source and type, as > >> well as information in the ninth column. > >> > >> That said, Bio::Tools::GFF probably does lack that method. The code > >> in it is probably a little outdated. 
> >> > >> Scott > >> > >> > >> On Wed, Nov 3, 2010 at 2:42 PM, Daniel Standage > >> wrote: > >> > Hi > >> > > >> > I've got a simple script that is filtering some GFF3 data. The > >> > Bio::Tools::GFF class has methods for reading and writing features and > >> > reading regions (segments), but I cannot find a method for writing > >> > regions. > >> > It's not hard to just print out the sequence-region line from the > >> > LocatableSeq object, but then these lines are printed out before the > >> > gff-version line (which is a no-no). Any suggestions about how to > handle > >> > this? > >> > > >> > Thanks. > >> > > >> > -- > >> > Daniel S. Standage > >> > Graduate Research Assistant > >> > Bioinformatics and Computational Biology Program > >> > Iowa State University > >> > _______________________________________________ > >> > Bioperl-l mailing list > >> > Bioperl-l at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > >> > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. scott at scottcain > >> dot net > >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> Ontario Institute for Cancer Research > > > > > > > > -- > > Daniel S. Standage > > Graduate Research Assistant > > Bioinformatics and Computational Biology Program > > Iowa State University > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- Daniel S. 
Standage Graduate Research Assistant Bioinformatics and Computational Biology Program Iowa State University From cjfields at illinois.edu Wed Nov 3 17:27:28 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 3 Nov 2010 16:27:28 -0500 Subject: [Bioperl-l] bioperl modules In-Reply-To: <6f81b65.7a0e.12c0b100d5b.Coremail.hanbobio@126.com> References: <6f81b65.7a0e.12c0b100d5b.Coremail.hanbobio@126.com> Message-ID: <2A3F6A7D-CF80-4C6D-8BE2-EEDA873BCC3D@illinois.edu> You need to install bioperl-run to get Bio::Tools::Run::Alignment::Clustalw. chris On Nov 2, 2010, at 12:29 AM, hanbobio wrote: > Hi, all. > I ran the bioperl script on suse10; the perl version was 5.8.7 and the bioperl version was 1.6. But there was an error. The following is the message the Linux system gave back: > > liaoy@linux:~/nd-hn> perl nd-hn.pl > > Can't locate Bio/Tools/Run/Alignment/Clustalw.pm in @INC (@INC contains: /usr/lib/perl5/5.8.7/i586-linux-thread-multi /usr/lib/perl5/5.8.7 /usr/lib/perl5/site_perl/5.8.7/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.7/i586-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl .) at nd-hn.pl line 26. > > BEGIN failed--compilation aborted at nd-hn.pl line 26. > The script is provided in two ways: as the attached file and as the script below. > Would you kindly test my script and suggest modifications? > I tested the script myself some time ago, and my judgment was that the first part (retrieving sequences from remote GenBank) and the third part (getting the conserved sequences from the multiple sequence alignment output file) were right, and the second part (using Clustalw to do the multiple sequence alignment) encountered some problem. What is the problem? How can I correct it? > Thank you very much for your advice.
> > Best regards > Yusheng Liao 2010-11-02 > > # retrive sequence from Genbank > use strict; > use Bio::DB::GenBank; > use Bio::SeqIO; > my $gb = new Bio::DB::GenBank; > open(OUT,">HCV-5UTR-2.txt")||die "Can't open the file!"; > my $seqout = new Bio::SeqIO(-fh => \*OUT, -format => 'fasta'); > my $query = Bio::DB::Query::GenBank->new > (-query =>'Hepatitis C virus[Organism] AND genotype=1a and 5 UTR', > -db => 'nucleotide'); > my $seqio = $gb->get_Stream_by_query($query); > my $i=0; > while( defined (my $seq = $seqio->next_seq )) { > $seqout->write_seq($seq); > $i++; > print "."; > } > print "$i \n"; > # run multi sequence alignment > use Bio::Tools::Run::Alignment::Clustalw; > use Bio::SimpleAlign; > use Bio::AlignIO::clustalw; > my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM','output'=>'mdf','outfile'=>'HCV-5UTR-2.aln'); > my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); > my $ktuple = 3; > $factory->ktuple($ktuple); > my $inputfilename = '/home/Tonny/perl-5.10.0/HCV-5UTR-2.txt'; > my $aln = $factory->align($inputfilename); > my $alignout->write_aln(my $aln); > > my $line; > my @array=('A','C','G','T','-'); > my @arrayX=('a','c','g','t','-'); > my @arrayA; > my @arrayB; > my ($i,$j,$k); > my (@a,@b); > my %hash; > open FILE, "/home/Tonny/perl-5.10.0/HCV-5UTR-2.aln" or die; > open OUT, ">getAlignHCV-5 UTR-1a.txt" or die; > open OUTA, ">getConsensusHCV-5 UTR-1a.txt" or die; > while($line=<FILE>){ > chomp; > if($line=~ /CLUSTAL/){ > next; > }elsif($line=~ /^[A-Za-z0-9]/){ > @a=split /\s+/,$line; > if(defined($hash{$a[0]})){ > push @{$hash{$a[0]}}, $line; > }else{ > ${$hash{$a[0]}}[0]=$line; > push @b,$a[0]; > } > } > } > for(my $k=0;$k<=$#{$hash{$a[0]}};$k++){ > for(my $l=0;$l<=$#b;$l++){ > print OUT ${$hash{$b[$l]}}[$k]; > my $seq=(split /\s+/,${$hash{$b[$l]}}[$k])[1]; > @arrayA=split //,$seq; > for($i=0;$i<=$#arrayA;$i++){ > if($arrayA[$i] =~ /A/i){ > $arrayB[0][$i]++; > $arrayB[5][$i]++; > } > if($arrayA[$i] =~ /C/i){ > $arrayB[1][$i]++; >
$arrayB[5][$i]++; > } > if($arrayA[$i] =~ /G/i){ > $arrayB[2][$i]++; > $arrayB[5][$i]++; > } > if($arrayA[$i] =~ /T/i){ > $arrayB[3][$i]++; > $arrayB[5][$i]++; > } > if($arrayA[$i] =~ /\./i){ > $arrayB[4][$i]++; > $arrayB[5][$i]++; > } > if($arrayA[$i] =~ /\-/i){ > $arrayB[4][$i]++; > $arrayB[5][$i]++; > } > } > } > for($j=0;$j<=$#arrayA;$j++){ > my $large=0;my $pos=0; > for($i=0;$i<=4;$i++){ > $arrayB[$i][$j]=$arrayB[$i][$j]/$arrayB[5][$j]; > if($arrayB[$i][$j]>$large){ > $large=$arrayB[$i][$j]; > $pos=$i; > } > } > if($large>0.5){ > $arrayB[5][$j]=$large; > $arrayB[6][$j]=$array[$pos]; > }else{ > $arrayB[5][$j]=$large; > $arrayB[6][$j]='N'; > } > } > for($i=0;$i<=5;$i++){ > for($j=0;$j<=15;$j++){ > print OUT " "; > } > for($j=0;$j<=$#arrayA;$j++){ > if($arrayB[$i][$j]>0){ > printf OUT "%.2f\t",$arrayB[$i][$j]; > }else{ > print OUT "0\t"; > } > } > print OUT "\n"; > } > for($j=0;$j<=15;$j++){ > print OUT " "; > } > for($j=0;$j<=$#arrayA;$j++){ > print OUT "$arrayB[6][$j]\t"; > } > print OUT "\n"; > for($j=0;$j<=15;$j++){ > print OUTA " "; > } > for($j=0;$j<=$#arrayA;$j++){ > print OUTA "$arrayB[6][$j]"; > } > print OUTA "\n"; > @arrayB=undef; > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 3 22:34:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 3 Nov 2010 21:34:30 -0500 Subject: [Bioperl-l] Another Taxonomy modules to CPAN In-Reply-To: <4CD12E99.7080701@uv.es> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es> Message-ID: Miguel, (Caveat: You should also ask this on the perl module-authors list, just in case: http://lists.perl.org/list/module-authors.html) Not sure how the other devs feel, but I personally don't think the Bio* 
namespace is reserved only for BioPerl modules (see Bio::Phylo, for example). It's a fairly generic top-level name. The only worry I have is if these are too similar to current BioPerl modules; Bio::Taxonomy and Bio::DB::Taxonomy already have namespaces in CPAN related to BioPerl modules. Saying that, tagging them as *::Lite might be fine, as long as the documentation indicated these are not related to BioPerl. Anyone else want to chime in? Maybe releasing them as a top-level Taxonomy? chris On Nov 3, 2010, at 4:42 AM, Miguel Pignatelli wrote: > Hi all, > > I have written a couple of modules that overlap certain functionality with Bio::DB::Taxonomy and Bio::Taxon. I had to write them because certain constraints in the environment I had to run it (GRID) made impossible to use a bioperl based solution. > > > The main features of these modules are: > > + No dependencies of non-standard Perl modules > + NCBI and RDP based taxonomies supported > + Very fast and low memory footprint -- orders of magnitude faster than Bioperl modules (for the tasks they are designed for --). > > Of course, they do not compete with Bio::DB::Taxonomy and Bio::Taxon in completeness or integration with other tools (e.g. rest of bioperl suit) but they are handy for mapping very large datasets (for example blast results) with the NCBI or RDP Taxonomy. > > The modules are: > > Taxonomy::Base -- Finds ancestors, ranks, converts between > names, ranks and IDs, etc... > > Taxonomy::RDP -- Reads the taxonomic tree from the RDP xml file > > Taxonomy::NCBI -- Reads the taxonomic tree from flat NCBI files > (nodes.dmp and names.dmp) > (Similar to Bio::DB::Taxonomy::flatfile) > > Taxonomy::NCBI::Gi2taxid -- Converts very fast and efficiently > NCBI GIs to Taxids. > Uses a binary lookup table. 
> > These modules are being used by several groups now -- mainly working with large metagenomics datasets -- and I am considering uploading them to CPAN, but I am not clear on where these modules should be placed there. > > How do you think I should name these modules? (e.g. where these modules should live in CPAN?) Their natural place could be under Bio::DB::Taxonomy, maybe Bio::DB::Taxonomy::Lite / Bio::DB::Taxonomy::Lite::NCBI / etc...? Is this possible (and convenient) without being part of Bioperl? Any other suggestions? > > Thank you very much in advance, > > M; > > ---------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Thu Nov 4 04:54:33 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 4 Nov 2010 09:54:33 +0100 Subject: [Bioperl-l] Another Taxonomy modules to CPAN In-Reply-To: References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es> Message-ID: <4F9F218F-22EC-46B8-8BAB-82BD0956FB76@sbc.su.se> I agree with Chris. You're welcome to put your code in the Bio:: namespace (it's not reserved for BioPerl), but since there are already modules with Taxonomy in their names, it would be likely to cause confusion. So, if your modules are renamed to be sufficiently distinguishable from existing Taxonomy modules, putting them under Bio:: should be no problem. Otherwise, I would go with whatever the modules list folks recommend.
Dave From miguel.pignatelli at uv.es Thu Nov 4 07:13:20 2010 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Thu, 04 Nov 2010 12:13:20 +0100 Subject: [Bioperl-l] SFF format support In-Reply-To: <46FE17F5-32A8-40F8-AAF7-70010B7A0831@illinois.edu> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es> <4CD168D6.8030100@uv.es> <46FE17F5-32A8-40F8-AAF7-70010B7A0831@illinois.edu> Message-ID: <4CD29550.6030005@uv.es> Thanks Chris, I will get in touch again when I am able to hook up a working module (I will check Biopython's Bio.SeqIO.SffIO) M; Chris Fields wrote: > Sure, you are more than welcome to add this. Our main source code repository is now on github (so one can fork the code, hack away, and submit pull requests), but we also accept patches. > > I suggest, though, for long-term maintenance you could be added as a collaborator. > > chris > > On Nov 3, 2010, at 8:51 AM, Miguel Pignatelli wrote: > >> Hi all, >> >> I have seen in the Nextgen section of the bioperl wiki (http://www.bioperl.org/wiki/Nextgen_in_Bioperl) that SFF support is on the wish list. I have some code written for parsing SFF files that I can refactor and contribute. >> >> Has anyone already taken this on?
>> >> Best regards, >> >> M; >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From biopython at maubp.freeserve.co.uk Thu Nov 4 07:40:13 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Nov 2010 11:40:13 +0000 Subject: [Bioperl-l] SFF format support In-Reply-To: <4CD29550.6030005@uv.es> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es> <4CD168D6.8030100@uv.es> <46FE17F5-32A8-40F8-AAF7-70010B7A0831@illinois.edu> <4CD29550.6030005@uv.es> Message-ID: 2010/11/4 Miguel Pignatelli : > Thanks Chris, > > I will get in touch again when I am able to hook a working module (I will > check Biopython's Bio.Seq.SffIO) > > M; Hi Miguel, In case it helps, the Biopython SFF code is here - this includes embedded documentation with test examples: http://github.com/biopython/biopython/blob/master/Bio/SeqIO/SffIO.py We also have some test SFF files here (see the README file for details): http://github.com/biopython/biopython/tree/master/Tests/Roche/ Peter From pignatelli_mig at gva.es Thu Nov 4 07:49:07 2010 From: pignatelli_mig at gva.es (Miguel Pignatelli) Date: Thu, 04 Nov 2010 12:49:07 +0100 Subject: [Bioperl-l] SFF format support In-Reply-To: References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es> <4CD168D6.8030100@uv.es> <46FE17F5-32A8-40F8-AAF7-70010B7A0831@illinois.edu> <4CD29550.6030005@uv.es> Message-ID: <4CD29DB3.9010805@gva.es> Thanks Peter, I have already 
took a look at the module. It will certainly help a lot. I also have some real SFF files from different collaborations that I can use for testing. Let's roll up the sleeves... M; Peter wrote: > 2010/11/4 Miguel Pignatelli : >> Thanks Chris, >> >> I will get in touch again when I am able to hook a working module (I will >> check Biopython's Bio.Seq.SffIO) >> >> M; > > Hi Miguel, > > In case it helps, the Biopython SFF code is here - this includes embedded > documentation with test examples: > > http://github.com/biopython/biopython/blob/master/Bio/SeqIO/SffIO.py > > We also have some test SFF files here (see the README file for details): > > http://github.com/biopython/biopython/tree/master/Tests/Roche/ > > Peter From hlapp at drycafe.net Thu Nov 4 12:12:04 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 4 Nov 2010 12:12:04 -0400 Subject: [Bioperl-l] Another Taxonomy modules to CPAN In-Reply-To: References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es> Message-ID: <40104040-BCBF-41D0-86EC-65741F1195A6@drycafe.net> I agree - Bio:: isn't exclusive to BioPerl, but in choosing module name and namespace, as well as in documentation, try to minimize potential confusion for users. -hilmar On Nov 3, 2010, at 10:34 PM, Chris Fields wrote: > Miguel, > > (Caveat: You should also ask this on the perl module-authors list, > just in case: http://lists.perl.org/list/module-authors.html) > > Not sure how the other devs feel, but I personally don't think the > Bio* namespace is reserved only for BioPerl modules (see Bio::Phylo, > for example). It's a fairly generic top-level name. The only worry > I have is if these are too similar to current BioPerl modules; > Bio::Taxonomy and Bio::DB::Taxonomy already have namespaces in CPAN > related to BioPerl modules. 
> > Saying that, tagging them as *::Lite might be fine, as long as the > documentation indicated these are not related to BioPerl. Anyone > else want to chime in? Maybe releasing them as a top-level Taxonomy? > > chris > > On Nov 3, 2010, at 4:42 AM, Miguel Pignatelli wrote: > >> Hi all, >> >> I have written a couple of modules that overlap certain >> functionality with Bio::DB::Taxonomy and Bio::Taxon. I had to write >> them because certain constraints in the environment I had to run it >> (GRID) made impossible to use a bioperl based solution. >> >> >> The main features of these modules are: >> >> + No dependencies of non-standard Perl modules >> + NCBI and RDP based taxonomies supported >> + Very fast and low memory footprint -- orders of magnitude faster >> than Bioperl modules (for the tasks they are designed for --). >> >> Of course, they do not compete with Bio::DB::Taxonomy and >> Bio::Taxon in completeness or integration with other tools (e.g. >> rest of bioperl suit) but they are handy for mapping very large >> datasets (for example blast results) with the NCBI or RDP Taxonomy. >> >> The modules are: >> >> Taxonomy::Base -- Finds ancestors, ranks, converts between >> names, ranks and IDs, etc... >> >> Taxonomy::RDP -- Reads the taxonomic tree from the RDP xml file >> >> Taxonomy::NCBI -- Reads the taxonomic tree from flat NCBI files >> (nodes.dmp and names.dmp) >> (Similar to Bio::DB::Taxonomy::flatfile) >> >> Taxonomy::NCBI::Gi2taxid -- Converts very fast and efficiently >> NCBI GIs to Taxids. >> Uses a binary lookup table. >> >> These modules are being used by several groups now -- mainly >> working with large metagenomics datasets -- and I am considering >> uploading them to CPAN, but I am not clear on where these modules >> should be placed there. >> >> How do you think I should name these modules? (e.g. where these >> modules should live in CPAN?) 
Their natural place could be under >> Bio::DB::Taxonomy, maybe Bio::DB::Taxonomy::Lite / >> Bio::DB::Taxonomy::Lite::NCBI / etc...? Is this possible (and >> convenient) without being part of Bioperl? Any other suggestions? >> >> Thank you very much in advance, >> >> M; >> >> ---------------------------------------------------- >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From kellert at ohsu.edu Thu Nov 4 20:45:25 2010 From: kellert at ohsu.edu (Tom Keller) Date: Thu, 4 Nov 2010 17:45:25 -0700 Subject: [Bioperl-l] parsing PubMed retrievals Message-ID: <36BB76FD-C321-4973-ADA6-80066433C0E6@ohsu.edu> Greetings, I'm getting the following error from the bioperl 1.6.1 example: $ perl biblio-eutils-example.pl Can't call method "text" on an undefined value at /Library/Perl/5.10.0/Bio/DB/Biblio/eutils.pm line 378. Any suggestions for getting this to work? thanks, Tom MMI DNA Services Core Facility 503-494-2442 kellert at ohsu.edu Office: 6588 RJH (CROET/BasicScience) From cjfields at illinois.edu Fri Nov 5 00:28:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 4 Nov 2010 23:28:01 -0500 Subject: [Bioperl-l] parsing PubMed retrievals In-Reply-To: <36BB76FD-C321-4973-ADA6-80066433C0E6@ohsu.edu> References: <36BB76FD-C321-4973-ADA6-80066433C0E6@ohsu.edu> Message-ID: It seems that in some cases a webenv/querykey is not returned. I have committed a fix to github for this that seems to resolve the problem on our end.
https://github.com/bioperl/bioperl-live/commit/7c64e8410d291dbd7097b2cd8fc948d95063d153 chris On Nov 4, 2010, at 7:45 PM, Tom Keller wrote: > Greetings, > I'm getting the following error from the bioplerl 1.6.1 example > > $ perl biblio-eutils-example.pl > Can't call method "text" on an undefined value at /Library/Perl/5.10.0/Bio/DB/Biblio/eutils.pm line 378. > > > Any suggestions for getting this to work? > > thanks, > > > Tom > MMI DNA Services Core Facility > 503-494-2442 > kellert at ohsu.edu > Office: 6588 RJH (CROET/BasicScience) > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From miguel.pignatelli at uv.es Fri Nov 5 05:31:46 2010 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Fri, 05 Nov 2010 10:31:46 +0100 Subject: [Bioperl-l] Another Taxonomy modules to CPAN In-Reply-To: <40104040-BCBF-41D0-86EC-65741F1195A6@drycafe.net> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es> <40104040-BCBF-41D0-86EC-65741F1195A6@drycafe.net> Message-ID: <4CD3CF02.9010608@uv.es> Hi all, In the perl module-authors list I have been suggested two nice alternatives: Bio::Lite::Taxonomy::* BioX::Taxonomy::* If the devs don't have any objection/preference I will follow the first one. Regards, M; Hilmar Lapp wrote: > I agree - Bio:: isn't exclusive to BioPerl, but in choosing module name > and namespace, as well as in documentation, try to minimize potential > confusion for users. 
> > -hilmar > On Nov 3, 2010, at 10:34 PM, Chris Fields wrote: > >> Miguel, >> >> (Caveat: You should also ask this on the perl module-authors list, >> just in case: http://lists.perl.org/list/module-authors.html) >> >> Not sure how the other devs feel, but I personally don't think the >> Bio* namespace is reserved only for BioPerl modules (see Bio::Phylo, >> for example). It's a fairly generic top-level name. The only worry I >> have is if these are too similar to current BioPerl modules; >> Bio::Taxonomy and Bio::DB::Taxonomy already have namespaces in CPAN >> related to BioPerl modules. >> >> Saying that, tagging them as *::Lite might be fine, as long as the >> documentation indicated these are not related to BioPerl. Anyone else >> want to chime in? Maybe releasing them as a top-level Taxonomy? >> >> chris >> >> On Nov 3, 2010, at 4:42 AM, Miguel Pignatelli wrote: >> >>> Hi all, >>> >>> I have written a couple of modules that overlap certain functionality >>> with Bio::DB::Taxonomy and Bio::Taxon. I had to write them because >>> certain constraints in the environment I had to run it (GRID) made >>> impossible to use a bioperl based solution. >>> >>> >>> The main features of these modules are: >>> >>> + No dependencies of non-standard Perl modules >>> + NCBI and RDP based taxonomies supported >>> + Very fast and low memory footprint -- orders of magnitude faster >>> than Bioperl modules (for the tasks they are designed for --). >>> >>> Of course, they do not compete with Bio::DB::Taxonomy and Bio::Taxon >>> in completeness or integration with other tools (e.g. rest of bioperl >>> suit) but they are handy for mapping very large datasets (for example >>> blast results) with the NCBI or RDP Taxonomy. >>> >>> The modules are: >>> >>> Taxonomy::Base -- Finds ancestors, ranks, converts between >>> names, ranks and IDs, etc... 
>>> >>> Taxonomy::RDP -- Reads the taxonomic tree from the RDP xml file >>> >>> Taxonomy::NCBI -- Reads the taxonomic tree from flat NCBI files >>> (nodes.dmp and names.dmp) >>> (Similar to Bio::DB::Taxonomy::flatfile) >>> >>> Taxonomy::NCBI::Gi2taxid -- Converts very fast and efficiently >>> NCBI GIs to Taxids. >>> Uses a binary lookup table. >>> >>> These modules are being used by several groups now -- mainly working >>> with large metagenomics datasets -- and I am considering uploading >>> them to CPAN, but I am not clear on where these modules should be >>> placed there. >>> >>> How do you think I should name these modules? (e.g. where these >>> modules should live in CPAN?) Their natural place could be under >>> Bio::DB::Taxonomy, maybe Bio::DB::Taxonomy::Lite / >>> Bio::DB::Taxonomy::Lite::NCBI / etc...? Is this possible (and >>> convenient) without being part of Bioperl? Any other suggestions? >>> >>> Thank you very much in advance, >>> >>> M; >>> >>> ---------------------------------------------------- >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Nov 5 06:36:38 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 5 Nov 2010 11:36:38 +0100 Subject: [Bioperl-l] Another Taxonomy modules to CPAN In-Reply-To: <4CD3CF02.9010608@uv.es> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> <4C8606FA.3000509@fmi.ch> <4CD12E99.7080701@uv.es> <40104040-BCBF-41D0-86EC-65741F1195A6@drycafe.net> <4CD3CF02.9010608@uv.es> Message-ID: <5E0858F2-3957-46E1-BCD7-0F75FF44DE7D@sbc.su.se> > Bio::Lite::Taxonomy::* > 
BioX::Taxonomy::* Either works for me. Thanks for asking, Miguel, and I'm looking forward to checking them out! Dave From kai.blin at biotech.uni-tuebingen.de Fri Nov 5 07:33:44 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 05 Nov 2010 12:33:44 +0100 Subject: [Bioperl-l] Review request: Merged hmmer2/3 parser Message-ID: <4CD3EB98.3050508@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi folks, https://github.com/kblin/bioperl-live/commits/hmmer_merged_parser contains my attempt to get a hmmer parser that is able to detect which hmmer version output it's given and which parser to call in turn. I chose to sneak in a Bio::SearchIO::hmmer class into the object hierarchy that the hmmer2 parser (renamed to Bio::SearchIO::hmmer2) and the hmmer3 parser subclass from. I'm not sure if this is the best way to go, though, and I'd like to get some feedback on the implementation. The last six patches on https://github.com/kblin/bioperl-live/commits/hmmer_merged_parser are the full changeset, the core of the change is in https://github.com/kblin/bioperl-live/commit/2d4816f1d67880c69948b2ce529fdeb52e0d843f Thanks in advance for your time, Kai - -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJM0+uBAAoJEKM5lwBiwTTP7+8IAN6ivTfTa2NNCCvL7ImXiXEU LF5BbYIEgtyu+7NZIvX0/eLrs9PBZ4BqOl+LqyiYP0GSOsgX1I7ndc2n67o06n+h z+HGcNUScDGxx1jGheC4nWWs3sfWrS4KvxMlM6XlXGF/ioai9Im9JLKidsqRXPr0 Sf1r3GHYeNQPZ3eGdXhb7mEv2Ps7OOWyGGfZqyokoaTLItiLPrhpNCVg6liz0lh1 Mb0s3vpCRgsQpWqBJuwkkWFmS5ReokFu9Bnf0xXjGbGFHDGkI9YLXRtcbwu5kaBq V52CS9SOuPJvDLPoCBzvDkEI6+tXeEjutoG52wQ8F0tgpbdrMlahlrze3gvCHJw= =IGwD -----END PGP SIGNATURE----- From David.Messina at sbc.su.se Fri Nov 5 08:35:32 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 5 Nov 2010 13:35:32 +0100 Subject: [Bioperl-l] Review request: Merged hmmer2/3 parser In-Reply-To: <4CD3EB98.3050508@biotech.uni-tuebingen.de> References: <4CD3EB98.3050508@biotech.uni-tuebingen.de> Message-ID: I may have missed some recent off-list discussion, but I thought we explicitly talked about *not* merging these since it will complicate future maintenance and Hmmer2 will be obsolete in the near future.
Dave From kai.blin at biotech.uni-tuebingen.de Fri Nov 5 08:56:23 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 05 Nov 2010 13:56:23 +0100 Subject: [Bioperl-l] Review request: Merged hmmer2/3 parser In-Reply-To: References: <4CD3EB98.3050508@biotech.uni-tuebingen.de> Message-ID: <4CD3FEF7.9010806@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2010-11-05 13:35, Dave Messina wrote: > I may have missed some recent off-list discussion, but I thought we > explicitly talked about *not* merging these since it will complicate > future maintenance and Hmmer2 will be obsolete in the near future. Yes, that's what I thought until about last week. Then I was bitten by the fact that Hmmer3 can't do global alignments, so you can't reliably use Hmmer3 to extract protein domains using the domain's motif. As this seems to be a common use case, I think we'll end up supporting hmmer2 for quite a while now. There are currently no plans by the Hmmer authors to add global and glocal (whole profile alignments against parts of the sequence) alignments to hmmer3. They suggested to keep using hmmer2 for now. For this reason I decided to try and merge the two parsers to a point where users don't have to care which version of hmmer their file is. Cheers, Kai - -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJM0/7uAAoJEKM5lwBiwTTPkMEH/0KYT7QpTt35YolKZJhaAw3X F9FvQvPhMc9CydFgXpvCwhaN9uNhezb+wBTbfKPjElvwnW6fLvuLt2nWm/8Syou8 2HjTF1/V5efe72J/GhLd1FlfWuBwnXZv+X1k2Qgmgxhol1QinP3LENgn6KybD7vs mibZBpyVKPjDZJMM6WVsLqG71MINEdEeZd1ziPkBtt2wHRE2k/H/IrotPsoJ6/6c DCeWqofsRFN1UfUWBkcGRo54Ixx4dRzi7R4lbKlGzfhDZcs0LbjAGmyfbmhPVbDR cM7iqvUaaMNxAotz6Cmf7LVd2tGYzSadXGRKz2hO3lTDu74CpJmksKI4gJegkts= =Ncfq -----END PGP SIGNATURE----- From cjfields at illinois.edu Fri Nov 5 10:17:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Nov 2010 09:17:03 -0500 Subject: [Bioperl-l] Review request: Merged hmmer2/3 parser In-Reply-To: <4CD3FEF7.9010806@biotech.uni-tuebingen.de> References: <4CD3EB98.3050508@biotech.uni-tuebingen.de> <4CD3FEF7.9010806@biotech.uni-tuebingen.de> Message-ID: <283FB1D4-559F-4971-A743-CB24254C23DC@illinois.edu> On Nov 5, 2010, at 7:56 AM, Kai Blin wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 2010-11-05 13:35, Dave Messina wrote: >> I may have missed some recent off-list discussion, but I thought we >> explicitly talked about *not* merging these since it will complicate >> future maintenance and Hmmer2 will be obsolete in the near future. > > Yes, that's what I thought until about last week. Then I was bitten by > the fact that Hmmer3 can't do global alignments, so you can't reliably > use Hmmer3 to extract protein domains using the domain's motif. As this > seems to be a common use case, I think we'll end up supporting hmmer2 > for quite a while now.
> > There are currently no plans by the Hmmer authors to add global and > glocal (whole profile alignments against parts of the sequence) > alignments to hmmer3. They suggested to keep using hmmer2 for now. > > For this reason I decided to try and merge the two parsers to a point > where users don't have to care which version of hmmer their file is. > > Cheers, > Kai Makes sense to me. We're tackling some Bio::Tree revisions next week, my guess is we can also get this added in. I'm really keen on getting 1.6.2 out sometime soon, before things get really crazy for me this spring. Key things to test: can one specify the exact hmmer2/3 parser directly? my $in = Bio::SearchIO->new(-format => 'hmmer3', -file => 'foo'); Can one explicitly specify the hmmer parser variant? my $in = Bio::SearchIO->new(-format => 'hmmer', -version => 3, -file => 'foo'); This will more than likely pop up. I did something with Bio::SeqIO::fastq to allow specifying sanger/illumina/solexa variants, this would be similar in that respect. chris From kai.blin at biotech.uni-tuebingen.de Fri Nov 5 10:53:24 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 05 Nov 2010 15:53:24 +0100 Subject: [Bioperl-l] Review request: Merged hmmer2/3 parser In-Reply-To: <283FB1D4-559F-4971-A743-CB24254C23DC@illinois.edu> References: <4CD3EB98.3050508@biotech.uni-tuebingen.de> <4CD3FEF7.9010806@biotech.uni-tuebingen.de> <283FB1D4-559F-4971-A743-CB24254C23DC@illinois.edu> Message-ID: <4CD41A64.4070104@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2010-11-05 15:17, Chris Fields wrote: > Makes sense to me. We're tackling some Bio::Tree revisions next > week, my guess is we can also get this added in. I'm really keen on > getting 1.6.2 out sometime soon, before things get really crazy for > me this spring. I changed my mind about the hmmer parser just in time then, I guess. :) > Key things to test: can one specify the exact hmmer2/3 parser directly?
> > my $in = Bio::SearchIO->new(-format => 'hmmer3', -file => 'foo') That's how I started the development. But I just see I changed all tests over, so I can't prove it. I've added some tests to show this is equivalent. (907b7d0a2b93d4961f64) > Can one explicitly specify the hmmer parser variant? > > my $in = Bio::SearchIO->new(-format => 'hmmer', -version => 3, -file > => 'foo') I didn't think about this, but that certainly seems reasonable. If somebody specifies a format explicitly, we can short-circuit the other detection code. I've added code and a couple of tests to support this. See c9cd75df492dbef90fff. Thanks for the input, Kai - -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJM1BpdAAoJEKM5lwBiwTTPD2YH/Aig8kd+EV7CTC/88yx7QjBJ YLY7e4eKHpuAMPPaUXGt+aOvIKbpO4Hfajvze1BCNnpM415RFNjHRvlEwPfyCISN ma+LcBsosXmr2bJhkSSfx2Hgjv95wZG446nUtsaPdIxOWXMpujeYAN14uHihEek5 KDiXpOGYhDQt7o02wOIIxrDPH0gJ2HF+YWye1W5qqPRvHxKLA1gjXizv+MYvLD9/ GAWHzswddFQrBy6g++zj9hrJykteVoQCW2B6fBBr5o4BhthtIYwznIerqgoeDHAL m0yw/rIhMnVEBb2oLIVwIDRZ32MtCsIEamox9JIV6boppoEIdiacp8bHPv6ZXW4= =h/HN -----END PGP SIGNATURE----- From cjfields at illinois.edu Fri Nov 5 11:06:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Nov 2010 10:06:01 -0500 Subject: [Bioperl-l] Blat wrapper rewrite (and a PSL 'bonus') Message-ID: <45EEAF86-33A9-4F5B-9958-85698B2353CF@illinois.edu> All, I have a local Blat wrapper refactoring that I had to quickly write up for some local work.
Unfortunately, the original Blat wrapper in BioPerl-Run (Bio::Tools::Run::Alignment::Blat) had parameters that weren't compatible with more recent versions of Blat, and which didn't even use most passed parameters, so it was unfortunately pretty much useless in my hands (one couldn't designate whether the query/db was dna, prot, dnax, etc). Anyone object to just scrapping the original Blat wrapper in place of this one (which allows all current Blat parameters)? I plan on having it pass the original tests and adding a bit more. As a small semi-'bonus', along with this I am also writing up additional code for Bio::SearchIO::psl that allows switching parsing logic to cluster distinct regional hits; basically, same query = Result, each Hit = distinct region, each set of blocks = HSP. This is in response from some users here to making the bp_search2gff.pl script work with PSL a little better. Should be added in over the next few weeks. chris From cjfields at illinois.edu Fri Nov 5 11:43:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Nov 2010 10:43:13 -0500 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI E-Utility GEO Database name change References: Message-ID: <0831372A-B40C-4B62-B423-5BEB7B5D3543@illinois.edu> FYI, for those GEO users out there. -c Begin forwarded message: > From: > Date: November 5, 2010 10:39:27 AM CDT > To: NLM/NCBI List utilities-announce > Subject: [Utilities-announce] NCBI E-Utility GEO Database name change > Reply-To: utilities-announce at ncbi.nlm.nih.gov > > Dear E-Utility Users, > > Recently the name of the GEO Profiles database used within the E-utilities changed from 'geo' to 'geoprofiles'. While the old name (&db=geo) will still function, users are encouraged to change their requests to use the new name (db=geoprofiles). ELink users should be aware that all linknames including 'geo' will no longer function. Instead, these names should include 'geoprofiles' rather than 'geo'. 
For example, the linkname of links from Gene to GEO Profiles is now &linkname=gene_geoprofiles. > > Thank you. > > > > _______________________________________________ > Utilities-announce mailing list > http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From David.Messina at sbc.su.se Fri Nov 5 12:11:41 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 5 Nov 2010 17:11:41 +0100 Subject: [Bioperl-l] Blat wrapper rewrite (and a PSL 'bonus') In-Reply-To: <45EEAF86-33A9-4F5B-9958-85698B2353CF@illinois.edu> References: <45EEAF86-33A9-4F5B-9958-85698B2353CF@illinois.edu> Message-ID: On Nov 5, 2010, at 16:06, Chris Fields wrote: > Anyone object to just scrapping the original Blat wrapper in place of this one (which allows all current Blat parameters)? Sounds great! Go for it. Dave From njauxiongjie at gmail.com Fri Nov 5 05:12:33 2010 From: njauxiongjie at gmail.com (njauxiongjie) Date: Fri, 05 Nov 2010 02:12:33 -0700 (PDT) Subject: [Bioperl-l] question about Bio::Tools::Run::RemoteBlast Message-ID: <4cd3ca81.123f970a.39e2.ffff8e89@mx.google.com> Hi, I have used the scripts list in section of "SYNOPSIS" in th webpage http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm. there are two line of the code like below: my $filename = $result->query_name()."\.out"; $factory->save_output($filename); this saved the blast result in separated files by the query name. my question is: how can i save the blast results of all querys in one file? From alpita at uvigo.es Fri Nov 5 11:52:58 2010 From: alpita at uvigo.es (alpita at uvigo.es) Date: Fri, 05 Nov 2010 16:52:58 +0100 Subject: [Bioperl-l] Primer3 Message-ID: <20101105165258.88a1uizkeg4k8skg@correoweb.uvigo.es> Hi I am a PhD-student working with EST sequences. I have got an input file for Primer3 and the program does not run. I try directly in the windows console and with Bioperl. 
I suppose the problem is with the syntax, because I have never used these tools, so I would appreciate some help. How can I invoked my input file in primer 3? If it is easy to run it with BioPerl, tell me Thank you very much From cjfields at illinois.edu Fri Nov 5 13:27:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Nov 2010 12:27:13 -0500 Subject: [Bioperl-l] Primer3 In-Reply-To: <20101105165258.88a1uizkeg4k8skg@correoweb.uvigo.es> References: <20101105165258.88a1uizkeg4k8skg@correoweb.uvigo.es> Message-ID: <8BBED864-BB9A-4FF7-9B09-2549D927F858@illinois.edu> This all depends on what version of Primer3 you are using; the Primer3 tools in bioperl-live and bioperl-run do not work with primer3 v2 due to significant changes in the primer3 API. However, I wrote up a full refactor of these tools, along with tests, here: https://github.com/cjfields/Bio-Tools-Primer3Redux This includes both the wrapper and the parser (see the test files for examples on how to run them). They have been renamed because of a change in the module API. Let me know if they work for you. chris On Nov 5, 2010, at 10:52 AM, alpita at uvigo.es wrote: > Hi > > I am a PhD-student working with EST sequences. I have got an input file for Primer3 and the program does not run. I try directly in the windows console and with Bioperl. I suppose the problem is with the syntax, because I have never used these tools, so I would appreciate some help. > > How can I invoked my input file in primer 3? 
If it is easy to run it with BioPerl, tell me > > Thank you very much > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Nov 5 16:57:56 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Nov 2010 15:57:56 -0500 Subject: [Bioperl-l] Blat wrapper rewrite (and a PSL 'bonus') In-Reply-To: References: <45EEAF86-33A9-4F5B-9958-85698B2353CF@illinois.edu> Message-ID: <374003E7-0F9F-466F-99D9-89F85F22CFF9@illinois.edu> On Nov 5, 2010, at 11:11 AM, Dave Messina wrote: > On Nov 5, 2010, at 16:06, Chris Fields wrote: > >> Anyone object to just scrapping the original Blat wrapper in place of this one (which allows all current Blat parameters)? > > > Sounds great! Go for it. > > > Dave Okay, now in bioperl-run, master branch. This passes all prior tests; I added a few more. chris From gabbyteku at gmail.com Fri Nov 5 17:17:55 2010 From: gabbyteku at gmail.com (gabriel teku) Date: Fri, 5 Nov 2010 23:17:55 +0200 Subject: [Bioperl-l] Bad request from efetch Bio::DB::EUtilities Message-ID: Hi I don't know what the problem is, but the efetch code snippet throws a Bad request error, while( my $id = <$in> ){ # $in, file contains list of uids chomp $id; my $eut_obj = Bio::DB::EUtilities->new( -eutil => 'efetch', -email => ' myemail at gmail.com', -db => 'geoprofile', -id => $id, ); open( my $tmpOut, '>' ,'doc' ) or die "Can't open doc: $!"; eval{ $eut_obj->get_Response(-cb => sub {my ($data) = @_; print $tmpOut $data ); }; ..... } What is wrong with the request? Thanks in advance From cselig01 at students.poly.edu Sat Nov 6 00:07:16 2010 From: cselig01 at students.poly.edu (Chet Seligman) Date: Sat, 6 Nov 2010 04:07:16 +0000 Subject: [Bioperl-l] Problem installing Bio::SeqIO, ::SearchIO, ; ; Graphics in perl 5.8 in windows 7 Message-ID: I'd like to use ppm or cpan. 
Does anyone know which repositories to use for the modules in the subject line? Chet Seligman From Kevin.M.Brown at asu.edu Mon Nov 8 10:35:56 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 8 Nov 2010 08:35:56 -0700 Subject: [Bioperl-l] Problem installing Bio::SeqIO, ::SearchIO, ; ; Graphics in perl 5.8 in windows 7 In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B4072A9051@EX02.asurite.ad.asu.edu> Follow the directions at: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows To get the main parts of BioPerl installed. Bio::Graphics is now a separate module http://search.cpan.org/~lds/Bio-Graphics-2.15/ > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Chet Seligman > Sent: Friday, November 05, 2010 9:07 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Problem installing Bio::SeqIO, > ::SearchIO, ; ; Graphics in perl 5.8 in windows 7 > > > I'd like to use ppm or cpan. > Does anyone know which repositories to use for the modules in > the subject line? > > Chet Seligman > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.bolser at gmail.com Tue Nov 9 05:41:53 2010 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 9 Nov 2010 10:41:53 +0000 Subject: [Bioperl-l] BP reports "All tests successful.", but some failed ... Message-ID: I see the following error in the output of ./Build test: ... t/LocalDB/SeqFeature_BDB.t ................... ok t/LocalDB/SeqFeature_mysql.t ................. 1/84 DBI connect('database=test','',...) failed: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2) at Bio/DB/SeqFeature/Store/DBI/mysql.pm line 217 sh: -user: command not found t/LocalDB/SeqFeature_mysql.t ................. ok ... 
Related to a bug I filed a while back: http://bugzilla.open-bio.org/show_bug.cgi?id=2899 IIRC there are a few bugs in Bio::SeqFeature::Store that are hidden due to this initial failure. Anyway, the point is that later I see: All tests successful. Files=348, Tests=24136, 249 wallclock secs ( 8.99 usr 2.04 sys + 197.16 cusr 26.85 csys = 235.04 CPU) Result: PASS How come we get "All tests successful." when there has clearly been an error in "t/LocalDB/SeqFeature_mysql.t"? BioPerl version is git bcd3b66f0422493e5f1d7c05220a4e58cced6313 Cheers, Dan. # The place to be irc://irc.freenode.net/#BioPerl # Sometimes good irc://irc.perl.org/#gmod From cjfields at illinois.edu Tue Nov 9 07:58:40 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Nov 2010 06:58:40 -0600 Subject: [Bioperl-l] bioperl-primer3 In-Reply-To: <20101109130552.5pq9y28dwcwkgsww@correoweb.uvigo.es> References: <20101109130552.5pq9y28dwcwkgsww@correoweb.uvigo.es> Message-ID: <532C3550-5AA1-4A47-9C8D-82899E78B7E5@illinois.edu> On Nov 9, 2010, at 6:05 AM, alpita at uvigo.es wrote: > Hi Chris > > As I wrote last day, I have never used bioperl and Primer3. All of this is completely new for me. It seems very interesting but I am in a student exchange, so I have got no-time to deep into it now (I will do this in the future, because I think it is very interesting to my work). > > I want to design some primers, so I have used the tools of Thomas Thiel (MISA and p3_in) to obtain the input file of primer3 software. The next step would be to invoked the primer3 in the windows console (directly with the primer3 software or with bioperl tools) but I always have got problems with error messages. I suppose that I am writting some errors. > > If you can, I would greatly appreciate you to tell me what tool to use and how to write it in the windows console. > > Thank you very much for you effort. I hope to be improving slowly in this task > Regards Please make responses to the main bioperl list. 
I rewrote the main set of tools for Primer3, which I believe i pointed out to you previously (unless I'm mistaken): https://github.com/cjfields/Bio-Tools-Primer3Redux I can't tell you what to use for Windows: I don't develop on that platform. Any patches are welcome in case you find problems there and fix them. chris From njauxiongjie at gmail.com Tue Nov 9 04:05:04 2010 From: njauxiongjie at gmail.com (njauxiongjie) Date: Tue, 09 Nov 2010 01:05:04 -0800 (PST) Subject: [Bioperl-l] question about Bio::Tools::Run::RemoteBlast Message-ID: <4cd90ec0.0c44970a.506d.2a81@mx.google.com> Hi, I have used the scripts list in section of "SYNOPSIS" in th webpage http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm. there are two line of the code like below: my $filename = $result->query_name()."\.out"; $factory->save_output($filename); this saved the blast result in separated files by the query name. my question is: how can i save the blast results of all querys in one file? From singh.amarv at epa.gov Tue Nov 9 13:38:18 2010 From: singh.amarv at epa.gov (Amar Singh) Date: Tue, 9 Nov 2010 18:38:18 +0000 (UTC) Subject: [Bioperl-l] parsing PubMed retrievals References: <36BB76FD-C321-4973-ADA6-80066433C0E6@ohsu.edu> Message-ID: Tom Keller ohsu.edu> writes: > > Greetings, > I'm getting the following error from the bioplerl 1.6.1 example > > $ perl biblio-eutils-example.pl > Can't call method "text" on an undefined value at /Library/Perl/5.10.0/Bio/DB/Biblio/eutils.pm line 378. > > Any suggestions for getting this to work? > > thanks, > > Tom > MMI DNA Services Core Facility > 503-494-2442 > kellert at ohsu.edu ohsu.edu> > Office: 6588 RJH (CROET/BasicScience) > Tom, I am also facing the same problem. This was working a month back and now when I wanted to test new thing it is giving me the same error. I think this is due to some changes in the way we access the data. Will keep updated if I find a solution and same way you too. 
Thanks, Amar From Russell.Smithies at agresearch.co.nz Tue Nov 9 15:08:23 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 10 Nov 2010 09:08:23 +1300 Subject: [Bioperl-l] question about Bio::Tools::Run::RemoteBlast In-Reply-To: <4cd90ec0.0c44970a.506d.2a81@mx.google.com> References: <4cd90ec0.0c44970a.506d.2a81@mx.google.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF3313BCA1138@exchsth.agresearch.co.nz> Not sure if this will work but have you tried using STDOUT as the filename then piping all the output to a single file? The other (much simpler) option is just concatenate all the outputs into a single file when it's finished. --Russell -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of njauxiongjie Sent: Tuesday, 9 November 2010 10:05 p.m. To: bioperl-l Subject: [Bioperl-l] question about Bio::Tools::Run::RemoteBlast Hi, I have used the scripts list in section of "SYNOPSIS" in th webpage http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm. there are two line of the code like below: my $filename = $result->query_name()."\.out"; $factory->save_output($filename); this saved the blast result in separated files by the query name. my question is: how can i save the blast results of all querys in one file? _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From florent.angly at gmail.com Tue Nov 9 19:54:58 2010 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 10 Nov 2010 10:54:58 +1000 Subject: [Bioperl-l] question about Bio::Tools::Run::RemoteBlast In-Reply-To: <4cd90ec0.0c44970a.506d.2a81@mx.google.com> References: <4cd90ec0.0c44970a.506d.2a81@mx.google.com> Message-ID: <4CD9ED62.50102@gmail.com> Wouldn't using a single filename be the way to go? > my $filename = "blast_ressults.out"; > $factory->save_output($filename); Florent On 09/11/10 19:05, njauxiongjie wrote: > Hi, > > I have used the scripts list in section of "SYNOPSIS" in th webpage http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm. > > there are two line of the code like below: > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > this saved the blast result in separated files by the query name. > > my question is: how can i save the blast results of all querys in one file? > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 9 22:46:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 9 Nov 2010 21:46:00 -0600 Subject: [Bioperl-l] question about Bio::Tools::Run::RemoteBlast In-Reply-To: <4CD9ED62.50102@gmail.com> References: <4cd90ec0.0c44970a.506d.2a81@mx.google.com> <4CD9ED62.50102@gmail.com> Message-ID: <14F0D22D-6AEC-4453-B089-8BDF9E9A481E@illinois.edu> Only if the output appends to that file (I don't recall personally, so it's worth a try). 
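
A minimal sketch of the "concatenate all the outputs" option mentioned in this thread, assuming the per-query reports have already been written to disk (for example by the SYNOPSIS loop calling $factory->save_output). The helper name concatenate_reports and the file names are hypothetical and are not part of Bio::Tools::Run::RemoteBlast; the code uses only core Perl and does not depend on whether save_output appends or overwrites.

```perl
use strict;
use warnings;

# Append the contents of each per-query report to one combined file.
# $combined and @reports are hypothetical paths chosen by the caller;
# the combined file is opened in append mode, so repeated calls accumulate.
sub concatenate_reports {
    my ($combined, @reports) = @_;
    open(my $out, '>>', $combined) or die "Can't open $combined: $!";
    for my $report (@reports) {
        open(my $in, '<', $report) or die "Can't open $report: $!";
        while (my $line = <$in>) {
            print {$out} $line;    # copy the report line by line
        }
        close $in;
    }
    close $out or die "Can't close $combined: $!";
    return $combined;
}
```

One way to use it would be to push each $filename onto an array inside the retrieval loop and call concatenate_reports('all_results.out', @filenames) once the loop has finished.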
chris On Nov 9, 2010, at 6:54 PM, Florent Angly wrote: > Wouldn't using a single filename be the way to go? >> my $filename = "blast_ressults.out"; >> $factory->save_output($filename); > Florent > > > On 09/11/10 19:05, njauxiongjie wrote: >> Hi, >> >> I have used the scripts list in section of "SYNOPSIS" in th webpage http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm. >> >> there are two line of the code like below: >> my $filename = $result->query_name()."\.out"; >> $factory->save_output($filename); >> this saved the blast result in separated files by the query name. >> >> my question is: how can i save the blast results of all querys in one file? >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Wed Nov 10 11:11:27 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 10 Nov 2010 16:11:27 +0000 Subject: [Bioperl-l] Does $tree->remove_Node prune or splice? In-Reply-To: <927BEE54-ABC2-428A-AF42-DDEBDEABB878@tamu.edu> References: <927BEE54-ABC2-428A-AF42-DDEBDEABB878@tamu.edu> Message-ID: <4CDAC42F.9020708@gmail.com> Hi Jim, I don't think you ever got a reply to this. I think you want $tree->splice() rather than $tree->remove_Node(). Cheers, Roy. On 21/10/2010 18:52, Jim Hu wrote: > I'm trying to take an unreadable tree from PFAM and prune it down to > just show the paralogs from E. coli. > > When you run remove_Node on an internal node, is it supposed to > reconnect the ancestors to the descendants? This seems to be way to > aggressive in what it's removing. 
> > while( my $tree = $treeio->next_tree ) { for my $node ( > $tree->get_nodes ) { if ($node->id =~ m/_ECOLI/){ # leave this node > alone # print $node->id."\n"; }else{ if ($node->is_Leaf){ > $tree->remove_Node($node); }else{ my $num_children = > scalar($node->each_Descendent); if ($num_children == 1){ print > "removing ".$node->id."\n"; $tree->remove_Node($node); } } } } > $treeio_out->write_tree($tree); } > > The idea is that when I've pruned the non-ECOLI leaves, some internal > nodes will not be branches anymore. But at the end, all the ECOLI > nodes are gone, presumably because an ancestor got removed and the > whole branch was lost. Is there a way to remove just a node and > graft the tree back together? My plan, once I get that working is to > repeat the traversal until the output stops changing. > > Jim > > > ===================================== Jim Hu Associate Professor > Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. > College Station, TX 77843-2128 979-862-4054 > > > > _______________________________________________ Bioperl-l mailing > list Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chandan.kr.singh at gmail.com Thu Nov 11 06:10:25 2010 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Thu, 11 Nov 2010 16:40:25 +0530 Subject: [Bioperl-l] each_gene_symbol() is missing from Bio::Phenotype::OMIM::OMIMentry Message-ID: Hi Christian The implementation of each_gene_symbol() is missing from Bio::Phenotype::OMIM::OMIMentry even though the method has been referred in the module and also in OMIMparser. I'm sure it was just a overlook; in which case will you be implementing the method in the near future. 
Thanks Chandan From cjfields at illinois.edu Sun Nov 14 16:49:52 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 14 Nov 2010 15:49:52 -0600 Subject: [Bioperl-l] GitHub down Message-ID: Just a note that GitHub (and the main bioperl repo) appear to be down at the moment with a major disruption of service. If anyone absolutely needs code, we have a synced repo set up here: http://repo.or.cz/w/bioperl-live.git http://repo.or.cz/w/bioperl-db.git http://repo.or.cz/w/bioperl-run.git http://repo.or.cz/w/bioperl-network.git chris From maj at fortinbras.us Mon Nov 15 20:08:25 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 15 Nov 2010 20:08:25 -0500 Subject: [Bioperl-l] use Bio::Search::Tiling::MapTiling In-Reply-To: <72BADBE572A8CA4889991C87507FA1EE22DE278045@NIHMLBX07.nih.gov> References: <72BADBE572A8CA4889991C87507FA1EE22DE278045@NIHMLBX07.nih.gov> Message-ID: Hi Wei, You'll need to install the trunk code to access this method; the "release" is woefully out of date (not that I'm helping much on that these days). Use git to obtain the latests bioperl-live and bioperl-run from https://github.com/bioperl/, or the synchronized repos http://repo.or.cz/w/bioperl-live.git http://repo.or.cz/w/bioperl-run.git cheers, MAJ ----- Original Message ----- From: Shao, Wei (NIH/NCI) [C] To: Mark A Jensen Sent: Monday, November 15, 2010 10:47 AM Subject: use Bio::Search::Tiling::MapTiling Dear Dr. Jensen, I am looking for a bioperl module that can be used to concatenate HSP from a blast into a sequence. I found that your Bio::Search::Tiling::MapTilingcan do exactly that. I used your example script to test it. I got an error message:?Can't locate object method "get_tiled_alns" via package "Bio::Search::Tiling::MapTiling" at get_hsp.pl line 29, line 2630?.It seems that method ?get_tiled_alns? is not in the bioperl we installed. Is that possible? 
The script I used is this one:

use Bio::SearchIO;
use Bio::Search::Tiling::MapTiling
# Note that to get one hit, the user first blasts
# the set of contigs against single sequence, the reference sequence.
# The result of this BLAST run is in 'contig_tile.bls'
$blio = Bio::SearchIO->new( -file => 'contig_tile.bls');
$result = $blio->next_result;
$hit = $result->next_hit;
$tiling = Bio::Search::Tiling::MapTiling->new($hit);
@alns = $tiling->get_tiled_alns('query');
# here's the concatenation:
$concat_seq_obj = $alns[0]->get_seq_by_id('query');

Best regards,

Wei Shao, Ph.D. [Contractor]
Bioinformatics Analyst IV
Advanced Biomedical Computing Center/HIV Drug Resistance Program
SAIC-Frederick, Inc.
National Cancer Institute at Frederick
P.O. Box B, Frederick, MD 21702
Phone: 301/846-6021
Fax: 301/846-6013
Email: shaow at mail.nih.gov

NOTICE: This communication may contain privileged or other confidential information. If you are not the intended recipient, or believe that you have received this communication in error, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received.

From dimitark at bii.a-star.edu.sg Mon Nov 15 21:44:15 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 16 Nov 2010 10:44:15 +0800
Subject: [Bioperl-l] about tblastn and strand
Message-ID: <4CE1EFFF.70304@bii.a-star.edu.sg>

Hi guys,

i have a simple question. How exactly is defined on which strand is located a certain HSP in tblastn? I know it is used the strand function but i would like to know how is implemented. I ask cos here i made a perl script using bioperl but other people want to implement it in Java. In the output of tblastn there is no strand string only frame+ or frame-. Do you just take the + or - to determine on which strand is the HSP?

Thank you for your time and help.
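
A minimal sketch of the frame-to-strand idea raised in the question, under the general BLAST convention that in tblastn a negative subject frame (-1, -2, -3) means the reverse complement of the nucleotide subject was translated, and a positive frame means the strand as given. The function name strand_from_frame is hypothetical; this is not a copy of how BioPerl's parser or its strand method derives the value, only an illustration of the sign rule that could be ported to another language.

```perl
use strict;
use warnings;

# Map a tblastn frame string such as "+2" or "-3" to a strand value:
#    1 for plus-strand frames, -1 for minus-strand frames,
#    0 when the frame is missing or not in the range 1..3.
sub strand_from_frame {
    my ($frame) = @_;
    return 0 unless defined $frame;
    if ($frame =~ /^\s*([+-]?)[1-3]\s*$/) {
        # An absent sign is treated as plus (an assumption of this sketch).
        return $1 eq '-' ? -1 : 1;
    }
    return 0;
}
```

The 1 / -1 / 0 return values mirror the strand encoding BioPerl uses for features, so the result can be compared directly with what the strand function reports for the same HSP.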
Dimitar

From mmuratet at hudsonalpha.org Tue Nov 16 16:56:08 2010
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Tue, 16 Nov 2010 15:56:08 -0600
Subject: [Bioperl-l] Obtaining Refseq status with eutils
Message-ID:

Greetings

I have been trying to get to the Refseq status info that shows up NCBI gene webpage. My latest attempt is:

my @accs = qw(SAAV_0049);
my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                       -email => 'mmuratet at hudsonalpha.org ',
                                       -db => 'gene',
                                       -term => join(',', @accs));
my @uids = $factory->get_ids;
$factory->reset_parameters(-eutil => 'elink',
                           -dbfrom => 'gene',
                           -db => 'nuccore',
                           -linkname => 'gene_nuccore_refseqgene',
                           -id => \@uids);
$factory->get_Response(-file => 'temp.txt');

All I get back is the ID number I gave it. Also, $factory->next_DocSum returns nothing. A big part of the problem is that I don't know what field I'm looking for. I am also unsure that I am using the elink interface properly.

Does anyone know how to get to the Refseq data for a gene? I am looking for evidence of expression of gene models--is there a better annotation to use?

Thanks

Mike

Michael Muratet, Ph.D.
Senior Scientist HudsonAlpha Institute for Biotechnology mmuratet at hudsonalpha.org (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 From pcantalupo at gmail.com Wed Nov 17 12:38:05 2010 From: pcantalupo at gmail.com (Paul Cantalupo) Date: Wed, 17 Nov 2010 12:38:05 -0500 Subject: [Bioperl-l] Added '.qual' suffix detection in _guess_format Message-ID: Hi, Below is a patch for allowing suffix detection of .qual files in Bio/SeqIO.pm --- Bio/SeqIO.pm | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/Bio/SeqIO.pm b/Bio/SeqIO.pm index fe49822..cf6d3a2 100644 --- a/Bio/SeqIO.pm +++ b/Bio/SeqIO.pm @@ -651,6 +651,7 @@ sub _guess_format { return 'phd' if /\.(phd|phred)$/i; return 'pir' if /\.pir$/i; return 'pln' if /\.pln$/i; + return 'qual' if /\.qual$/i; return 'raw' if /\.(txt)$/i; return 'scf' if /\.scf$/i; return 'swiss' if /\.(swiss|sp)$/i; -- 1.6.4.2 Paul Paul Cantalupo Research Specialist/Systems Programmer University of Pittsburgh Pittsburgh, PA 15260 From maj at fortinbras.us Wed Nov 17 17:59:25 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 17 Nov 2010 22:59:25 +0000 Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] GO annotation Message-ID: redirect to list -----Original Message----- From: Christos Noutsos [mailto:cnoutsos at cshl.edu] Sent: Wednesday, November 17, 2010 05:35 PM To: bioperl-guts-l at lists.open-bio.org Subject: [Bioperl-guts-l] GO annotation Hi all, I have several clusters of coregulated genes from Arabidopsis and I would like to check their GO annotation? Is there a module in bioperl which I can use? 
Thank you for your help Christos _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l From lskatz at gatech.edu Wed Nov 17 22:20:03 2010 From: lskatz at gatech.edu (Lee Katz) Date: Wed, 17 Nov 2010 22:20:03 -0500 Subject: [Bioperl-l] Status of assembly modules Message-ID: I have read on the BioPerl site that a 454 ace is not standardized due to its coordinate system. How can I convert it to the standard ace file? When I run this code either by using contig or assembly objects, I get an error. Can't call method "get_consensus_sequence" on an undefined value at Bio/Assembly/IO/ace.pm line 280, line 93349. sub _newblerAceToAce($args){ my($self,$args)=@_; my $ace454=Bio::Assembly::IO->new(-file=>$$args{ace454Path},-format=>"ace",-variant=>'454'); my $ace=Bio::Assembly::IO->new(-file=>">$$args{acePath}",-format=>"ace"); #while(my $contig=$ace454->next_contig){ while(my $scaffold=$ace454->next_assembly){ print Dumper $scaffold; } return $$args{acePath}; } On Fri, Jun 18, 2010 at 5:59 AM, wrote: > Message: 13 > Date: Fri, 18 Jun 2010 15:39:39 +1000 > From: Florent Angly > Subject: [Bioperl-l] Status of assembly modules > To: Joshua Udall , bioperl-l List > > Message-ID: <4C1B069B.5020105 at gmail.com> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi Joshua, > > Yes, there have been interesting improvements in the assembly BioPerl > module since v1.6.1. You can find these changes in the development > version of BioPerl at http://github.com/bioperl/. 
I'll take this
> opportunity to introduce people who don't follow the commit messages to
> the new features that have been introduced:
>
> First, there is support for more file formats from high-throughput
> platforms, including those generated by de novo assembly and comparative
> assembly tools, such as:
> * Roche 454 GS Assembler, aka Newbler (the ACE-454 variant)
> * Maq
> * Sam
> * Bowtie
> There is support for running a lot more of these tools in Bioperl-run
> Bio::Tools::Run :
> * Roche 454 GS Assembler, aka Newbler
> * Minimo
> * Maq
> * Samtools
> * Bowtie
> In terms of writing assembly file, I added the option to write ACE
> files, which is quite useful because maybe assembly programs recognize
> this format. So now you can read assemblies, modify them as you see fit
> and exporting them to other programs by writing the modified assembly in
> an ACE file.
> The internals of the IO parsers have acquired some granularity as it is
> now possible to read/write assembly files entirely, or one contig at a
> time. This is terrific to reduce memory usage.
>
> That's about it...
>
> Regards,
>
> Florent
>
> PS/ Josh, you filed bug reports related to several of these issues
> (http://bugzilla.open-bio.org/show_bug.cgi?id=2726,
> http://bugzilla.open-bio.org/show_bug.cgi?id=2483). I am closing the
> ones that were not closed yet and thank you for submitting patches.
>
>
> On 18/06/10 14:00, Joshua Udall wrote:
> > Florent -
> >
> > I didn't want to ask a direct question on-list to perhaps avoid
> > confusion. Were you able to improve/submit a ContigIO to bioperl that
> > works with one entry at a time (instead of slurping the entire ace
> > file)?
> >
> >
>

--
Lee Katz
http://leeskatz.com

From rmb32 at cornell.edu Thu Nov 18 02:03:46 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 17 Nov 2010 23:03:46 -0800
Subject: [Bioperl-l] Added '.qual' suffix detection in _guess_format
In-Reply-To:
References:
Message-ID: <4CE4CFD2.3000901@cornell.edu>

Applied!
Thanks Paul! Rob Paul Cantalupo wrote: > Hi, > > Below is a patch for allowing suffix detection of .qual files in Bio/SeqIO.pm > > > --- > Bio/SeqIO.pm | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/Bio/SeqIO.pm b/Bio/SeqIO.pm > index fe49822..cf6d3a2 100644 > --- a/Bio/SeqIO.pm > +++ b/Bio/SeqIO.pm > @@ -651,6 +651,7 @@ sub _guess_format { > return 'phd' if /\.(phd|phred)$/i; > return 'pir' if /\.pir$/i; > return 'pln' if /\.pln$/i; > + return 'qual' if /\.qual$/i; > return 'raw' if /\.(txt)$/i; > return 'scf' if /\.scf$/i; > return 'swiss' if /\.(swiss|sp)$/i; From pcantalupo at gmail.com Thu Nov 18 10:15:04 2010 From: pcantalupo at gmail.com (Paul Cantalupo) Date: Thu, 18 Nov 2010 10:15:04 -0500 Subject: [Bioperl-l] git patch for Bio/Seq/PrimaryQual.pm Message-ID: Hi, I'm resending this to the list (without an attachment) since it got flagged as having a suspicious header. I simply added "length" to the foreach loop in sub "to_string" of Bio/Seq/PrimaryQual.pm to match documentation of "to_string": --- Bio/Seq/PrimaryQual.pm | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Bio/Seq/PrimaryQual.pm b/Bio/Seq/PrimaryQual.pm index 4935639..30383fc 100644 --- a/Bio/Seq/PrimaryQual.pm +++ b/Bio/Seq/PrimaryQual.pm @@ -461,7 +461,7 @@ sub qualat { sub to_string { my ($self,$out,$result) = shift; $out = "qual: ".join(',',@{$self->qual()}); - foreach (qw(display_id accession_number primary_id desc id)) { + foreach (qw(display_id accession_number primary_id desc id length)) { $result = $self->$_(); if (!$result) { $result = ""; } $out .= "$_: $result\n"; -- 1.6.4.2 Paul Cantalupo Research Specialist/Systems Programmer University of Pittsburgh Pittsburgh, PA 15260 From pcantalupo at gmail.com Wed Nov 17 10:55:52 2010 From: pcantalupo at gmail.com (Paul Cantalupo) Date: Wed, 17 Nov 2010 10:55:52 -0500 Subject: [Bioperl-l] git patch for Bio/Seq/PrimaryQual.pm Message-ID: Hi, I simply added "length" to the foreach loop in sub 
"to_string" of Bio/Seq/PrimaryQual.pm to match documentation of "to_string" Paul Paul Cantalupo Research Specialist/Systems Programmer University of Pittsburgh Pittsburgh, PA 15260 -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Added-length-to-foreach-loop-to-match-documentation-.patch Type: application/octet-stream Size: 866 bytes Desc: not available URL: From twaddlac at gmail.com Fri Nov 19 15:03:14 2010 From: twaddlac at gmail.com (Alan Twaddle) Date: Fri, 19 Nov 2010 15:03:14 -0500 Subject: [Bioperl-l] bioperl-ext package installation error Message-ID: Hello all, I am trying to write a simple script to parse an ABI file into usable data which requires the staden io_lib of which I installed. However, am receiving an error message when trying to run my script that says the following: ------------- EXCEPTION: Bio::Root::SystemException ------------- MSG: Bio::SeqIO::staden::read is not available; make sure the bioperl-ext package has been installed successfully! STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368 STACK: Bio::SeqIO::abi::_initialize /usr/local/share/perl/5.10.1/Bio/SeqIO/ abi.pm:100 STACK: Bio::SeqIO::new /usr/local/share/perl/5.10.1/Bio/SeqIO.pm:360 STACK: Bio::SeqIO::new /usr/local/share/perl/5.10.1/Bio/SeqIO.pm:390 STACK: AB1_parser.pl:10 I didn't receive any error messages when installing the staden package so I'm assuming that there's some path specified in some file that is pointing to the wrong place. I don't know if this is the case but if you have any suggestions about how to fix this I would greatly appreciate it! Thank you very much! -- Alan Twaddle, B.S. 
MUC class of 2010 From cjfields at illinois.edu Fri Nov 19 23:33:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 19 Nov 2010 22:33:14 -0600 Subject: [Bioperl-l] git patch for Bio/Seq/PrimaryQual.pm In-Reply-To: References: Message-ID: <644615D3-8F59-4E59-B7A1-6BCE6159E9C9@illinois.edu> Paul, Committed this to github master branch. Thanks for pointing that out. chris On Nov 17, 2010, at 9:55 AM, Paul Cantalupo wrote: > Hi, > > I simply added "length" to the foreach loop in sub "to_string" of > Bio/Seq/PrimaryQual.pm to match documentation of "to_string" > > Paul > > > Paul Cantalupo > Research Specialist/Systems Programmer > University of Pittsburgh > Pittsburgh, PA 15260 > <0001-Added-length-to-foreach-loop-to-match-documentation-.patch>_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Nov 19 23:50:17 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 19 Nov 2010 22:50:17 -0600 Subject: [Bioperl-l] bioperl-ext package installation error In-Reply-To: References: Message-ID: Alan, You need to install the bioperl-ext module Bio::SeqIO::staden::read, which uses XS bindings to io_lib, but there's a bit of a caveat: we're not really maintaining that code anymore. Saying that, it does work using io_lib version 1.12.2 (my local version). Here's the code, please make sure to note the README (specifically, the version of io_lib you will need and the problems you may run into): https://github.com/bioperl/bioperl-ext/tree/master/Bio/SeqIO/staden/ chris On Nov 19, 2010, at 2:03 PM, Alan Twaddle wrote: > Hello all, > > I am trying to write a simple script to parse an ABI file into usable > data which requires the staden io_lib of which I installed. 
However, am > receiving an error message when trying to run my script that says the > following: > ------------- EXCEPTION: Bio::Root::SystemException ------------- > MSG: Bio::SeqIO::staden::read is not available; make sure the bioperl-ext > package has been installed successfully! > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368 > STACK: Bio::SeqIO::abi::_initialize /usr/local/share/perl/5.10.1/Bio/SeqIO/ > abi.pm:100 > STACK: Bio::SeqIO::new /usr/local/share/perl/5.10.1/Bio/SeqIO.pm:360 > STACK: Bio::SeqIO::new /usr/local/share/perl/5.10.1/Bio/SeqIO.pm:390 > STACK: AB1_parser.pl:10 > > I didn't receive any error messages when installing the staden package so > I'm assuming that there's some path specified in some file that is pointing > to the wrong place. I don't know if this is the case but if you have any > suggestions about how to fix this I would greatly appreciate it! Thank you > very much! > > -- > > Alan Twaddle, B.S. > MUC class of 2010 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jordi.durban at gmail.com Mon Nov 22 15:01:34 2010 From: jordi.durban at gmail.com (Jordi Durban) Date: Mon, 22 Nov 2010 21:01:34 +0100 Subject: [Bioperl-l] parse multi xml Message-ID: Hi all, I'm a newbie in the list although I've been using bioperl for 2 years. Now I have a problem with a XML file and I don't know how to parse it. That file has 795 xml top tags (thta's is ) because they resulted from Blast2go software the usage and I suppose the file is the outcome of multiple blast results concatenation. Well, I would like to split all 795 different xml chunks in 795 different files in order to parse them looking for the best hit. 
The problem appears using the blastxml parse (*Bio::SearchIO::blastxml) *because (and that's a personal opinion) there's another top tag not expected and I get a error message once the first blast result was parsed. How can I do that split function? I hope I was clear Thanks -- Jordi From biopython at maubp.freeserve.co.uk Mon Nov 22 15:23:29 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Nov 2010 20:23:29 +0000 Subject: [Bioperl-l] parse multi xml In-Reply-To: References: Message-ID: On Mon, Nov 22, 2010 at 8:01 PM, Jordi Durban wrote: > Hi all, > I'm a newbie in the list although I've been using bioperl for 2 years. > Now I have a problem with a XML file and I don't know how to parse it. > That file has 795 xml top tags (thta's is ) because they > resulted from ?Blast2go software the usage and I suppose the file is the > outcome of multiple blast results concatenation. Such a file is NOT a valid XML file (but see below), you can't just concatenate XML files. I'm pretty sure people have posted scripts to fix such files on the blast2go mailing list. > Well, I would like to split all 795 different xml chunks in 795 different > files in order to parse them looking for the best hit. > The problem appears using the blastxml parse > (*Bio::SearchIO::blastxml) *because > (and that's a personal opinion) there's another top tag not expected > and I get a error message once the first blast result was parsed. > How can I do that split function? > I hope I was clear > Thanks Historically the NCBI standalone BLAST used to create these concatenated XML files when used on multiple queries. It has since been fixed, but perhaps BioPerl has code still in it to handle these legacy invalid XML files? My suggestion (until a BioPerl guru speaks up) would be to split the file into chunks (in memory) by looking for the string , and parsing each chunk individually. Each chunk should be a valid XML file on its own. 
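
A minimal sketch of the in-memory splitting idea above, in plain Perl. It assumes that every concatenated report begins with an XML declaration (`<?xml ...?>`); that marker is an assumption, so adjust the pattern if your file starts each report differently. The sub name `split_concatenated_xml` and the two tiny reports are made up for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a string holding several concatenated XML documents into one
# string per document.
# ASSUMPTION: each document starts with an "<?xml" declaration.
sub split_concatenated_xml {
    my ($text) = @_;
    # The zero-width lookahead splits *before* each declaration, so the
    # declaration stays at the start of its own chunk.
    my @chunks = grep { /\S/ } split /(?=<\?xml\b)/, $text;
    return @chunks;
}

# Two fake reports glued together, standing in for the real file contents.
my $demo = join '',
    qq{<?xml version="1.0"?>\n<BlastOutput><id>1</id></BlastOutput>\n},
    qq{<?xml version="1.0"?>\n<BlastOutput><id>2</id></BlastOutput>\n};

my @chunks = split_concatenated_xml($demo);
printf "found %d chunk(s)\n", scalar @chunks;
```

Each chunk could then be written to its own file, or opened as an in-memory filehandle (`open my $fh, '<', \$chunk`) and handed to Bio::SearchIO via its -fh argument with -format => 'blastxml'. For a file with hundreds of reports, reading record by record instead of slurping would use less memory.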
Peter From lskatz at gatech.edu Mon Nov 22 15:18:10 2010 From: lskatz at gatech.edu (Lee Katz) Date: Mon, 22 Nov 2010 15:18:10 -0500 Subject: [Bioperl-l] Re(2): Status of assembly modules Message-ID: I figured it out (I haven't tested much though). To whoever works on Assembly::IO::ace.pm: I changed a regular expression on line 231 because the contig object was not initializing properly. For some reason the 454 ace file had adopted the reference assembly's ID and therefore there was a GI number followed by a pipe. The pipe was not captured with \w+. I think that the regex will be safe with \s(\S+)\s. if (/^CO\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts! #if (/^CO\s(\w+)\s(\d+)\s(\d+)\s(\d+)\s(\w+)/xms) { # New contig starts! On Thu, Nov 18, 2010 at 12:04 PM, wrote: > Message: 3 > Date: Wed, 17 Nov 2010 22:20:03 -0500 > From: Lee Katz > Subject: Re: [Bioperl-l] Status of assembly modules > To: bioperl-l at lists.open-bio.org > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > > I have read on the BioPerl site that a 454 ace is not standardized due to > its coordinate system. How can I convert it to the standard ace file? > > When I run this code either by using contig or assembly objects, I get an > error. > Can't call method "get_consensus_sequence" on an undefined value at > Bio/Assembly/IO/ace.pm line 280, line 93349. 
> > sub _newblerAceToAce($args){ > my($self,$args)=@_; > my > > $ace454=Bio::Assembly::IO->new(-file=>$args{ace454Path},-format=>"ace",-variant=>'454'); > my > $ace=Bio::Assembly::IO->new(-file=>">$args{acePath}",-format=>"ace"); > #while(my $contig=$ace454->next_contig){ > while(my $scaffold=$ace454->next_assembly){ > print Dumper $scaffold; > } > return $args{acePath}; > } > -- Lee Katz http://leeskatz.com From thomas.sharpton at gmail.com Mon Nov 22 17:28:52 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Mon, 22 Nov 2010 14:28:52 -0800 Subject: [Bioperl-l] bioperl-hmmer3 question In-Reply-To: References: Message-ID: <085040DC-A2D4-4BD3-8598-AC4D6945FE89@gmail.com> Hi Evan, Glad to hear this software is a promising solution to your needs. I just tried running the test script under Bio/t/SearchIO/hmmer3.t and everything passed. Can you verify that you are using an up-to-date version of bioperl-live? You can grab a snapshot from github if you haven't already: https://github.com/bioperl/bioperl-live Also, I'm cc'ing this to the bioperl list so that others may benefit from our discourse (and chime in - Kai Blin, who created the test tools, and others have made substantial contributions and improvements to the original code and they might have good suggestions). T On Nov 22, 2010, at 1:38 PM, Evan Staton wrote: > Hi Thomas, > > Thanks for providing the new bioperl methods for parsing hmmer3 > reports. Everything seems to be working nicely with the test data > and my own reports except for getting the $hsp->query_string and > $hsp->hit_string. I noticed that these HSP routines were not working > with the test script/data (for me anyway) so I thought I would ask > before spending days trying to solve this one. > > Thanks, > > Evan > > -- > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > S. 
Evan Staton > PhD Student - Burke Lab > University of Georgia > Department of Genetics > 3507 Miller Plant Sciences > Athens, GA 30602 From jordi.durban at gmail.com Mon Nov 22 18:34:03 2010 From: jordi.durban at gmail.com (Jordi Durban) Date: Tue, 23 Nov 2010 00:34:03 +0100 Subject: [Bioperl-l] parse multi xml In-Reply-To: References:

Message-ID: Thanks Peter. That's exactly what I was looking for but so far I've not been able to do that properly. Any ideas?? 2010/11/22 Peter > On Mon, Nov 22, 2010 at 8:01 PM, Jordi Durban > wrote: > > Hi all, > > I'm a newbie in the list although I've been using bioperl for 2 years. > > Now I have a problem with a XML file and I don't know how to parse it. > > That file has 795 xml top tags (thta's is ) because > they > > resulted from Blast2go software the usage and I suppose the file is the > > outcome of multiple blast results concatenation. > > Such a file is NOT a valid XML file (but see below), you can't just > concatenate XML files. I'm pretty sure people have posted scripts > to fix such files on the blast2go mailing list. > > > Well, I would like to split all 795 different xml chunks in 795 different > > files in order to parse them looking for the best hit. > > The problem appears using the blastxml parse > > (*Bio::SearchIO::blastxml) *because > > (and that's a personal opinion) there's another top tag not expected > > and I get a error message once the first blast result was parsed. > > How can I do that split function? > > I hope I was clear > > Thanks > > Historically the NCBI standalone BLAST used to create these > concatenated XML files when used on multiple queries. It has > since been fixed, but perhaps BioPerl has code still in it to > handle these legacy invalid XML files? > > My suggestion (until a BioPerl guru speaks up) would be to > split the file into chunks (in memory) by looking for the string > , and parsing each chunk individually. > Each chunk should be a valid XML file on its own. > > Peter > -- Jordi From cjfields at illinois.edu Mon Nov 22 19:27:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Nov 2010 18:27:43 -0600 Subject: [Bioperl-l] parse multi xml In-Reply-To: References:

Message-ID: <348A592A-D634-4C1C-9F05-DB20FE87F61C@illinois.edu> On Nov 22, 2010, at 2:23 PM, Peter wrote: > On Mon, Nov 22, 2010 at 8:01 PM, Jordi Durban wrote: >> Hi all, >> I'm a newbie in the list although I've been using bioperl for 2 years. >> Now I have a problem with a XML file and I don't know how to parse it. >> That file has 795 xml top tags (thta's is ) because they >> resulted from Blast2go software the usage and I suppose the file is the >> outcome of multiple blast results concatenation. > > Such a file is NOT a valid XML file (but see below), you can't just > concatenate XML files. I'm pretty sure people have posted scripts > to fix such files on the blast2go mailing list. > >> Well, I would like to split all 795 different xml chunks in 795 different >> files in order to parse them looking for the best hit. >> The problem appears using the blastxml parse >> (*Bio::SearchIO::blastxml) *because >> (and that's a personal opinion) there's another top tag not expected >> and I get a error message once the first blast result was parsed. >> How can I do that split function? >> I hope I was clear >> Thanks > > Historically the NCBI standalone BLAST used to create these > concatenated XML files when used on multiple queries. It has > since been fixed, but perhaps BioPerl has code still in it to > handle these legacy invalid XML files? It does (last I looked). > My suggestion (until a BioPerl guru speaks up) would be to > split the file into chunks (in memory) by looking for the string > , and parsing each chunk individually. > Each chunk should be a valid XML file on its own. > > Peter Or, better yet, push the blast2go folks to create valid XML output or use an updated version of BLAST. This bug was fixed about 3 years ago. 
chris From statonse at uga.edu Mon Nov 22 18:41:26 2010 From: statonse at uga.edu (Evan Staton) Date: Mon, 22 Nov 2010 18:41:26 -0500 Subject: [Bioperl-l] bioperl-hmmer3 question In-Reply-To: <085040DC-A2D4-4BD3-8598-AC4D6945FE89@gmail.com> References: <085040DC-A2D4-4BD3-8598-AC4D6945FE89@gmail.com> Message-ID: Hi Thomas, Everything works with the latest release of bioperl-live. My bioperl was 1.6.1 and I was using bioperl-hmmer3 from github as a separate package (I did not realize it had been incorporated into the latest distribution). All the tests passed and all the routines worked before except for the $hsp->query_string and $hsp->hit_string, but that is resolved now. Thanks for the quick response, it saved me a lot of time. Evan On Mon, Nov 22, 2010 at 5:28 PM, Thomas Sharpton wrote: > Hi Evan, > > Glad to hear this software is a promising solution to your needs. I just > tried running the test script under Bio/t/SearchIO/hmmer3.t and everything > passed. Can you verify that you are using an up-to-date version of > bioperl-live? You can grab a snapshot from github if you haven't already: > > https://github.com/bioperl/bioperl-live > > Also, I'm cc'ing this to the bioperl list so that others may benefit from > our discourse (and chime in - Kai Blin, who created the test tools, and > others have made substantial contributions and improvements to the original > code and they might have good suggestions). > > T > > > > On Nov 22, 2010, at 1:38 PM, Evan Staton wrote: > > Hi Thomas, > > Thanks for providing the new bioperl methods for parsing hmmer3 reports. > Everything seems to be working nicely with the test data and my own reports > except for getting the $hsp->query_string and $hsp->hit_string. I noticed > that these HSP routines were not working with the test script/data (for me > anyway) so I thought I would ask before spending days trying to solve this > one. > > Thanks, > > Evan > > -- > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > S. 
Evan Staton > PhD Student - Burke Lab > University of Georgia > Department of Genetics > 3507 Miller Plant Sciences > Athens, GA 30602 > > > -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ S. Evan Staton PhD Student - Burke Lab University of Georgia Department of Genetics 3507 Miller Plant Sciences Athens, GA 30602 From kris.richardson at tufts.edu Tue Nov 23 12:10:56 2010 From: kris.richardson at tufts.edu (kris richardson) Date: Tue, 23 Nov 2010 12:10:56 -0500 Subject: [Bioperl-l] GeneMapper Message-ID: Dear Members, I have a list mRNA target positions, for various mRNAs and I also have the RefSeq accession NM_#s for their genes and mRNA (ex; gene: NM_198155 --- mRNA positions: NM_004649: 800-890). I would like to use this info to pull the chromosomal coordinates of NM_198155 corresponding to the resulting transcript positions NM:_004649:800-890. I am familiarizing myself with bioperl and GeneMapper seems like it can provide this info. However, the documentation on the module code is sparse, and there are no examples. I was wondering if anyone has experience with this and could point me in the right direction, or perhaps provide some example code? Regards, Kris From kai.blin at biotech.uni-tuebingen.de Wed Nov 24 01:24:22 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Wed, 24 Nov 2010 07:24:22 +0100 Subject: [Bioperl-l] bioperl-hmmer3 question In-Reply-To: References: <085040DC-A2D4-4BD3-8598-AC4D6945FE89@gmail.com> Message-ID: <4CECAF96.7010706@biotech.uni-tuebingen.de> On 2010-11-23 00:41, Evan Staton wrote: Hi Evan, > Everything works with the latest release of bioperl-live. My bioperl was > 1.6.1 and I was using bioperl-hmmer3 from github as a separate package (I > did not realize it had been incorporated into the latest distribution). All > the tests passed and all the routines worked before except for the > $hsp->query_string and $hsp->hit_string, but that is resolved now. 
I recently fixed this in bioperl-live when I bumped into that feature missing. I didn't bother to update the bioperl-hmmer3 repository. Perhaps it should be deleted to avoid the confusion.

Also, in bioperl-live, the Bio::SearchIO::hmmer parser now supports hmmer2 and hmmer3 parsing, but by all means go for the specific parser class if you know the version of hmmer result you're loading. :)

On a side note, even though you're probably aware of this... HMMer3 only does local alignments, so it's not really good for domain extraction. It is very fast, but if your software uses a profile based search to extract specific amino acid motifs HMMer3 might let you down. Just as a caveat that has bitten me recently.

Cheers,
Kai

--
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax   : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From dimitark at bii.a-star.edu.sg Wed Nov 24 21:38:52 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Thu, 25 Nov 2010 10:38:52 +0800
Subject: [Bioperl-l] update: genpept module
Message-ID: <4CEDCC3C.9070707@bii.a-star.edu.sg>

Hi guys,
just managed to get GenPept to give me the fasta in the full form.
I modified GenPept in this way:

sub new {
    my($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    my ($verbose,$format)=$self->_rearrange([qw(VERBOSE FORMAT)],@args); #DIMITAR
    $DEFAULTFORMAT='fasta' if (defined $format); #DIMITAR
    $self->request_format($self->default_format($format));
    return $self;
}

Cheers
Dimitar

From cjfields at illinois.edu Wed Nov 24 22:24:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Nov 2010 21:24:40 -0600
Subject: [Bioperl-l] update: genpept module
In-Reply-To: <4CEDCC3C.9070707@bii.a-star.edu.sg>
References: <4CEDCC3C.9070707@bii.a-star.edu.sg>
Message-ID: <03227AD9-092D-43C4-B0CF-28A7D093B35B@illinois.edu>

Dimitar,

Missed your original post, but was it reported as a bug?

chris

On Nov 24, 2010, at 8:38 PM, Dimitar Kenanov wrote:

> Hi guys,
> just managed to get GenPept to give me the fasta in the full form. I modified GenPept in this way:
>
> sub new {
>     my($class, @args) = @_;
>     my $self = $class->SUPER::new(@args);
>     my ($verbose,$format)=$self->_rearrange([qw(VERBOSE FORMAT)],@args); #DIMITAR
>     $DEFAULTFORMAT='fasta' if (defined $format); #DIMITAR
>     $self->request_format($self->default_format($format));
>     return $self;
> }
>
> Cheers
> Dimitar

From jason at bioperl.org Wed Nov 24 22:57:11 2010
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 24 Nov 2010 19:57:11 -0800
Subject: [Bioperl-l] update: genpept module
In-Reply-To: <03227AD9-092D-43C4-B0CF-28A7D093B35B@illinois.edu>
References: <4CEDCC3C.9070707@bii.a-star.edu.sg> <03227AD9-092D-43C4-B0CF-28A7D093B35B@illinois.edu>
Message-ID: <4CEDDE97.5020406@bioperl.org>

This doesn't seem like the right fix -- you don't want to update a default variable -- why didn't you just use it like this?
my $db = Bio::DB::GenPept->new(-format => 'fasta');

Chris Fields wrote, On 11/24/10 7:24 PM:
> Dimitar,
>
> Missed your original post, but was it reported as a bug?
>
> chris
>
> On Nov 24, 2010, at 8:38 PM, Dimitar Kenanov wrote:
>
>> Hi guys,
>> just managed to get GenPept to give me the fasta in the full form. I modified GenPept in this way:
>>
>> sub new {
>>     my($class, @args) = @_;
>>     my $self = $class->SUPER::new(@args);
>>     my ($verbose,$format)=$self->_rearrange([qw(VERBOSE FORMAT)],@args); #DIMITAR
>>     $DEFAULTFORMAT='fasta' if (defined $format); #DIMITAR
>>     $self->request_format($self->default_format($format));
>>     return $self;
>> }
>>
>> Cheers
>> Dimitar

--
Jason Stajich
jason at bioperl.org

From dimitark at bii.a-star.edu.sg Thu Nov 25 00:34:39 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Thu, 25 Nov 2010 13:34:39 +0800
Subject: [Bioperl-l] update: genpept module
In-Reply-To: <4CEDDE97.5020406@bioperl.org>
References: <4CEDCC3C.9070707@bii.a-star.edu.sg> <03227AD9-092D-43C4-B0CF-28A7D093B35B@illinois.edu> <4CEDDE97.5020406@bioperl.org>
Message-ID: <4CEDF56F.3020409@bii.a-star.edu.sg>

I used it like this in the script:

if( $options{'-db'} eq 'protein' ) {
    ### DIMITAR ###
    if( $retformat eq 'fasta'){
        $dbh = Bio::DB::GenPept->new(-verbose => $debug, -format => 'Fasta');
    ### END DIMITAR ###
    }else{
        $dbh = Bio::DB::GenPept->new(-verbose => $debug);
    }
}

I added $retformat to the Getopt::Long options as well, but that didn't work the way it did for GenBank. Maybe GenBank and GenPept should be made to work in the same way; right now they behave a bit differently. I tried to see why they react differently but I couldn't work out why.
Now my GenPept new method looks like this:

sub new {
    my($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    my ($verbose,$format)=$self->_rearrange([qw(VERBOSE FORMAT)],@args);#dimitar
    $DEFAULTFORMAT=$format if (defined $format);#dimitar
    $self->request_format($self->default_format($format));
    return $self;
}

This way I can get sequences in any format that Eutils allows.

On 11/25/2010 11:57 AM, Jason Stajich wrote:
> This doesn't seem like the right fix -- you don't want to update a
> default variable -- why didn't you just use it like this?
>
> my $db = Bio::DB::GenPept->new(-format => 'fasta');
>
>
>
> Chris Fields wrote, On 11/24/10 7:24 PM:
>> Dimitar,
>>
>> Missed your original post, but was it reported as a bug?
>>
>> chris
>>
>> On Nov 24, 2010, at 8:38 PM, Dimitar Kenanov wrote:
>>
>>> Hi guys,
>>> just managed to get GenPept to give me the fasta in the full form. I
>>> modified GenPept in this way:
>>>
>>> sub new {
>>>     my($class, @args) = @_;
>>>     my $self = $class->SUPER::new(@args);
>>>     my ($verbose,$format)=$self->_rearrange([qw(VERBOSE
>>> FORMAT)],@args); #DIMITAR
>>>     $DEFAULTFORMAT='fasta' if (defined $format); #DIMITAR
>>>     $self->request_format($self->default_format($format));
>>>     return $self;
>>> }
>>>
>>> Cheers
>>> Dimitar

From dimitark at bii.a-star.edu.sg Thu Nov 25 02:05:36 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Thu, 25 Nov 2010 15:05:36 +0800
Subject: [Bioperl-l] about NCBI seq reports
Message-ID: <4CEE0AC0.6050104@bii.a-star.edu.sg>

Hi again,
now I can get the sequences in full fasta as NCBI provides them. But can I also get the GenPept reports in some way? I tried using 'download_query_genbank.pl', but I always get a fasta sequence rather than the GenPept report, even though the request generated by the script is valid. When I put the request in a browser I get the report file, but the script itself returns fasta. I think this is a limitation of the modules which the script uses, right? Or am I mistaken?

Cheers
Dimitar

From Marc.Perry at oicr.on.ca Fri Nov 26 23:53:48 2010
From: Marc.Perry at oicr.on.ca (Marc Perry)
Date: Fri, 26 Nov 2010 23:53:48 -0500
Subject: [Bioperl-l] Bio::Restriction::Analysis.pm; 'sizes' method is broken
Message-ID:

Hi,

I was following along in the Bioperl tutorial, and in the POD for this module and I discovered that the 'sizes' method was not working as advertised (in my hands).
For the input file I was using the complete genome of bacteriophage lambda as a plain vanilla fasta file. Here is the script I used:

=+=+=+=
#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Restriction::Analysis;
use Bio::Restriction::Enzyme;
use Bio::Restriction::EnzymeCollection;

my $input = shift or die "No fasta file on command line.";

my $seqio_obj = Bio::SeqIO->new(-file => $input,
                                -format => "fasta" );

my $seq = $seqio_obj->next_seq;

my $all_enz = Bio::Restriction::EnzymeCollection->new();

my $h3 = $all_enz->get_enzyme('HindIII');

my $ra = Bio::Restriction::Analysis->new(-seq => $seq);

print join "\n", $ra->sizes($h3), "\n\n";
print join "\n", $ra->sizes($h3, 0, 1), "\n\n";
print join "\n", $ra->sizes($h3, 1, 1), "\n\n";

exit;
=+=+=+=

And here is the output:

48502


48502


48.502

As I recall, there are six HindIII sites in lambda, which should yield 7 fragments from a linear genome. Here is the subroutine code from github:

sub sizes {
    my ($self, $enz, $kb, $sort) = @_;
    $self->throw('no enzyme selected to get fragments for')
        unless $enz;
    $self->cut unless $self->{'_cut'};
    my @frag; my $lastsite=0;
    foreach my $site (@{$self->{'_cut_positions'}->{$enz}}) { # BUG
        $kb ? push (@frag, (int($site-($lastsite))/100)/10)
            : push (@frag, $site-($lastsite));
        $lastsite=$site;
    }
    $kb ? push (@frag, (int($self->{'_seq'}->length-($lastsite))/100)/10)
        : push (@frag, $self->{'_seq'}->length-($lastsite));
    if ($self->{'_seq'}->is_circular) {
        my $first=shift @frag;
        my $last=pop @frag;
        push @frag, ($first+$last);
    }
    $sort ? @frag = sort {$b <=> $a} @frag : 1;

    return @frag;
}

I eventually tracked the bug down to the indicated line. As written, we are feeding in the enzyme object as a hash key instead of the enzyme's name, and everybody is unhappy.
Here is the solution that I hacked out:

    my $name = $enz->{_seq}->{display_id};
    foreach my $site (@{$self->{'_cut_positions'}->{$name}}) {

And now my script yields this output:

23130
2027
2322
9416
564
6682
4361


23130
9416
6682
4361
2322
2027
564


23.13
9.416
6.682
4.361
2.322
2.027
0.564

which makes me happier. Oh, the example shown in the POD is also incorrect; using it like this throws an exception:

    You should be able to do these:

    # to see all the fragment sizes,
    print join "\n", @{$re->sizes($enz)}, "\n";
    # to see all the fragment sizes sorted
    print join "\n", @{$re->sizes($enz, 0, 1)}, "\n";
    # to see all the fragment sizes in kb sorted
    print join "\n", @{$re->sizes($enz, 1, 1)}, "\n";

--Marc

Marc Perry
Scientific Associate

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel: 416-673-8593
Toll-free: 1-866-678-6427
Cell: 416-904-8037
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all

From cjfields at illinois.edu Sat Nov 27 00:15:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Nov 2010 23:15:46 -0600
Subject: [Bioperl-l] Bio::Restriction::Analysis.pm; 'sizes' method is broken
In-Reply-To:
References:
Message-ID: <66A351B0-D483-490A-90CC-BF7C596D6505@illinois.edu>

Marc,

I've entered this into bugzilla for tracking (the best place for this, BTW). I can't promise when we'll get to it, but since there is a proposed fix it may be handled fairly soon.

Thanks for pointing this out!
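
For anyone following the thread, the failure mode Marc describes (an object used where a hash expects a name) can be reproduced in isolation. The sketch below is plain Perl and does not use BioPerl: `Demo::Enzyme` is a hypothetical stand-in class, and the cut positions are arbitrary numbers chosen only for the demonstration. It shows that a blessed reference used as a hash key is stringified to something like `Demo::Enzyme=HASH(0x...)`, so the lookup misses the entry stored under the enzyme's name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an enzyme object; NOT Bio::Restriction::Enzyme.
package Demo::Enzyme;
sub new  { my ($class, %args) = @_; return bless { name => $args{name} }, $class }
sub name { return $_[0]->{name} }

package main;

# Cut positions keyed by enzyme *name* (arbitrary illustrative numbers).
my %cut_positions = ( HindIII => [ 100, 250 ] );

my $enz = Demo::Enzyme->new(name => 'HindIII');

# Using the object as the key stringifies it, so nothing is found.
my $by_object = $cut_positions{$enz};
# Using the name finds the stored array reference.
my $by_name   = $cut_positions{ $enz->name };

print "key made from object: $enz\n";
print "lookup by object: ", (defined $by_object ? 'found' : 'not found'), "\n";
print "lookup by name:   ", (defined $by_name   ? 'found' : 'not found'), "\n";
```

With the lookup returning nothing, a loop over the cut sites never runs and only the whole-sequence length is reported, which would be consistent with the single 48502 in the output shown earlier. In the real module, going through the enzyme object's name accessor would probably be cleaner than reaching into its internal `_seq` hash, though that is a suggestion rather than a tested patch.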
chris On Nov 26, 2010, at 10:53 PM, Marc Perry wrote: > Hi, > > I was following along in the Bioperl tutorial, and in the POD for this module and I discovered that the 'sizes' method was not working as advertised (in my hands). For the input file I was using the complete genome of bacteriophage lambda as a plain vanilla fasta file. Here is the script I used: > > =+=+=+= > #!/usr/bin/perl > > use strict; > use warnings; > use Bio::SeqIO; > use Bio::Restriction::Analysis; > use Bio::Restriction::Enzyme; > use Bio::Restriction::EnzymeCollection; > > my $input = shift or die "No fasta file on command line."; > > my $seqio_obj = Bio::SeqIO->new(-file => $input, > -format => "fasta" ); > > my $seq = $seqio_obj->next_seq; > > my $all_enz = Bio::Restriction::EnzymeCollection->new(); > > my $h3 = $all_enz->get_enzyme('HindIII'); > > my $ra = Bio::Restriction::Analysis->new(-seq => $seq); > > print join "\n", $ra->sizes($h3), "\n\n"; > print join "\n", $ra->sizes($h3, 0, 1), "\n\n"; > print join "\n", $ra->sizes($h3, 1, 1), "\n\n"; > > exit; > =+=+=+= > > And here is the output: > > 48502 > > > 48502 > > > 48.502 > > As I recall, there are six HindIII sites in lambda, which should yield 7 fragments from a linear genome. Here is the subroutine code from github: > > > sub sizes { > > my ($self, $enz, $kb, $sort) = @_; > > $self->throw('no enzyme selected to get fragments for') > > unless $enz; > > $self->cut unless $self->{'_cut'}; > > my @frag; my $lastsite=0; > > foreach my $site (@{$self->{'_cut_positions'}->{$enz}}) { # BUG > > $kb ? push (@frag, (int($site-($lastsite))/100)/10) > > : push (@frag, $site-($lastsite)); > > $lastsite=$site; > > } > > $kb ? push (@frag, (int($self->{'_seq'}->length-($lastsite))/100)/10) > > : push (@frag, $self->{'_seq'}->length-($lastsite)); > > if ($self->{'_seq'}->is_circular) { > > my $first=shift @frag; > > my $last=pop @frag; > > push @frag, ($first+$last); > > } > > $sort ? 
@frag = sort {$b <=> $a} @frag : 1; > > > > return @frag; > > } > > > I eventually tracked the bug down to the indicated line. As written, we are feeding in the enzyme object as a hash key instead of the enzyme's name, and everybody is unhappy. Here is the solution that I hacked out: > > my $name = $enz->{_seq}->{display_id}; > foreach my $site (@{$self->{'_cut_positions'}->{$name}}) { > > And now my script yields this output: > > 23130 > 2027 > 2322 > 9416 > 564 > 6682 > 4361 > > > 23130 > 9416 > 6682 > 4361 > 2322 > 2027 > 564 > > > 23.13 > 9.416 > 6.682 > 4.361 > 2.322 > 2.027 > 0.564 > > which makes me happier. Oh, the example shown in the POD is also incorrect; using it like this throws an exception: > > > You should be able to do these: > > > > # to see all the fragment sizes, > > print join "\n", @{$re->sizes($enz)}, "\n"; > > # to see all the fragment sizes sorted > > print join "\n", @{$re->sizes($enz, 0, 1)}, "\n"; > > # to see all the fragment sizes in kb sorted > > print join "\n", @{$re->sizes($enz, 1, 1)}, "\n"; > > --Marc > > Marc Perry > Scientific Associate > > Ontario Institute for Cancer Research > MaRS Centre, South Tower > 101 College Street, Suite 800 > Toronto, Ontario, Canada M5G 0A3 > > Tel: 416-673-8593 > Toll-free: 1-866-678-6427 > Cell: 416-904-8037 > www.oicr.on.ca > > This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. 
If you have received this message in error, please contact the sender and delete all > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dimitark at bii.a-star.edu.sg Mon Nov 29 04:35:26 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Mon, 29 Nov 2010 17:35:26 +0800 Subject: [Bioperl-l] genbank Message-ID: <4CF373DE.4070902@bii.a-star.edu.sg> Hi again, it seems that when i download (with 'download_query_genbank.pl') the whole proteome from NCBI in fasta format it is first being downloaded and from it is being created some kind of SeqFastaSpeedFactory and after that from it is being copied to the output file. But i want to download and write to output file one by one so i can see the download progress(which is working for genbank data). Its frustrating :) Any ideas where to look for solution Cheers Dimitar From cjfields at illinois.edu Mon Nov 29 09:39:16 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 29 Nov 2010 08:39:16 -0600 Subject: [Bioperl-l] genbank In-Reply-To: <4CF373DE.4070902@bii.a-star.edu.sg> References: <4CF373DE.4070902@bii.a-star.edu.sg> Message-ID: On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote: > Hi again, > it seems that when i download (with 'download_query_genbank.pl') the whole proteome from NCBI in fasta format it is first being downloaded and from it is being created some kind of SeqFastaSpeedFactory and after that from it is being copied to the output file. But i want to download and write to output file one by one so i can see the download progress(which is working for genbank data). 
> > Its frustrating :) > > Any ideas where to look for solution > Cheers > Dimitar You can't do this with the default script, but you can use a modified version and, where you are retrieving a sequence stream, in the last four lines: my $stream = $dbh->get_Stream_by_query($query); while( my $seq = $stream->next_seq ) { $out->write_seq($seq); } insert an iterator in the loop that indicates progress. Realize the sequence data is processed through Bio::SeqIO, so it won't be exactly the same as what is retrieved from GenBank, but it should be very close. If you want raw sequence, you can use Bio::DB::EUtilities, but it's a bit more complicated. chris From dimitark at bii.a-star.edu.sg Wed Nov 24 21:20:28 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Thu, 25 Nov 2010 10:20:28 +0800 Subject: [Bioperl-l] question about GenPept.pm Message-ID: <4CEDC7EC.1000903@bii.a-star.edu.sg> Hi guys, i want to get some genomes and proteomes from NCBI in fasta format. I found i have to use 'download_query_genbank.pl' for that. It works but not as i would like. It uses the modules GenPept and GenBank. They retrieve the data in fasta but in different format than i want. Example: a) i want the fasta to be like the following: >gi|5834889|ref|NP_006959.1|COX3_10021 cytochrome c oxidase subunit III [Caenorhabditis elegans] here sequense... b) but it comes like this: >COX3_10021 cytochrome c oxidase subunit III [Caenorhabditis elegans] here sequense... But i need the gi and NP as well. So i dug up a bit and after playing with 'download_query_genbank.pl' i managed to make GenBank to give the fasta seqs in the format i want. I made the following changes: 1. 
added $retformat option for Getopt
2. modified this section:

if( $options{'-db'} eq 'protein' ) {
    ### DIMITAR ###
    if( $retformat eq 'fasta' ){
        $dbh = Bio::DB::GenPept->new(-verbose => $debug,
                                     -format  => 'Fasta');
    ### END DIMITAR ###
    }else{
        $dbh = Bio::DB::GenPept->new(-verbose => $debug);
    }
} else {
    ### DIMITAR ###
    if( $retformat eq 'fasta' ){
        $dbh = Bio::DB::GenBank->new(-verbose => $debug,
                                     -format  => 'Fasta');
    ### END DIMITAR ###
    }else{
        $dbh = Bio::DB::GenBank->new(-verbose => $debug);
    }
}

But I got a problem with GenPept. I still can't get the seqs in full fasta format as I explained above. It's interesting because both modules, GenPept and GenBank, are almost identical, except that GenBank uses the new method of NCBIHelper while GenPept has its own, which still calls NCBIHelper's as well. With my modification I pass the format I want, but then somehow it reverts to the default set in GenPept, which is 'gp', while I need it to be 'fasta'. If I change the default format in GenPept to fasta it works, but that's just doing the job without adding the needed flexibility.

Any help would be appreciated. I will try to find a solution as well.

Cheers

PS: I attach the modified 'download_query_genbank.pl'
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: download_query_genbank.pl
URL:

From dimitark at bii.a-star.edu.sg  Mon Nov 29 02:17:50 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Mon, 29 Nov 2010 15:17:50 +0800
Subject: [Bioperl-l] about genpept and genbank
Message-ID: <4CF3539E.8030002@bii.a-star.edu.sg>

Hi guys,
I dug around a bit more and found a solution. I think the problem is that in 'download_query_genbank.pl' the 'format' option is only responsible for the format of the output file to which the seqs are written, while what I need is the 'rettype' format option from NCBI EUtils. So now I found that I can use GenBank.pm for all DBs in NCBI.
I restored the GenPept as it was in original but i do not use it, its more tricky than GenBank.pm. Now modified 'download_query_genbank.pl' to use only GenBank and added one more option 'retformat' which can be set according to NCBI 'rettype'. Also added a simple progress bar to follow the download. Unfortunately it is working only if im downloading genbank data(Bio::SeqIO::genbank) when i download fasta(Bio::SeqIO::fasta) i cant use it. Seems that the fasta stream is a big chunk or somewhere the flush is set to zero. I dont know yet, still trying to figgure that out. The modified script is attached. Cheers Dimitar -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: download_query_genbank.pl URL: From lsbrath at gmail.com Mon Nov 29 16:22:09 2010 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Mon, 29 Nov 2010 16:22:09 -0500 Subject: [Bioperl-l] question about Bio::Tools::Run::RemoteBlast In-Reply-To: <14F0D22D-6AEC-4453-B089-8BDF9E9A481E@illinois.edu> References: <4cd90ec0.0c44970a.506d.2a81@mx.google.com> <4CD9ED62.50102@gmail.com> <14F0D22D-6AEC-4453-B089-8BDF9E9A481E@illinois.edu> Message-ID: Hello, When I run Bio::Tools::Run::RemoteBlast I get an error when I enter the line use Bio::SeqIO that says *can't locate Bio/SeqIO.pm in @INC*. I downloaded the package but can't figure out what the deal is. Any suggestions. LomSpace On Tue, Nov 9, 2010 at 10:46 PM, Chris Fields wrote: > Only if the output appends to that file (I don't recall personally, so it's > worth a try). > > chris > > On Nov 9, 2010, at 6:54 PM, Florent Angly wrote: > > > Wouldn't using a single filename be the way to go? > >> my $filename = "blast_ressults.out"; > >> $factory->save_output($filename); > > Florent > > > > > > On 09/11/10 19:05, njauxiongjie wrote: > >> Hi, > >> > >> I have used the scripts list in section of "SYNOPSIS" in th webpage > http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm > . 
> >> > >> there are two line of the code like below: > >> my $filename = $result->query_name()."\.out"; > >> $factory->save_output($filename); > >> this saved the blast result in separated files by the query name. > >> > >> my question is: how can i save the blast results of all querys in one > file? > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Nov 29 16:40:08 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 29 Nov 2010 15:40:08 -0600 Subject: [Bioperl-l] question about Bio::Tools::Run::RemoteBlast In-Reply-To: References: <4cd90ec0.0c44970a.506d.2a81@mx.google.com> <4CD9ED62.50102@gmail.com> <14F0D22D-6AEC-4453-B089-8BDF9E9A481E@illinois.edu> Message-ID: You haven't installed BioPerl (you need more than Bio::Tools::Run::RemoteBlast). Please see bioperl.org for installation instructions. chris On Nov 29, 2010, at 3:22 PM, Mgavi Brathwaite wrote: > Hello, > > When I run > Bio::Tools::Run::RemoteBlast I get an error when I enter the line use > Bio::SeqIO that says *can't locate Bio/SeqIO.pm in @INC*. I downloaded the > package but can't figure out what the deal is. Any suggestions. > > LomSpace > > On Tue, Nov 9, 2010 at 10:46 PM, Chris Fields wrote: > >> Only if the output appends to that file (I don't recall personally, so it's >> worth a try). >> >> chris >> >> On Nov 9, 2010, at 6:54 PM, Florent Angly wrote: >> >>> Wouldn't using a single filename be the way to go? 
>>>> my $filename = "blast_ressults.out"; >>>> $factory->save_output($filename); >>> Florent >>> >>> >>> On 09/11/10 19:05, njauxiongjie wrote: >>>> Hi, >>>> >>>> I have used the scripts list in section of "SYNOPSIS" in th webpage >> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm >> . >>>> >>>> there are two line of the code like below: >>>> my $filename = $result->query_name()."\.out"; >>>> $factory->save_output($filename); >>>> this saved the blast result in separated files by the query name. >>>> >>>> my question is: how can i save the blast results of all querys in one >> file? >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Nov 29 21:19:56 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 29 Nov 2010 18:19:56 -0800 Subject: [Bioperl-l] genbank In-Reply-To: References: <4CF373DE.4070902@bii.a-star.edu.sg> Message-ID: <4CF45F4C.7050102@bioperl.org> Dimitar - In terms of your question - a GenBank db query previously (ie 4-5 years ago when this was written) WOULD NOT return a sequence if a GenPept ID was specified so we had to have a separate module for GenBank and GenPept db querying since there was a different set of parameters - I think that changed so that most of the queries can run through GenBank I see that must have been improved at NCBI. 
For the record if you want the full GenPept record with features and annotations you just request a different db, in this case 'gb' for genbank instead of the fasta source As in: http://gist.github.com/721012 But maybe you already figured it out? -jason Chris Fields wrote, On 11/29/10 6:39 AM: > On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote: > >> Hi again, >> it seems that when i download (with 'download_query_genbank.pl') the whole proteome from NCBI in fasta format it is first being downloaded and from it is being created some kind of SeqFastaSpeedFactory and after that from it is being copied to the output file. But i want to download and write to output file one by one so i can see the download progress(which is working for genbank data). >> >> Its frustrating :) >> >> Any ideas where to look for solution >> Cheers >> Dimitar > > You can't do this with the default script, but you can use a modified version and, where you are retrieving a sequence stream, in the last four lines: > > my $stream = $dbh->get_Stream_by_query($query); > while( my $seq = $stream->next_seq ) { > $out->write_seq($seq); > } > > insert an iterator in the loop that indicates progress. Realize the sequence data is processed through Bio::SeqIO, so it won't be exactly the same as what is retrieved from GenBank, but it should be very close. > > If you want raw sequence, you can use Bio::DB::EUtilities, but it's a bit more complicated. 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From dimitark at bii.a-star.edu.sg Mon Nov 29 21:54:50 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Tue, 30 Nov 2010 10:54:50 +0800 Subject: [Bioperl-l] genbank In-Reply-To: <4CF45F4C.7050102@bioperl.org> References: <4CF373DE.4070902@bii.a-star.edu.sg> <4CF45F4C.7050102@bioperl.org> Message-ID: <4CF4677A.5010604@bii.a-star.edu.sg> On 11/30/2010 10:19 AM, Jason Stajich wrote: > Dimitar - > > In terms of your question - a GenBank db query previously (ie 4-5 > years ago when this was written) WOULD NOT return a sequence if a > GenPept ID was specified so we had to have a separate module for > GenBank and GenPept db querying since there was a different set of > parameters - I think that changed so that most of the queries can run > through GenBank > > I see that must have been improved at NCBI. For the record if you > want the full GenPept record with features and annotations you just > request a different db, in this case 'gb' for genbank instead of the > fasta source > As in: http://gist.github.com/721012 > > But maybe you already figured it out? > > -jason > Chris Fields wrote, On 11/29/10 6:39 AM: >> On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote: >> >>> Hi again, >>> it seems that when i download (with 'download_query_genbank.pl') the >>> whole proteome from NCBI in fasta format it is first being >>> downloaded and from it is being created some kind of >>> SeqFastaSpeedFactory and after that from it is being copied to the >>> output file. But i want to download and write to output file one by >>> one so i can see the download progress(which is working for genbank >>> data). 
>>> >>> Its frustrating :) >>> >>> Any ideas where to look for solution >>> Cheers >>> Dimitar >> >> You can't do this with the default script, but you can use a modified >> version and, where you are retrieving a sequence stream, in the last >> four lines: >> >> my $stream = $dbh->get_Stream_by_query($query); >> while( my $seq = $stream->next_seq ) { >> $out->write_seq($seq); >> } >> >> insert an iterator in the loop that indicates progress. Realize the >> sequence data is processed through Bio::SeqIO, so it won't be exactly >> the same as what is retrieved from GenBank, but it should be very close. >> >> If you want raw sequence, you can use Bio::DB::EUtilities, but it's a >> bit more complicated. >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > Thank you Jason, i figured that out yes :) I know am wandering how to get the GI list so i can download by ID or ACC and not by query when i download fasta. I have the simple code: ------------- my $query_str='Caenorhabditis elegans[organism] AND refseq[filter]'; my $query=Bio::DB::Query::GenBank->new(-db=>'protein', -query=>$query_str); my $count=$query->count; my @ids=$query->ids; <------- error msg here ------------ but i get an error msg: MSG: Id list has been truncated even after maxids requested. How can i get the ID/ACC? Any idea? 
Thank you Dimitar -- Dimitar Kenanov Post doctoral fellow Bioinformatics Institute A*STAR Singapore tel: +65 6478 8514 email: dimitark at bii.a-star.edu.sg From dimitark at bii.a-star.edu.sg Mon Nov 29 20:39:07 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Tue, 30 Nov 2010 09:39:07 +0800 Subject: [Bioperl-l] Bioperl-l Digest, Vol 91, Issue 20 In-Reply-To: References: Message-ID: <4CF455BB.4060608@bii.a-star.edu.sg> On 11/30/2010 01:00 AM, bioperl-l-request at lists.open-bio.org wrote: > Send Bioperl-l mailing list submissions to > bioperl-l at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/bioperl-l > or, via email, send a message with subject or body 'help' to > bioperl-l-request at lists.open-bio.org > > You can reach the person managing the list at > bioperl-l-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Bioperl-l digest..." > > > Today's Topics: > > 1. genbank (Dimitar Kenanov) > 2. Re: genbank (Chris Fields) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 29 Nov 2010 17:35:26 +0800 > From: Dimitar Kenanov > Subject: [Bioperl-l] genbank > To: "'bioperl-l at bioperl.org'" > Message-ID:<4CF373DE.4070902 at bii.a-star.edu.sg> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi again, > it seems that when i download (with 'download_query_genbank.pl') the > whole proteome from NCBI in fasta format it is first being downloaded > and from it is being created some kind of SeqFastaSpeedFactory and after > that from it is being copied to the output file. But i want to download > and write to output file one by one so i can see the download > progress(which is working for genbank data). 
> > Its frustrating :) > > Any ideas where to look for solution > Cheers > Dimitar > > > > > ------------------------------ > > Message: 2 > Date: Mon, 29 Nov 2010 08:39:16 -0600 > From: Chris Fields > Subject: Re: [Bioperl-l] genbank > To: Dimitar Kenanov > Cc: "'bioperl-l at bioperl.org'" > Message-ID: > Content-Type: text/plain; charset=us-ascii > > On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote: > > >> Hi again, >> it seems that when i download (with 'download_query_genbank.pl') the whole proteome from NCBI in fasta format it is first being downloaded and from it is being created some kind of SeqFastaSpeedFactory and after that from it is being copied to the output file. But i want to download and write to output file one by one so i can see the download progress(which is working for genbank data). >> >> Its frustrating :) >> >> Any ideas where to look for solution >> Cheers >> Dimitar >> > You can't do this with the default script, but you can use a modified version and, where you are retrieving a sequence stream, in the last four lines: > > my $stream = $dbh->get_Stream_by_query($query); > while( my $seq = $stream->next_seq ) { > $out->write_seq($seq); > } > > insert an iterator in the loop that indicates progress. Realize the sequence data is processed through Bio::SeqIO, so it won't be exactly the same as what is retrieved from GenBank, but it should be very close. > > If you want raw sequence, you can use Bio::DB::EUtilities, but it's a bit more complicated. > > chris > > > ------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > End of Bioperl-l Digest, Vol 91, Issue 20 > ***************************************** > > Hi, thank you for the info. I already have inserted a progress bar(Term::ProgressBar) in the last four lines. The problem is that i see the progress at the end. I see directly 100%done. 
See the attached script. What i was reading in the modules underlying the script the way the stream is constructed it should be able to be read from while is being downloaded. But when i get fasta seqs with NCBI rettype=fasta it is not possible. -- Dimitar Kenanov Post doctoral fellow Bioinformatics Institute A*STAR Singapore tel: +65 6478 8514 email: dimitark at bii.a-star.edu.sg -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: download_query_genbank.pl URL: From dimitark at bii.a-star.edu.sg Mon Nov 29 20:50:42 2010 From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov) Date: Tue, 30 Nov 2010 09:50:42 +0800 Subject: [Bioperl-l] genbank In-Reply-To: References: <4CF373DE.4070902@bii.a-star.edu.sg> Message-ID: <4CF45872.1060405@bii.a-star.edu.sg> On 11/29/2010 10:39 PM, Chris Fields wrote: > On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote: > > >> Hi again, >> it seems that when i download (with 'download_query_genbank.pl') the whole proteome from NCBI in fasta format it is first being downloaded and from it is being created some kind of SeqFastaSpeedFactory and after that from it is being copied to the output file. But i want to download and write to output file one by one so i can see the download progress(which is working for genbank data). >> >> Its frustrating :) >> >> Any ideas where to look for solution >> Cheers >> Dimitar >> > You can't do this with the default script, but you can use a modified version and, where you are retrieving a sequence stream, in the last four lines: > > my $stream = $dbh->get_Stream_by_query($query); > while( my $seq = $stream->next_seq ) { > $out->write_seq($seq); > } > > insert an iterator in the loop that indicates progress. Realize the sequence data is processed through Bio::SeqIO, so it won't be exactly the same as what is retrieved from GenBank, but it should be very close. > > If you want raw sequence, you can use Bio::DB::EUtilities, but it's a bit more complicated. 
> > chris > Hi, thank you for the info. I already have inserted a progress bar(Term::ProgressBar) in the last four lines. The problem is that i see the progress at the end. I see directly 100%done. See the attached script. What i was reading in the modules underlying the script the way the stream is constructed it should be able to be read from while is being downloaded. But when i get fasta seqs with NCBI rettype=fasta it is not possible. -- Dimitar Kenanov Post doctoral fellow Bioinformatics Institute A*STAR Singapore tel: +65 6478 8514 email: dimitark at bii.a-star.edu.sg -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: download_query_genbank.pl URL: From cjfields at illinois.edu Mon Nov 29 22:42:14 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 29 Nov 2010 21:42:14 -0600 Subject: [Bioperl-l] genbank In-Reply-To: <4CF4677A.5010604@bii.a-star.edu.sg> References: <4CF373DE.4070902@bii.a-star.edu.sg> <4CF45F4C.7050102@bioperl.org> <4CF4677A.5010604@bii.a-star.edu.sg> Message-ID: <696A04DF-8735-4630-B233-E0189B252CF9@illinois.edu> On Nov 29, 2010, at 8:54 PM, Dimitar Kenanov wrote: > On 11/30/2010 10:19 AM, Jason Stajich wrote: >> Dimitar - >> >> In terms of your question - a GenBank db query previously (ie 4-5 years ago when this was written) WOULD NOT return a sequence if a GenPept ID was specified so we had to have a separate module for GenBank and GenPept db querying since there was a different set of parameters - I think that changed so that most of the queries can run through GenBank >> >> I see that must have been improved at NCBI. For the record if you want the full GenPept record with features and annotations you just request a different db, in this case 'gb' for genbank instead of the fasta source >> As in: http://gist.github.com/721012 >> >> But maybe you already figured it out? 
>> >> -jason >> Chris Fields wrote, On 11/29/10 6:39 AM: >>> On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote: >>> >>>> Hi again, >>>> it seems that when i download (with 'download_query_genbank.pl') the whole proteome from NCBI in fasta format it is first being downloaded and from it is being created some kind of SeqFastaSpeedFactory and after that from it is being copied to the output file. But i want to download and write to output file one by one so i can see the download progress(which is working for genbank data). >>>> >>>> Its frustrating :) >>>> >>>> Any ideas where to look for solution >>>> Cheers >>>> Dimitar >>> >>> You can't do this with the default script, but you can use a modified version and, where you are retrieving a sequence stream, in the last four lines: >>> >>> my $stream = $dbh->get_Stream_by_query($query); >>> while( my $seq = $stream->next_seq ) { >>> $out->write_seq($seq); >>> } >>> >>> insert an iterator in the loop that indicates progress. Realize the sequence data is processed through Bio::SeqIO, so it won't be exactly the same as what is retrieved from GenBank, but it should be very close. >>> >>> If you want raw sequence, you can use Bio::DB::EUtilities, but it's a bit more complicated. >>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > Thank you Jason, > i figured that out yes :) > I know am wandering how to get the GI list so i can download by ID or ACC and not by query when i download fasta. I have the simple code: > ------------- > my $query_str='Caenorhabditis elegans[organism] AND refseq[filter]'; > my $query=Bio::DB::Query::GenBank->new(-db=>'protein', > -query=>$query_str); > > my $count=$query->count; > my @ids=$query->ids; <------- error msg here > ------------ > but i get an error msg: > > MSG: Id list has been truncated even after maxids requested. > > How can i get the ID/ACC? 
Any idea?
>
> Thank you
> Dimitar

Dimitar,

You are retrieving a huge number of IDs; if you print out $count above, the total is 23906. Set -maxids to raise the maximum number of IDs returned above the default:

----------------------------------------------
use feature 'say';   # say() is not enabled by default
use Bio::DB::Query::GenBank;

my $query_str = 'Caenorhabditis elegans[organism] AND refseq[filter]';
my $query = Bio::DB::Query::GenBank->new(-maxids => 40000,
                                         -db     => 'protein',
                                         -query  => $query_str);

my $count = $query->count;
say "Count: $count";

my @ids = $query->ids;
say scalar(@ids); # equal to $count
----------------------------------------------

Realize, though, you must submit these in batches of ~300 if retrieving sequences (IIRC, Bio::DB::GenBank only uses GET instead of POST, so there is a URL length limit). Bio::DB::EUtilities can retrieve more, about ~3000 or so in a batch, when using POST.

chris

From dimitark at bii.a-star.edu.sg  Tue Nov 30 04:04:30 2010
From: dimitark at bii.a-star.edu.sg (Dimitar Kenanov)
Date: Tue, 30 Nov 2010 17:04:30 +0800
Subject: [Bioperl-l] genbank
In-Reply-To: <696A04DF-8735-4630-B233-E0189B252CF9@illinois.edu>
References: <4CF373DE.4070902@bii.a-star.edu.sg> <4CF45F4C.7050102@bioperl.org> <4CF4677A.5010604@bii.a-star.edu.sg> <696A04DF-8735-4630-B233-E0189B252CF9@illinois.edu>
Message-ID: <4CF4BE1E.6090100@bii.a-star.edu.sg>

On 11/30/2010 11:42 AM, Chris Fields wrote:
> On Nov 29, 2010, at 8:54 PM, Dimitar Kenanov wrote:
>
>> On 11/30/2010 10:19 AM, Jason Stajich wrote:
>>
>>> Dimitar -
>>>
>>> In terms of your question - a GenBank db query previously (ie 4-5 years ago when this was written) WOULD NOT return a sequence if a GenPept ID was specified so we had to have a separate module for GenBank and GenPept db querying since there was a different set of parameters - I think that changed so that most of the queries can run through GenBank
>>>
>>> I see that must have been improved at NCBI.
For the record if you want the full GenPept record with features and annotations you just request a different db, in this case 'gb' for genbank instead of the fasta source >>> As in: http://gist.github.com/721012 >>> >>> But maybe you already figured it out? >>> >>> -jason >>> Chris Fields wrote, On 11/29/10 6:39 AM: >>> >>>> On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote: >>>> >>>> >>>>> Hi again, >>>>> it seems that when i download (with 'download_query_genbank.pl') the whole proteome from NCBI in fasta format it is first being downloaded and from it is being created some kind of SeqFastaSpeedFactory and after that from it is being copied to the output file. But i want to download and write to output file one by one so i can see the download progress(which is working for genbank data). >>>>> >>>>> Its frustrating :) >>>>> >>>>> Any ideas where to look for solution >>>>> Cheers >>>>> Dimitar >>>>> >>>> You can't do this with the default script, but you can use a modified version and, where you are retrieving a sequence stream, in the last four lines: >>>> >>>> my $stream = $dbh->get_Stream_by_query($query); >>>> while( my $seq = $stream->next_seq ) { >>>> $out->write_seq($seq); >>>> } >>>> >>>> insert an iterator in the loop that indicates progress. Realize the sequence data is processed through Bio::SeqIO, so it won't be exactly the same as what is retrieved from GenBank, but it should be very close. >>>> >>>> If you want raw sequence, you can use Bio::DB::EUtilities, but it's a bit more complicated. >>>> >>>> chris >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> Thank you Jason, >> i figured that out yes :) >> I know am wandering how to get the GI list so i can download by ID or ACC and not by query when i download fasta. 
I have the simple code: >> ------------- >> my $query_str='Caenorhabditis elegans[organism] AND refseq[filter]'; >> my $query=Bio::DB::Query::GenBank->new(-db=>'protein', >> -query=>$query_str); >> >> my $count=$query->count; >> my @ids=$query->ids;<------- error msg here >> ------------ >> but i get an error msg: >> >> MSG: Id list has been truncated even after maxids requested. >> >> How can i get the ID/ACC? Any idea? >> >> Thank you >> Dimitar >> > Dimitar, > > You are retrieving a huge number of IDs; if you print out $count above, the total is 23906. Set -maxids to raise the returned default maximum number of IDs higher: > > ---------------------------------------------- > use Bio::DB::Query::GenBank; > > my $query_str='Caenorhabditis elegans[organism] AND refseq[filter]'; > my $query=Bio::DB::Query::GenBank->new(-maxids => 40000, > -db=>'protein', > -query=>$query_str); > > my $count=$query->count; > say "Count: $count"; > > my @ids=$query->ids; > say scalar(@ids); # equal to $count > ---------------------------------------------- > > Realize, though, you must submit these in batches of ~300 if retrieving sequences (IIRC, Bio::DB::GenBank only uses GET instead of POST, so there is a URL length limit). Bio::DB::EUtilities can retrieve more, about ~3000 or so in a batch, when using POST. > > chris > > > > Hi again, i managed to solve my problem. It may be dirty but it works the way i want :) I reworked the 'download_query_genbank.pl' (attached). Now i can get the seqs in full fasta for proteomes and genomes and the genpept report files for the proteomes. For DB handle i only use GenPept now cos it gives me stream which i can track with term::progressbar. 
For the output I use 2 cases:
-----------------
while( my $seq = $stream->next_seq ) {
    #DIMITAR
    my($gi,$locus,$refnum,$desc,$seqstr);
    if($retformat eq 'fasta'){    # <--| for the fasta as I want it
        check_progress($prgs,$seqnum,$count);
        $locus=$seq->display_id;
        $refnum=$seq->accession_number;
        $gi=$seq->primary_id;
        $desc=$seq->desc;
        $desc=~s/\.$//;
        $seqstr=$seq->seq;
        print $fhout ">gi\|$gi\|ref\|$refnum\|$locus $desc\n$seqstr\n";
    }else{
        check_progress($prgs,$seqnum,$count);
        $out->write_seq($seq);    # <--| for the genbank reports
    }
    $seqnum++;
    #DIMITAR

    # $out->write_seq($seq);#original
}
------------------

Thank you for your help and time.

Cheers
Dimitar

-- 
Dimitar Kenanov
Post doctoral fellow
Bioinformatics Institute
A*STAR
Singapore
tel: +65 6478 8514
email: dimitark at bii.a-star.edu.sg
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: download_query_genbank.pl
URL:

From jason at bioperl.org Tue Nov 30 12:06:08 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 30 Nov 2010 09:06:08 -0800
Subject: [Bioperl-l] genbank
In-Reply-To: <4CF4BE1E.6090100@bii.a-star.edu.sg>
References: <4CF373DE.4070902@bii.a-star.edu.sg> <4CF45F4C.7050102@bioperl.org> <4CF4677A.5010604@bii.a-star.edu.sg> <696A04DF-8735-4630-B233-E0189B252CF9@illinois.edu> <4CF4BE1E.6090100@bii.a-star.edu.sg>
Message-ID: <4CF52F00.5000803@bioperl.org>

Great - the whole point of the scripts is to serve as examples, really: not that you need to send patches back showing everything you modified, but that you modify them, using the modules and code, to do whatever special thing you want. The hope is that the modules are flexible enough that you can write the script to accomplish your goal.
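The progress tracking in Dimitar's loop above relies on a check_progress routine that is not shown in the message. As a rough stand-in, assuming nothing beyond core Perl, a counter closure can report progress on STDERR. The names make_progress and @records below are hypothetical and used only for illustration; the real script iterates over a Bio::SeqIO stream instead of an array.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that counts records and
# prints a status line to STDERR every $step records and at the end.
# Assumes $total and $step are both at least 1.
sub make_progress {
    my ($total, $step) = @_;
    my $seen = 0;
    return sub {
        $seen++;
        if ($seen % $step == 0 || $seen == $total) {
            printf STDERR "\r%d/%d (%.0f%%)", $seen, $total, 100 * $seen / $total;
            print STDERR "\n" if $seen == $total;
        }
        return $seen;    # number of records processed so far
    };
}

# Stand-in records; in the real script each item would be a sequence
# object returned by $stream->next_seq.
my @records = map { "seq$_" } 1 .. 25;
my $tick    = make_progress(scalar(@records), 10);

for my $rec (@records) {
    # $out->write_seq($seq) would go here in the real script
    $tick->();
}
```

Writing the status to STDERR keeps it out of any sequence data sent to STDOUT; a module such as Term::ProgressBar, which the message mentions, could replace the closure.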
BTW - the one thing you can't recover from the GBK version of the file is the source of the accession number. You have hardcoded 'ref', but it can be 'gb', 'emb', 'sp', etc. This field isn't part of the genbank record, unfortunately. One can come up with a pattern based on knowledge of accession number formats, but I don't know that anyone has been worried enough about that sort of thing to try to write something for it.

> Hi again,
> I managed to solve my problem. It may be dirty but it works the way I want :)
> I reworked the 'download_query_genbank.pl' (attached). Now I can get the seqs in full fasta for proteomes and genomes and the genpept report files for the proteomes.
> For the DB handle I only use GenPept now because it gives me a stream which I can track with Term::ProgressBar.
>
> For the output I use 2 cases:
> -----------------
> while( my $seq = $stream->next_seq ) {
>     #DIMITAR
>     my($gi,$locus,$refnum,$desc,$seqstr);
>     if($retformat eq 'fasta'){    # <--| for the fasta as I want it
>         check_progress($prgs,$seqnum,$count);
>         $locus=$seq->display_id;
>         $refnum=$seq->accession_number;
>         $gi=$seq->primary_id;
>         $desc=$seq->desc;
>         $desc=~s/\.$//;
>         $seqstr=$seq->seq;
>         print $fhout ">gi\|$gi\|ref\|$refnum\|$locus $desc\n$seqstr\n";
>     }else{
>         check_progress($prgs,$seqnum,$count);
>         $out->write_seq($seq);    # <--| for the genbank reports
>     }
>     $seqnum++;
>     #DIMITAR
>
>     # $out->write_seq($seq);#original
> }
> ------------------
>
> Thank you for your help and time.
>
> Cheers
> Dimitar
>

-- 
Jason Stajich
jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Tue Nov 2 22:11:36 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Nov 2010 15:11:36 +1300
Subject: [Bioperl-l] how to download PDB files using Bioperl script
In-Reply-To:
References: <01f901cb7203$f66e4040$e34ac0c0$%yin@ucd.ie>